Switch from 'cut' to 'head'; cut isn't doing what it should here #20

compilingEntropy · 2025-01-16T05:04:16Z

Based on the comment on lines 66-67, the cut command is supposed to just grab the first 1024 bytes of the file and pass that to awk. Instead, cut works line by line: -b -1024 keeps the first 1024 bytes of every line, not of the file, so it still reads the whole thing. Here are some tests that paint the picture:

$ time cut -b -1024 ./Downloads/World\ War\ Hulk.pdf | wc -c
 403378749
cut -b -1024 ./Downloads/World\ War\ Hulk.pdf  32.49s user 0.28s system 99% cpu 32.899 total
wc -c  3.28s user 0.19s system 10% cpu 32.898 total
$ time cat ./Downloads/World\ War\ Hulk.pdf | wc -c
 421636877
cat ./Downloads/World\ War\ Hulk.pdf  0.02s user 0.21s system 6% cpu 3.516 total
wc -c  3.44s user 0.07s system 99% cpu 3.515 total
$ time head -c 1024 ./Downloads/World\ War\ Hulk.pdf | wc -c
    1024
head -c 1024 ./Downloads/World\ War\ Hulk.pdf  0.00s user 0.00s system 71% cpu 0.004 total
wc -c  0.00s user 0.00s system 67% cpu 0.004 total

The cut command chops out some of the file, sure, but not the intended portion (more than 95% of the file remains), and for my (large) file it takes 33 seconds to do so.
A plain cat, as in the second command, is roughly 10x faster than the cut: it completes in 3.5 seconds while passing the entire file through.
If you use head instead, you get the intended first 1024 bytes and the command completes effectively instantly.
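
The difference is easier to see on a tiny file. The following is a minimal sketch, not part of the patch: the sample file and the variable names are made up for illustration, and it assumes a head that supports -c (GNU and BSD both do). cut applies its byte range to each line separately, while head counts bytes from the start of the stream.

```shell
# Build a small three-line sample file (hypothetical, for illustration only).
sample=$(mktemp)
printf 'aaaaaaaaaa\nbbbbbbbbbb\ncccccccccc\n' > "$sample"

# cut is line-oriented: it keeps the first 4 bytes of EVERY line,
# plus each line's trailing newline.
cut_bytes=$(cut -b -4 "$sample" | wc -c | tr -d ' ')

# head -c is stream-oriented: it keeps the first 4 bytes of the file and stops.
head_bytes=$(head -c 4 "$sample" | wc -c | tr -d ' ')

echo "cut: $cut_bytes bytes, head: $head_bytes bytes"
rm -f "$sample"
```

On this sample, cut emits 15 bytes (three lines of four bytes plus a newline each) and head emits 4. On a large binary file with many short "lines", the same per-line behavior is why most of the file survives the cut.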

This commit moves away from cut to the head command, which accomplishes the intended file slice and improves performance (particularly on large files).

Switch from 'cut' to 'head'; cut isn't doing what it should here

92a92f2
