Commit
internal/prefix: use wide integer loads when reading
Rather than reading bits into the bit-buffer on a byte-by-byte basis, use uint64 loads to get a wide integer in a single operation.

This optimization is only done on the Reader. The equivalent change is still to come for the Writer.

On the compress/flate benchmarks:

benchmark                                   old MB/s    new MB/s    speedup
BenchmarkDecode/Digits/Speed/1e4-4          102.85      112.07      1.09x
BenchmarkDecode/Digits/Speed/1e5-4          97.34       106.39      1.09x
BenchmarkDecode/Digits/Speed/1e6-4          100.97      110.31      1.09x
BenchmarkDecode/Digits/Default/1e4-4        100.51      108.91      1.08x
BenchmarkDecode/Digits/Default/1e5-4        97.16       105.93      1.09x
BenchmarkDecode/Digits/Default/1e6-4        97.60       105.35      1.08x
BenchmarkDecode/Digits/Compression/1e4-4    100.38      109.33      1.09x
BenchmarkDecode/Digits/Compression/1e5-4    97.86       105.72      1.08x
BenchmarkDecode/Digits/Compression/1e6-4    97.04       105.08      1.08x
BenchmarkDecode/Huffman/Speed/1e4-4         98.97       112.76      1.14x
BenchmarkDecode/Huffman/Speed/1e5-4         109.61      127.67      1.16x
BenchmarkDecode/Huffman/Speed/1e6-4         110.23      128.10      1.16x
BenchmarkDecode/Huffman/Default/1e4-4       99.20       110.47      1.11x
BenchmarkDecode/Huffman/Default/1e5-4       102.74      117.63      1.14x
BenchmarkDecode/Huffman/Default/1e6-4       104.49      120.32      1.15x
BenchmarkDecode/Huffman/Compression/1e4-4   98.83       113.05      1.14x
BenchmarkDecode/Huffman/Compression/1e5-4   102.10      117.41      1.15x
BenchmarkDecode/Huffman/Compression/1e6-4   104.28      120.21      1.15x
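The idea in the commit message can be sketched as follows. This is a minimal illustration and not the actual internal/prefix code: the bitReader type, its fields, and the fill, fillSlow, readBits, and refBits helpers are all hypothetical names, and the sketch assumes an LSB-first bit order as used by DEFLATE.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bitReader is a hypothetical LSB-first bit reader (not the real prefix.Reader).
type bitReader struct {
	buf     []byte // unread input bytes
	bufBits uint64 // bit buffer; the next bit to read is the least significant
	numBits uint   // number of valid bits in bufBits
}

// fillSlow tops up the bit buffer one byte at a time.
func (br *bitReader) fillSlow() {
	for br.numBits <= 56 && len(br.buf) > 0 {
		br.bufBits |= uint64(br.buf[0]) << br.numBits
		br.numBits += 8
		br.buf = br.buf[1:]
	}
}

// fill tops up the bit buffer with a single 8-byte load when possible.
func (br *bitReader) fill() {
	if len(br.buf) < 8 {
		br.fillSlow() // not enough input left for a wide load
		return
	}
	v := binary.LittleEndian.Uint64(br.buf)
	br.bufBits |= v << br.numBits
	// Only whole bytes count as consumed. Bits of a partially loaded byte
	// sit above numBits and are OR-ed with identical values on the next fill.
	n := (64 - br.numBits) / 8
	br.buf = br.buf[n:]
	br.numBits += 8 * n
}

// readBits returns the next n bits (n must be at most 56).
func (br *bitReader) readBits(n uint) (uint64, bool) {
	if n > 56 {
		return 0, false
	}
	if br.numBits < n {
		br.fill()
		if br.numBits < n {
			return 0, false // ran out of input
		}
	}
	v := br.bufBits & (uint64(1)<<n - 1)
	br.bufBits >>= n
	br.numBits -= n
	return v, true
}

// refBits extracts n bits at bit offset pos one bit at a time, as a reference.
func refBits(data []byte, pos, n uint) uint64 {
	var v uint64
	for i := uint(0); i < n; i++ {
		v |= (uint64(data[(pos+i)/8]>>((pos+i)%8)) & 1) << i
	}
	return v
}

func main() {
	data := []byte{0xb5, 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0x10, 0x32, 0x54}
	br := &bitReader{buf: data}
	var pos uint
	for _, n := range []uint{3, 5, 13, 1, 31, 7, 20, 16} {
		got, ok := br.readBits(n)
		fmt.Println(n, got, ok, got == refBits(data, pos, n))
		pos += n
	}
}
```

The wide load replaces up to eight loop iterations with one load, one shift, and one OR. The byte-wise path is still needed for the last few bytes of input, where an 8-byte load would run past the end of the slice.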
Reverse8 is now available. I haven't checked whether it outperforms a LUT, but it is worth checking out.