Hi all! I wrote an io_uring-based implementation of b3sum here: https://github.com/1f604/liburing_b3sum
I wrote two versions: a single-threaded version in C and a multi-threaded version in C++. On my system, the single-threaded version is around 25% faster than the official Rust b3sum, and slightly faster than both cat to /dev/null and fio. It hashes a 10 GiB file in 2.899s, which works out to around 3533 MiB/s, roughly the read speed advertised for my NVMe drive ("3500MB/s"). The multi-threaded implementation is around 1% slower than the single-threaded one.
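As a quick sanity check on the throughput figure, here is a minimal sketch of the arithmetic. The function names are my own and are not taken from the repository. Note that MiB/s (binary) and MB/s (decimal, which drive vendors usually advertise) differ by about 5%, so it helps to compute both.

```c
#include <stdint.h>

/* Bytes per MiB (binary unit) and per MB (decimal unit). */
#define BYTES_PER_MIB (1024.0 * 1024.0)
#define BYTES_PER_MB  (1000.0 * 1000.0)

/* Throughput in MiB/s for `bytes` processed in `seconds`. */
double throughput_mib_per_s(uint64_t bytes, double seconds) {
    return (double)bytes / BYTES_PER_MIB / seconds;
}

/* Throughput in decimal MB/s for `bytes` processed in `seconds`. */
double throughput_mb_per_s(uint64_t bytes, double seconds) {
    return (double)bytes / BYTES_PER_MB / seconds;
}
```

Calling `throughput_mib_per_s(10ULL << 30, 2.899)` with the rounded time from above gives roughly 3532 MiB/s (the small gap to 3533 is presumably rounding in the reported time), and `throughput_mb_per_s` gives roughly 3704 MB/s for the same run.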
Benchmarks
For these tests, I used the same 1 GiB (or 10 GiB) input file and always flushed the page cache before each test, thus ensuring that the programs are always reading from disk. Each command was run 10 times and I used the "real" result from time to calculate the statistics. I ran these commands on a Debian 12 system (uname -r returns "6.1.0-9-amd64") using ext4 without disk encryption and without LVM.

The benchmarked commands:

echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --num-threads 1
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --num-threads 2
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --num-threads 3
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --num-threads 4
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --num-threads 5
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --num-threads 6
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --num-threads 7
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --num-threads 8
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time b3sum 1GB.txt --no-mmap
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time ./b3sum_linux 1GB.txt --no-mmap
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time cat 1GB.txt > /dev/null
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time dd if=1GB.txt bs=64K of=/dev/null
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time dd if=1GB.txt bs=2M of=/dev/null
fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=1g --blocksize=512k --ioengine=io_uring --fsync=10000 --iodepth=2 --direct=1 --numjobs=1 --runtime=60 --group_reporting
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time ./liburing_b3sum_singlethread 1GB.txt 512 2 1 0 2 0 0
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time ./liburing_b3sum_multithread 1GB.txt 512 2 1 0 2 0 0
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time ./liburing_b3sum_singlethread 1GB.txt 128 20 0 0 8 0 0
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time ./liburing_b3sum_multithread 1GB.txt 128 20 0 0 8 0 0
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time xxhsum 1GB.txt
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time cat 10GB.txt > /dev/null
echo 1 > /proc/sys/vm/drop_caches; sleep 1; time ./liburing_b3sum_singlethread 10GB.txt 512 4 1 0 4 0 0
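
The post does not say which statistics were computed from the 10 "real" timings per command, so the following is only a sketch under the assumption that min, median, mean, and max are of interest. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summary of a set of wall-clock timings, in seconds. */
typedef struct {
    double min;
    double median;
    double mean;
    double max;
} run_stats;

/* Summarizes n >= 1 timings. Sorts the array `t` in place. */
run_stats summarize_runs(double *t, size_t n) {
    run_stats s;
    double sum = 0.0;

    qsort(t, n, sizeof t[0], cmp_double);
    for (size_t i = 0; i < n; i++) {
        sum += t[i];
    }

    s.min = t[0];
    s.max = t[n - 1];
    /* For an even count, the median is the average of the two middle values. */
    s.median = (n % 2) ? t[n / 2] : (t[n / 2 - 1] + t[n / 2]) / 2.0;
    s.mean = sum / (double)n;
    return s;
}
```

To use it, copy the ten "real" values reported by time for one command into a `double` array and pass it to `summarize_runs`. The median is less sensitive than the mean to a single slow run, for example one where the drive was briefly busy.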
In the table above, liburing_b3sum_singlethread and liburing_b3sum_multithread are my own io_uring-based implementations of b3sum (more details below), and I verified that my b3sum implementations always produced the same BLAKE3 hash output as the official b3sum implementation. The 1GB.txt file was generated using this command:
dd if=/dev/urandom of=1GB.txt bs=1G count=1
I installed b3sum using this command:
cargo install b3sum
$ b3sum --version
b3sum 1.4.1
I downloaded the b3sum_linux program from the BLAKE3 GitHub Releases page (it was the latest Linux binary):
$ ./b3sum_linux --version
b3sum 1.4.1
I compiled the example program from the example.c file in the BLAKE3 C repository as per the instructions in the BLAKE3 C repository.
I installed xxhsum using this command:
apt install xxhash
$ xxhsum --version
xxhsum 0.8.1 by Yann Collet
compiled as 64-bit x86_64 autoVec little endian with GCC 11.2.0
Note
Note that, as the table above shows, the single-threaded version needs O_DIRECT in order to be fast (the flag that controls whether or not to use O_DIRECT is the third number after the filename in the command line arguments). The multi-threaded version is fast even without O_DIRECT (as the table shows, the multi-threaded version hashes a 1GiB file in 0.304s with O_DIRECT and 0.305s without O_DIRECT). For more details, see article.md in the repository. The same article is also available at a few other locations, one with somewhat nicer formatting than GitHub.
I should also mention that my implementation does sequential reads from disk and uses the BLAKE3 C library, so it isn't capable of hashing on multiple cores.
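
For readers unfamiliar with O_DIRECT, here is a minimal sketch of what toggling it involves on Linux. It deliberately uses plain read() rather than io_uring, so it is not how liburing_b3sum is implemented. The helper names and the 4096-byte alignment are assumptions chosen for illustration. With O_DIRECT the kernel bypasses the page cache, and the buffer address, read size, and file offset generally need to be aligned to the device's logical block size.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback if O_DIRECT is not exposed: reads then stay buffered. */
#ifndef O_DIRECT
#define O_DIRECT 0
#endif

/* Assumed alignment; 4096 covers common 512-byte and 4 KiB block sizes. */
#define ALIGNMENT 4096

/* Opens `path` read-only, adding O_DIRECT when `use_direct` is nonzero.
 * Returns a file descriptor, or -1 on error (errno is set by open). */
int open_for_hashing(const char *path, int use_direct) {
    int flags = O_RDONLY;
    if (use_direct) {
        flags |= O_DIRECT;
    }
    return open(path, flags);
}

/* Allocates a buffer aligned to ALIGNMENT. `size` should be a multiple of
 * ALIGNMENT when used with O_DIRECT. Returns NULL on failure. */
void *alloc_aligned_block(size_t size) {
    void *buf = NULL;
    if (posix_memalign(&buf, ALIGNMENT, size) != 0) {
        return NULL;
    }
    return buf;
}

/* Reads the file sequentially in `block_size` chunks until EOF.
 * A real hasher would feed each chunk to the hash function here.
 * Returns the total number of bytes read, or -1 on a read error. */
long long read_all(int fd, void *buf, size_t block_size) {
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, block_size);
        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            break;              /* end of file */
        }
        total += n;
    }
    return total;
}
```

Some filesystems reject O_DIRECT, in which case open fails (typically with EINVAL) and a caller may want to fall back to a buffered open.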
I would very much appreciate any feedback!