-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider multi-message batching #386
Comments
Hashing many very short (< 128 byte) messages is a very important use-case for Merkle-trees and Proof-of-Work, both are major compute bottlenecks in transparent SNARKs (e.g. FRI/Ligero/Basefold like). It looks like the docs-hidden api |
@recmo Please note the So, My BLAKE3 rewrite linked above is more flexible:
Happy to dust it off and get it up to speed if needed. |
cc @zookozcash is there any renewed interest in supporting batched hashing of messages with different sizes? |
This is also a big use-case for me :) |
Problem
The current blake3 crate leaves a lot of single core performance on the table for message sizes below 8 KiB.
Namely, it doesn't SIMD parallelize hashing for small messages.
As a PoC, I've rewritten a BLAKE3 scheduler from scratch with a modified AVX2 backend:
https://github.com/firedancer-io/firedancer/tree/ripatel/fd_blake3/src/ballet/blake3
When hashing many independent 2 KiB messages concurrently, my implementation does 25 Gbps, while the C implementation does ~7 Gbps.
I would like to contribute back my changes to this official library.
My code is Apache-2.0 licensed, so feel free to copy from it.
Suggested Changes
There are three major pieces required:
log2(chunk_cnt) * simd_degree * 32
working space per hash state. The algorithm I came up with is unfortunately much more complex than the elegant stack-based one in the paper.fn blake3_multi(messages: &[&[u8]]) -> Vec<[u8; 32]>
The text was updated successfully, but these errors were encountered: