-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICH: Use 128-bit Blake2b hash instead of 64-bit SipHash for incr. comp. fingerprints #37233
Conversation
744c7f1
to
d07523c
Compare
I agree on removing the sha256 implementation. Not sure what we should do for |
The code looks good to me. I'm a bit confused by this sentence:
What does using "two SipHashes" mean? |
Have two SipHash states initialized with different seeds so they are (hopefully) statistically independent. That way you get two different 64 bit hashes. Whether that would actually be as good as a 128 bit hash, I'm not sure. |
OK. This takes as a given that 64 bits is insuficient (which is fine). I'm curious to know how this compares to one 64-bit hash as today, just for completeness. I guess that would be roughly a 2x slowdown? (Hashing overall is not that expensive, though, iirc.) |
Here are some numbers for the 64 bit case:
|
@michaelwoerister I'd say that's a fairly convincing argument that this is low overhead =) |
@bors r+ |
📌 Commit d07523c has been approved by |
…nikomatsakis ICH: Use 128-bit Blake2b hash instead of 64-bit SipHash for incr. comp. fingerprints This PR makes incr. comp. hashes 128 bits wide in order to push collision probability below a threshold that we need to worry about. It also replaces SipHash, which has been mentioned multiple times as not being built for fingerprinting, with the [BLAKE2b hash function](https://blake2.net/), an improved version of the BLAKE sha-3 finalist. I was worried that using a cryptographic hash function would make ICH computation noticeably slower, but after doing some performance tests, I'm not any more. Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits): ``` SipHash libcore: 0.199 seconds libstd: 0.090 seconds BLAKE2b libcore: 0.162 seconds libstd: 0.078 seconds ``` If someone can prove that something like MetroHash128 provides a comparably low collision probability as BLAKE2, I'm happy to switch. But for now we are at least not taking a performance hit. I also suggest that we throw out the sha-256 implementation in the compiler and replace it with BLAKE2, since our sha-256 implementation is two to three times slower than the BLAKE2 implementation in this PR (cc @alexcrichton @eddyb @brson) r? @nikomatsakis (although there's not much incr. comp. specific in here, so feel free to re-assign)
…nikomatsakis ICH: Use 128-bit Blake2b hash instead of 64-bit SipHash for incr. comp. fingerprints This PR makes incr. comp. hashes 128 bits wide in order to push collision probability below a threshold that we need to worry about. It also replaces SipHash, which has been mentioned multiple times as not being built for fingerprinting, with the [BLAKE2b hash function](https://blake2.net/), an improved version of the BLAKE sha-3 finalist. I was worried that using a cryptographic hash function would make ICH computation noticeably slower, but after doing some performance tests, I'm not any more. Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits): ``` SipHash libcore: 0.199 seconds libstd: 0.090 seconds BLAKE2b libcore: 0.162 seconds libstd: 0.078 seconds ``` If someone can prove that something like MetroHash128 provides a comparably low collision probability as BLAKE2, I'm happy to switch. But for now we are at least not taking a performance hit. I also suggest that we throw out the sha-256 implementation in the compiler and replace it with BLAKE2, since our sha-256 implementation is two to three times slower than the BLAKE2 implementation in this PR (cc @alexcrichton @eddyb @brson) r? @nikomatsakis (although there's not much incr. comp. specific in here, so feel free to re-assign)
@michaelwoerister I'm fine with replacing sha with blake. |
Why Blake2b and not SHA3? |
@tcosprojects Because Blake2b is faster to compute if these measurements are to be believed: (from https://blake2.net) |
This PR makes incr. comp. hashes 128 bits wide in order to push collision probability below a threshold that we need to worry about. It also replaces SipHash, which has been mentioned multiple times as not being built for fingerprinting, with the BLAKE2b hash function, an improved version of the BLAKE sha-3 finalist.
I was worried that using a cryptographic hash function would make ICH computation noticeably slower, but after doing some performance tests, I'm not any more. Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits):
If someone can prove that something like MetroHash128 provides a comparably low collision probability as BLAKE2, I'm happy to switch. But for now we are at least not taking a performance hit.
I also suggest that we throw out the sha-256 implementation in the compiler and replace it with BLAKE2, since our sha-256 implementation is two to three times slower than the BLAKE2 implementation in this PR (cc @alexcrichton @eddyb @brson)
r? @nikomatsakis (although there's not much incr. comp. specific in here, so feel free to re-assign)