-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Conversation
It looks like @Hawstein signed our Contributor License Agreement. 👍 Many thanks, Parity Technologies CLA Bot |
util/bloom/src/lib.rs
Outdated
if k_i < NUMBER_OF_HASHERS as u32 { | ||
let mut sip = self.sips[k_i as usize].clone(); | ||
if k_i < 2 { | ||
let mut sip = self.sip.clone(); | ||
item.hash(&mut sip); | ||
let hash = sip.finish(); | ||
hashes[k_i as usize] = hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that the hashes
array will contain exactly the same number twice, so the whole thing is redundant as well.
I think we should initialize two different SipHashers
and keep the logic here, but that will break backward compatibility.
If we want to use only a single hasher then this code should be simplified as well, and I think we should review if that hashing method doesn't generate too many collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, could you explain in which way using different hashers will break backward compatibility, thank you.
Without the bc problem, using two different independent hashers is indeed the best choice according to Kirsch and Mitzenmacher's paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bloom filter of non-empty accounts is stored as part of the state and loaded via Bloom::from_parts
. If we change the bloom_hash
method it will affect check
and therefore the restored bitmap
will have no correlation to accounts we are checking it against.
The change is possible but will require a database migration similar to: https://github.com/paritytech/parity/blob/2b81b982197d27efeea80c8a176c385114d95fdb/ethcore/src/migrations/v10.rs#L17
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After speaking with some team members we decided that it would be best to leave the hashing logic as is (i.e. avoid migration) but simplify it as much as we can (perhaps gaining some performance).
So you can probably avoid calling item.hash(&mut sip)
twice (just copy the value for the next iteration) and avoid storing the SipHasher
alltogether (as it now suggests that the hasher state is actually maintained, which is not true because it's being cloned before use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomusdrw Thanks for explaining it.
I just updated the PR according to your suggestion.
a13186c
to
c16f165
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks perfect now! @NikVolf do we have bloom hash tests that ensure that compatibility hasnot been broken?
I can't think of any such test besides constant comparison like this: let bloom = Bloom::new(128, 16);
bloom.set(<anything in 0..128 range>, <anything T: Hash>);
assert_eq!(bloom.drain_journal().drain(), <constant from previous version>); and It would be nice if it will be added before merge |
* remove the redundant hasher in Bloom * add the test to check the hash backward compatibility
c16f165
to
539e579
Compare
Great! Awesome set of samples also 👍 |
Follow up: I guess we should add backward compatibility test for the Sorry for not considering the comprehensive aspects and add it in this PR. If you guys consider it necessary, I can create another PR for it. @tomusdrw @NikVolf |
@Hawstein it would be good to have such test indeed. Feel free to submit another PR if you are game. |
@tomusdrw PR created. |
Fix the TODO stuff in Bloom.