remove the redundant hasher in Bloom #6404

Hawstein · 2017-08-28T15:47:24Z

Fix the TODO stuff in Bloom.

parity-cla-bot · 2017-08-28T15:47:27Z

It looks like @Hawstein signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

tomusdrw · 2017-08-29T08:28:34Z

util/bloom/src/lib.rs

-		if k_i < NUMBER_OF_HASHERS as u32 {
-			let mut sip = self.sips[k_i as usize].clone();
+		if k_i < 2 {
+			let mut sip = self.sip.clone();
 			item.hash(&mut sip);
 			let hash = sip.finish();
 			hashes[k_i as usize] = hash;


The thing is that the hashes array will contain exactly the same number twice, so the whole thing is redundant as well.

I think we should initialize two different SipHashers and keep the logic here, but that will break backward compatibility.
If we want to use only a single hasher, then this code should be simplified as well, and I think we should review whether that hashing method generates too many collisions.

Hawstein · 2017-08-29

Hi, could you explain how using different hashers would break backward compatibility? Thank you.

Backward compatibility aside, using two independent hashers is indeed the best choice according to Kirsch and Mitzenmacher's paper.
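
As a rough illustration of the scheme usually associated with that paper, the i-th bit index can be derived from two base hashes as g_i(x) = (h1(x) + i * h2(x)) mod m. The sketch below assumes std's DefaultHasher and a hypothetical seed prefix (seeded_hash) to obtain two different hash functions; it is not the code in util/bloom.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash `item` together with a seed so that different seeds behave like
/// different hash functions (an illustrative choice, not Parity's).
fn seeded_hash<T: Hash>(seed: u64, item: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    item.hash(&mut hasher);
    hasher.finish()
}

/// Derive `k` bit indices in `0..bits` from two base hashes:
/// g_i(x) = (h1(x) + i * h2(x)) mod bits.
fn double_hash_indices<T: Hash>(item: &T, k: u32, bits: u64) -> Vec<u64> {
    let h1 = seeded_hash(1, item);
    let h2 = seeded_hash(2, item);
    (0..k as u64)
        .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % bits)
        .collect()
}
```

Calling double_hash_indices(&"account", 4, 128) yields four indices below 128, and the first one is simply h1 reduced modulo the bitmap size.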

tomusdrw · 2017-08-29

A bloom filter of non-empty accounts is stored as part of the state and loaded via Bloom::from_parts. If we change the bloom_hash method, it will affect check, and therefore the restored bitmap will have no correlation to the accounts we are checking it against.
The change is possible but will require a database migration similar to: https://github.com/paritytech/parity/blob/2b81b982197d27efeea80c8a176c385114d95fdb/ethcore/src/migrations/v10.rs#L17
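
To make the compatibility problem concrete, here is a toy sketch of a persisted bitmap being read back with a different hash function. Everything in it (ToyBloom, hash_v1, hash_v2, the 64-bit bitmap) is hypothetical and only mirrors from_parts and check in spirit.

```rust
/// Toy bitmap filter: 64 bits, one hash function chosen at construction.
/// All names here are hypothetical; this is not the util/bloom API.
struct ToyBloom {
    bitmap: u64,
    hash: fn(u64) -> u64,
}

impl ToyBloom {
    fn new(hash: fn(u64) -> u64) -> Self {
        ToyBloom { bitmap: 0, hash }
    }

    /// Restore a filter from a persisted bitmap, standing in for
    /// loading stored state from the database.
    fn from_parts(bitmap: u64, hash: fn(u64) -> u64) -> Self {
        ToyBloom { bitmap, hash }
    }

    fn set(&mut self, item: u64) {
        self.bitmap |= 1u64 << ((self.hash)(item) % 64);
    }

    fn check(&self, item: u64) -> bool {
        self.bitmap & (1u64 << ((self.hash)(item) % 64)) != 0
    }
}

/// "Old" hash, in use when the bitmap was written.
fn hash_v1(item: u64) -> u64 {
    item
}

/// "New" hash, after a hypothetical change to the hashing method.
fn hash_v2(item: u64) -> u64 {
    item.wrapping_add(1)
}
```

If item 5 is set with hash_v1 and the same bitmap is restored with hash_v2, check(5) looks at bit 6 instead of bit 5 and returns false, a false negative that a consistent Bloom filter never produces.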

tomusdrw · 2017-08-29

After speaking with some team members we decided that it would be best to leave the hashing logic as is (i.e. avoid migration) but simplify it as much as we can (perhaps gaining some performance).
So you can probably avoid calling item.hash(&mut sip) twice (just copy the value for the next iteration) and avoid storing the SipHasher altogether (as it now suggests that the hasher state is actually maintained, which is not true because it's being cloned before use).
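
A minimal sketch of that simplification, assuming std's DefaultHasher as a stand-in for the SipHasher used in util/bloom; both function names are hypothetical. The legacy shape clones the same untouched hasher twice, while the simplified shape hashes once and copies the value, so the outputs are identical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Previous shape: clone the same untouched hasher for each of the two
/// base hashes and feed it the same item both times.
fn base_hashes_legacy<T: Hash>(hasher: &DefaultHasher, item: &T) -> [u64; 2] {
    let mut hashes = [0u64; 2];
    for slot in hashes.iter_mut() {
        let mut h = hasher.clone();
        item.hash(&mut h);
        *slot = h.finish();
    }
    hashes
}

/// Simplified shape: build a fresh hasher, hash the item once,
/// and copy the value into both slots.
fn base_hashes_once<T: Hash>(item: &T) -> [u64; 2] {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    let hash = h.finish();
    [hash, hash]
}
```

Because the stored hasher was never mutated, base_hashes_legacy(&DefaultHasher::new(), item) and base_hashes_once(item) agree, which is why dropping the field does not change any bit positions.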

Hawstein · 2017-08-29

@tomusdrw Thanks for explaining it.

I just updated the PR according to your suggestion.

tomusdrw

Looks perfect now! @NikVolf do we have bloom hash tests that ensure that compatibility has not been broken?

NikVolf · 2017-08-30T14:28:13Z

I can't think of any such test besides constant comparison like this:

let bloom = Bloom::new(128, 16);
bloom.set(<anything in 0..128 range>, <anything T: Hash>);
assert_eq!(bloom.drain_journal().drain(), <constant from previous version>);

It would be nice if this were added before merging.
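
The constant-comparison idea can be sketched on a toy filter whose hash is simple enough to compute by hand. ToyBloom and toy_hash below are hypothetical stand-ins, not the real Bloom type; only the test pattern (pin the resulting bitmap to a recorded constant) is the point.

```rust
/// Toy 128-bit filter with a fixed, hand-computable hash.
/// Hypothetical stand-in for Bloom; only the test pattern matters.
struct ToyBloom {
    words: [u64; 2],
}

impl ToyBloom {
    fn new() -> Self {
        ToyBloom { words: [0; 2] }
    }

    /// Hashing method whose output must stay stable across versions.
    fn toy_hash(item: u64) -> u64 {
        item.wrapping_mul(31).wrapping_add(7)
    }

    fn set(&mut self, item: u64) {
        let bit = (Self::toy_hash(item) % 128) as usize;
        self.words[bit / 64] |= 1u64 << (bit % 64);
    }
}

#[cfg(test)]
mod compat_tests {
    use super::ToyBloom;

    #[test]
    fn hash_backward_compatibility() {
        let mut bloom = ToyBloom::new();
        bloom.set(3);
        // Constant recorded from the "previous version":
        // 3 * 31 + 7 = 100, i.e. bit 36 of the second word.
        assert_eq!(bloom.words, [0, 1u64 << 36]);
    }
}
```

Any later change to toy_hash or to the index computation makes this test fail, which is the signal that persisted bitmaps would no longer match.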

Hawstein · 2017-08-30T16:05:55Z

Sure, test added. @NikVolf @tomusdrw

NikVolf · 2017-08-30T16:25:56Z

@Hawstein

Great! Awesome set of samples also 👍

Hawstein · 2017-08-31T04:46:23Z

Follow up:

I guess we should add a backward compatibility test for Bloom::from_parts too, since it's another way to create a Bloom instance. A test for it would guard against people changing the implementation of from_parts. And, just as @tomusdrw said, the non-empty accounts state is loaded via Bloom::from_parts.

Sorry for not thinking this through fully and including it in this PR. If you consider it necessary, I can create another PR for it. @tomusdrw @NikVolf

tomusdrw · 2017-08-31T08:43:44Z

@Hawstein it would be good to have such a test indeed. Feel free to submit another PR if you are game.

Hawstein · 2017-08-31T11:15:43Z

@tomusdrw PR created.

debris approved these changes Aug 28, 2017

debris added A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust. and removed A8-looksgood 🦄 Pull request is reviewed well. labels Aug 28, 2017

tomusdrw suggested changes Aug 29, 2017

tomusdrw added the A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. label Aug 29, 2017

Hawstein force-pushed the use-one-hasher branch from a13186c to c16f165 Compare August 29, 2017 16:56

tomusdrw approved these changes Aug 30, 2017

tomusdrw added A8-looksgood 🦄 Pull request is reviewed well. and removed A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. labels Aug 30, 2017

NikVolf added A4-gotissues 💥 Pull request is reviewed and has significant issues which must be addressed. and removed A8-looksgood 🦄 Pull request is reviewed well. labels Aug 30, 2017

use one hasher in Bloom

539e579

* remove the redundant hasher in Bloom
* add the test to check the hash backward compatibility

Hawstein force-pushed the use-one-hasher branch from c16f165 to 539e579 Compare August 30, 2017 16:04

NikVolf added A8-looksgood 🦄 Pull request is reviewed well. and removed A4-gotissues 💥 Pull request is reviewed and has significant issues which must be addressed. labels Aug 30, 2017

tomusdrw merged commit e04d58f into openethereum:master Aug 30, 2017

Hawstein deleted the use-one-hasher branch August 30, 2017 16:41

Hawstein mentioned this pull request Aug 31, 2017

add more hash backward compatibility test for bloom #6425

Merged

tomusdrw mentioned this pull request Nov 20, 2018

Bloom Filter: wrong number of hash functions used #9843

Closed
