Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

remove the redundant hasher in Bloom #6404

Merged
merged 1 commit into from
Aug 30, 2017

Conversation

Hawstein
Copy link
Contributor

Fix the TODO stuff in Bloom.

@parity-cla-bot
Copy link

It looks like @Hawstein signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

@debris debris added A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust. and removed A8-looksgood 🦄 Pull request is reviewed well. labels Aug 28, 2017
if k_i < NUMBER_OF_HASHERS as u32 {
let mut sip = self.sips[k_i as usize].clone();
if k_i < 2 {
let mut sip = self.sip.clone();
item.hash(&mut sip);
let hash = sip.finish();
hashes[k_i as usize] = hash;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that the hashes array will contain exactly the same number twice, so the whole thing is redundant as well.

I think we should initialize two different SipHashers and keep the logic here, but that will break backward compatibility.
If we want to use only a single hasher then this code should be simplified as well, and I think we should review if that hashing method doesn't generate too many collisions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, could you explain in which way using different hashers will break backward compatibility, thank you.

Without the bc problem, using two different independent hashers is indeed the best choice according to Kirsch and Mitzenmacher's paper.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bloom filter of non-empty accounts is stored as part of the state and loaded via Bloom::from_parts. If we change the bloom_hash method it will affect check and therefore the restored bitmap will have no correlation to accounts we are checking it against.
The change is possible but will require a database migration similar to: https://github.com/paritytech/parity/blob/2b81b982197d27efeea80c8a176c385114d95fdb/ethcore/src/migrations/v10.rs#L17

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After speaking with some team members we decided that it would be best to leave the hashing logic as is (i.e. avoid migration) but simplify it as much as we can (perhaps gaining some performance).
So you can probably avoid calling item.hash(&mut sip) twice (just copy the value for the next iteration) and avoid storing the SipHasher alltogether (as it now suggests that the hasher state is actually maintained, which is not true because it's being cloned before use).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomusdrw Thanks for explaining it.

I just updated the PR according to your suggestion.

@tomusdrw tomusdrw added the A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. label Aug 29, 2017
Copy link
Collaborator

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect now! @NikVolf do we have bloom hash tests that ensure that compatibility hasnot been broken?

@tomusdrw tomusdrw added A8-looksgood 🦄 Pull request is reviewed well. and removed A5-grumble 🔥 Pull request has minor issues that must be addressed before merging. labels Aug 30, 2017
@NikVolf NikVolf added A4-gotissues 💥 Pull request is reviewed and has significant issues which must be addressed. and removed A8-looksgood 🦄 Pull request is reviewed well. labels Aug 30, 2017
@NikVolf
Copy link
Contributor

NikVolf commented Aug 30, 2017

I can't think of any such test besides constant comparison like this:

let bloom = Bloom::new(128, 16);
bloom.set(<anything in 0..128 range>, <anything T: Hash>);
assert_eq!(bloom.drain_journal().drain(), <constant from previous version>);

and It would be nice if it will be added before merge

* remove the redundant hasher in Bloom

* add the test to check the hash backward compatibility
@Hawstein
Copy link
Contributor Author

Sure, test added. @NikVolf @tomusdrw

@NikVolf
Copy link
Contributor

NikVolf commented Aug 30, 2017

@Hawstein

Great! Awesome set of samples also 👍

@NikVolf NikVolf added A8-looksgood 🦄 Pull request is reviewed well. and removed A4-gotissues 💥 Pull request is reviewed and has significant issues which must be addressed. labels Aug 30, 2017
@tomusdrw tomusdrw merged commit e04d58f into openethereum:master Aug 30, 2017
@Hawstein Hawstein deleted the use-one-hasher branch August 30, 2017 16:41
@Hawstein
Copy link
Contributor Author

Follow up:

I guess we should add backward compatibility test for the Bloom::from_parts too since it's another way to create the Bloom instance. Adding test for it avoids people change the implementation in from_parts. And just as @tomusdrw said, non-empty accounts state will be loaded via Bloom::from_parts.

Sorry for not considering the comprehensive aspects and add it in this PR. If you guys consider it necessary, I can create another PR for it. @tomusdrw @NikVolf

@tomusdrw
Copy link
Collaborator

@Hawstein it would be good to have such test indeed. Feel free to submit another PR if you are game.

@Hawstein
Copy link
Contributor Author

@tomusdrw PR created.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants