-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sync): MerkleStage
#994
Conversation
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #994 +/- ##
==========================================
+ Coverage 74.63% 75.20% +0.56%
==========================================
Files 321 331 +10
Lines 34954 36015 +1061
==========================================
+ Hits 26088 27085 +997
- Misses 8866 8930 +64
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Co-authored-by: lambdaclass-user <[email protected]>
crates/stages/src/trie/mod.rs
Outdated
let mut db = MemoryDB::<KeccakHasher, HashKey<KeccakHasher>, Vec<u8>>::from_null_node( | ||
RLPNodeCodec::<KeccakHasher>::empty_node(), | ||
RLPNodeCodec::<KeccakHasher>::empty_node().to_vec(), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to save everything in RAM as it will not be usable for latest state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This was only temporary until I finished adding the bindings between the trie and our database.
crates/stages/src/trie/mod.rs
Outdated
} | ||
|
||
#[derive(Debug, Default, Clone)] | ||
struct RLPNodeCodec<H: Hasher>(PhantomData<H>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have a comment here, what is an expectation for this, is it only for encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added!
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good so far - some comments
} | ||
MerkleStage::Execution { clean_threshold } => *clean_threshold, | ||
#[cfg(test)] | ||
MerkleStage::Both { clean_threshold } => *clean_threshold, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both
seems to be doing the same as Execution
, do we need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test macros we use assume that the same stage is capable of both executing and unwinding. Both
is used to allow this behaviour in tests. Although it could be removed with some modifications to the stage test suite.
@@ -35,33 +37,81 @@ pub const MERKLE_UNWIND: StageId = StageId("MerkleUnwind"); | |||
#[derive(Debug)] | |||
pub enum MerkleStage { | |||
/// The execution portion of the hashing stage. | |||
Execution, | |||
Execution { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impl Default here with clean_threshold = 1k or whatever you choose
Execution { | |
Execution { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean something like this?
impl Default for MerkleStage {
fn default() -> Self {
Self::Execution { clean_threshold: 1000 }
}
}
Wouldn't that be a bit confusing, since we also have the Unwind
variant? I was thinking of having a default_<variant>()
for each one, maybe that'd be more explicit. wdyt?
crates/stages/src/stages/merkle.rs
Outdated
dbg!(loader | ||
.update_root(tx, current_root, from_transition..to_transition) | ||
.map_err(|e| StageError::Fatal(Box::new(e))))? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont forget to remove this
info!(target: "sync::stages::merkle::exec", "Stage finished"); | ||
Ok(ExecOutput { stage_progress: input.previous_stage_progress(), done: true }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly confused here - is the expectation that this stage gets called once every block? Once every block range? How frequently are we verifying the state root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is right now, the stage will build/update the trie and verify the current block's state root (i.e. the tip) only. Verifying each block's state root would be slower, but can be done.
if root == EMPTY_ROOT { | ||
return Self::new(tx) | ||
} | ||
tx.get::<tables::AccountsTrie>(root)?.ok_or(TrieError::MissingRoot(root))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work? We're instantiating with a new root but doing get
?
tx.get::<tables::AccountsTrie>(root)?.ok_or(TrieError::MissingRoot(root))?; | |
tx.get::<tables::AccountsTrie>(root)?.ok_or(TrieError::MissingRoot(root))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually not a new root but an existing one, and with the get
we check that it exists. Maybe the name new_with_root
is a bit confusing. wdyt about new_from_root
?
let mut cursor = self.tx.cursor_dup_read::<tables::StoragesTrie>()?; | ||
Ok(cursor.seek_by_key_subkey(self.key, H256::from_slice(key))?.map(|entry| entry.node)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if instantiating a cursor each time here is going to be expensive..we shoudl bench
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Should I add a benchmark on this PR or open an issue for later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do after, when we get into performance benching
// let mut walker = storage_cursor.walk_dup(address, H256::zero())?; | ||
let mut current = storage_cursor.seek_by_key_subkey(address, H256::zero())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between the 2? what's the failing assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
walk_dup()
instantiates a DupWalker
that internally calls next_dup()
on each next()
, so both should be equivalent.
The assertion that fails is cASSERT(mc, root >= NUM_METAS);
in crates/storage/libmdbx-rs/mdbx-sys/libmdbx/mdbx.c:18851
, when calling DupWalker::next()
.
crates/stages/src/trie/mod.rs
Outdated
let accounts = [ | ||
( | ||
Address::from(hex!("9fe4abd71ad081f091bd06dd1c16f7e92927561e")), | ||
Account { nonce: 155, balance: U256::from(414241124), bytecode_hash: None }, | ||
), | ||
( | ||
Address::from(hex!("f8a6edaad4a332e6e550d0915a7fd5300b0b12d1")), | ||
Account { nonce: 3, balance: U256::from(78978), bytecode_hash: None }, | ||
), | ||
]; | ||
for (address, account) in accounts { | ||
tx.put::<tables::HashedAccount>(keccak256(address), account).unwrap(); | ||
} | ||
let encoded_accounts = accounts.iter().map(|(k, v)| { | ||
let mut out = Vec::new(); | ||
EthAccount::from(*v).encode(&mut out); | ||
(k, out) | ||
}); | ||
assert_eq!( | ||
trie.calculate_root(&tx), | ||
Ok(H256(sec_trie_root::<KeccakHasher, _, _, _>(encoded_accounts).0)) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This equivalence check would benefit a lot from being in a fuzzed test, for any edge cases to be caught. given we implement arbitrary for Account/Address, this should be easy to do with a proptest that yields a Vec<(Address, Account>)
(e.g.
reth/crates/primitives/src/hex_bytes.rs
Lines 361 to 363 in 0149bde
#[test] | |
fn arbitrary() { | |
proptest::proptest!(|(bytes: Bytes)| { |
You're doing the same pattern everywhere:
- Generate N accounts
- Put them in the DB
- RLP encode them + hash them manually
- Check equality
same below for storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added fuzzed tests for building the whole trie (multiple accounts with storage), that way we test the full functionality.
@@ -0,0 +1,581 @@ | |||
use crate::Transaction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DB-backed cita_trie impl + loader look good, nice work on the tests. I'm worried about performance here slightly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the start we were using parity's trie-db
, but we changed it because of root hash mismatching (see paritytech/trie#182), along with non-existent error handling in their database trait. The idea is to change the loader to use ours when it's finished (pending benchmarks, and optionally adding some way to more easily change implementations).
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good w/ me at a high level. defer to @rakita for final approval. thanks for taking this on, really valuable work.
let mut cursor = self.tx.cursor_dup_read::<tables::StoragesTrie>()?; | ||
Ok(cursor.seek_by_key_subkey(self.key, H256::from_slice(key))?.map(|entry| entry.node)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do after, when we get into performance benching
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
crates/common/rlp/src/encode.rs
Outdated
@@ -313,6 +313,7 @@ mod ethereum_types_support { | |||
|
|||
fixed_revm_uint_impl!(RU128, 16); | |||
fixed_revm_uint_impl!(RU256, 32); | |||
impl_max_encoded_len!(RU256, { length_of_length(32) + 32 }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshieDo is this okay, I am not familiar with what it should do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, nice work!
On merkle execute fail, the tree isn't commited to the database, and the unwind fails to find the root of the tree. Co-authored-by: lambdaclass-user <[email protected]>
481b60c
to
a240b6a
Compare
Thank you for taking this on @MegaRedHand! Let's see how performance is once we get there.. |
Co-authored-by: lambdaclass-user <[email protected]> Co-authored-by: Francisco Krause Arnim <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]> Co-authored-by: Francisco Krause Arnim <[email protected]>
Closes: #817
This PR adds a stage for calculating the chain's state root in an incremental fashion. Currently it uses Parity's trie implementation with custom codec and database.