Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sync): MerkleStage #994

Merged
merged 66 commits into from
Feb 4, 2023
Merged

feat(sync): MerkleStage #994

merged 66 commits into from
Feb 4, 2023

Conversation

MegaRedHand
Copy link
Contributor

Closes: #817

This PR adds a stage for calculating the chain's state root in an incremental fashion. Currently it uses Parity's trie implementation with custom codec and database.

MegaRedHand and others added 9 commits January 23, 2023 17:12
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2023

Codecov Report

Merging #994 (82dc1f3) into main (e0dbcae) will increase coverage by 0.56%.
The diff coverage is 93.17%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main     #994      +/-   ##
==========================================
+ Coverage   74.63%   75.20%   +0.56%     
==========================================
  Files         321      331      +10     
  Lines       34954    36015    +1061     
==========================================
+ Hits        26088    27085     +997     
- Misses       8866     8930      +64     
Flag Coverage Δ
unit-tests 75.20% <93.17%> (+0.56%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
crates/interfaces/src/consensus.rs 100.00% <ø> (ø)
crates/primitives/src/lib.rs 100.00% <ø> (ø)
crates/primitives/src/proofs.rs 96.75% <ø> (ø)
crates/stages/src/lib.rs 100.00% <ø> (ø)
crates/stages/src/sets.rs 0.00% <0.00%> (ø)
crates/storage/db/src/abstraction/cursor.rs 98.24% <ø> (ø)
crates/storage/db/src/tables/codecs/compact.rs 86.95% <ø> (ø)
crates/storage/db/src/tables/mod.rs 0.00% <ø> (ø)
crates/stages/src/stages/merkle.rs 88.02% <89.28%> (+88.02%) ⬆️
crates/stages/src/trie/mod.rs 95.87% <95.87%> (ø)
... and 71 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Co-authored-by: lambdaclass-user <[email protected]>
crates/stages/src/stages/merkle.rs Show resolved Hide resolved
let mut db = MemoryDB::<KeccakHasher, HashKey<KeccakHasher>, Vec<u8>>::from_null_node(
RLPNodeCodec::<KeccakHasher>::empty_node(),
RLPNodeCodec::<KeccakHasher>::empty_node().to_vec(),
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to save everything in RAM as it will not be usable for latest state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This was only temporary until I finished adding the bindings between the trie and our database.

}

#[derive(Debug, Default, Clone)]
struct RLPNodeCodec<H: Hasher>(PhantomData<H>);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have a comment here, what is an expectation for this, is it only for encoding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

@onbjerg onbjerg added C-enhancement New feature or request A-staged-sync Related to staged sync (pipelines and stages) labels Jan 24, 2023
MegaRedHand and others added 16 commits January 24, 2023 10:11
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
MegaRedHand and others added 2 commits January 31, 2023 13:07
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good so far - some comments

}
MerkleStage::Execution { clean_threshold } => *clean_threshold,
#[cfg(test)]
MerkleStage::Both { clean_threshold } => *clean_threshold,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both seems to be doing the same as Execution, do we need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test macros we use assume that the same stage is capable of both executing and unwinding. Both is used to allow this behaviour in tests. Although it could be removed with some modifications to the stage test suite.

@@ -35,33 +37,81 @@ pub const MERKLE_UNWIND: StageId = StageId("MerkleUnwind");
#[derive(Debug)]
pub enum MerkleStage {
/// The execution portion of the hashing stage.
Execution,
Execution {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impl Default here with clean_threshold = 1k or whatever you choose

Suggested change
Execution {
Execution {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean something like this?

impl Default for MerkleStage {
    fn default() -> Self {
        Self::Execution { clean_threshold: 1000 }
    }
}

Wouldn't that be a bit confusing, since we also have the Unwind variant? I was thinking of having a default_<variant>() for each one, maybe that'd be more explicit. wdyt?

crates/stages/src/stages/merkle.rs Show resolved Hide resolved
Comment on lines 103 to 105
dbg!(loader
.update_root(tx, current_root, from_transition..to_transition)
.map_err(|e| StageError::Fatal(Box::new(e))))?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont forget to remove this

Comment on lines 116 to 117
info!(target: "sync::stages::merkle::exec", "Stage finished");
Ok(ExecOutput { stage_progress: input.previous_stage_progress(), done: true })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly confused here - is the expectation that this stage gets called once every block? Once every block range? How frequently are we verifying the state root?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is right now, the stage will build/update the trie and verify the current block's state root (i.e. the tip) only. Verifying each block's state root would be slower, but can be done.

if root == EMPTY_ROOT {
return Self::new(tx)
}
tx.get::<tables::AccountsTrie>(root)?.ok_or(TrieError::MissingRoot(root))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? We're instantiating with a new root but doing get?

Suggested change
tx.get::<tables::AccountsTrie>(root)?.ok_or(TrieError::MissingRoot(root))?;
tx.get::<tables::AccountsTrie>(root)?.ok_or(TrieError::MissingRoot(root))?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually not a new root but an existing one, and with the get we check that it exists. Maybe the name new_with_root is a bit confusing. wdyt about new_from_root?

Comment on lines +107 to +108
let mut cursor = self.tx.cursor_dup_read::<tables::StoragesTrie>()?;
Ok(cursor.seek_by_key_subkey(self.key, H256::from_slice(key))?.map(|entry| entry.node))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if instantiating a cursor each time here is going to be expensive..we shoudl bench

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Should I add a benchmark on this PR or open an issue for later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do after, when we get into performance benching

Comment on lines +246 to +247
// let mut walker = storage_cursor.walk_dup(address, H256::zero())?;
let mut current = storage_cursor.seek_by_key_subkey(address, H256::zero())?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between the 2? what's the failing assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

walk_dup() instantiates a DupWalker that internally calls next_dup() on each next(), so both should be equivalent.
The assertion that fails is cASSERT(mc, root >= NUM_METAS); in crates/storage/libmdbx-rs/mdbx-sys/libmdbx/mdbx.c:18851, when calling DupWalker::next().

Comment on lines 414 to 435
let accounts = [
(
Address::from(hex!("9fe4abd71ad081f091bd06dd1c16f7e92927561e")),
Account { nonce: 155, balance: U256::from(414241124), bytecode_hash: None },
),
(
Address::from(hex!("f8a6edaad4a332e6e550d0915a7fd5300b0b12d1")),
Account { nonce: 3, balance: U256::from(78978), bytecode_hash: None },
),
];
for (address, account) in accounts {
tx.put::<tables::HashedAccount>(keccak256(address), account).unwrap();
}
let encoded_accounts = accounts.iter().map(|(k, v)| {
let mut out = Vec::new();
EthAccount::from(*v).encode(&mut out);
(k, out)
});
assert_eq!(
trie.calculate_root(&tx),
Ok(H256(sec_trie_root::<KeccakHasher, _, _, _>(encoded_accounts).0))
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This equivalence check would benefit a lot from being in a fuzzed test, for any edge cases to be caught. given we implement arbitrary for Account/Address, this should be easy to do with a proptest that yields a Vec<(Address, Account>) (e.g.

#[test]
fn arbitrary() {
proptest::proptest!(|(bytes: Bytes)| {
)

You're doing the same pattern everywhere:

  1. Generate N accounts
  2. Put them in the DB
  3. RLP encode them + hash them manually
  4. Check equality

same below for storage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added fuzzed tests for building the whole trie (multiple accounts with storage), that way we test the full functionality.

@@ -0,0 +1,581 @@
use crate::Transaction;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DB-backed cita_trie impl + loader look good, nice work on the tests. I'm worried about performance here slightly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the start we were using parity's trie-db, but we changed it because of root hash mismatching (see paritytech/trie#182), along with non-existent error handling in their database trait. The idea is to change the loader to use ours when it's finished (pending benchmarks, and optionally adding some way to more easily change implementations).

MegaRedHand and others added 7 commits February 1, 2023 10:40
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good w/ me at a high level. defer to @rakita for final approval. thanks for taking this on, really valuable work.

Comment on lines +107 to +108
let mut cursor = self.tx.cursor_dup_read::<tables::StoragesTrie>()?;
Ok(cursor.seek_by_key_subkey(self.key, H256::from_slice(key))?.map(|entry| entry.node))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do after, when we get into performance benching

MegaRedHand and others added 4 commits February 2, 2023 10:54
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: lambdaclass-user <[email protected]>
@@ -313,6 +313,7 @@ mod ethereum_types_support {

fixed_revm_uint_impl!(RU128, 16);
fixed_revm_uint_impl!(RU256, 32);
impl_max_encoded_len!(RU256, { length_of_length(32) + 32 });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshieDo is this okay, I am not familiar with what it should do?

Copy link
Collaborator

@rakita rakita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, nice work!

MegaRedHand and others added 2 commits February 3, 2023 10:09
On merkle execute fail, the tree isn't commited to the database, and the unwind fails to find the root of the tree.

Co-authored-by: lambdaclass-user <[email protected]>
@gakonst gakonst merged commit fd7dc11 into paradigmxyz:main Feb 4, 2023
@gakonst
Copy link
Member

gakonst commented Feb 4, 2023

Thank you for taking this on @MegaRedHand! Let's see how performance is once we get there..

@MegaRedHand MegaRedHand deleted the merkle-stage branch February 4, 2023 03:11
literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 6, 2023
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: Francisco Krause Arnim <[email protected]>
ensi321 pushed a commit to ensi321/reth that referenced this pull request Feb 7, 2023
Co-authored-by: lambdaclass-user <[email protected]>
Co-authored-by: Francisco Krause Arnim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-staged-sync Related to staged sync (pipelines and stages) C-enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MerkleStage
7 participants