feat(sync): `MerkleStage` #994

MegaRedHand · 2023-01-23T20:54:08Z

Closes: #817

This PR adds a stage for calculating the chain's state root incrementally. Currently it uses Parity's trie implementation with a custom codec and database.

Co-authored-by: lambdaclass-user <[email protected]>

codecov-commenter · 2023-01-23T21:14:33Z

Codecov Report

Merging #994 (82dc1f3) into main (e0dbcae) will increase coverage by 0.56%.
The diff coverage is 93.17%.


@@            Coverage Diff             @@
##             main     #994      +/-   ##
==========================================
+ Coverage   74.63%   75.20%   +0.56%     
==========================================
  Files         321      331      +10     
  Lines       34954    36015    +1061     
==========================================
+ Hits        26088    27085     +997     
- Misses       8866     8930      +64

| Flag | Coverage Δ | |
|---|---|---|
| unit-tests | `75.20% <93.17%> (+0.56%)` | ⬆️ |


| Impacted Files | Coverage Δ | |
|---|---|---|
| crates/interfaces/src/consensus.rs | `100.00% <ø> (ø)` | |
| crates/primitives/src/lib.rs | `100.00% <ø> (ø)` | |
| crates/primitives/src/proofs.rs | `96.75% <ø> (ø)` | |
| crates/stages/src/lib.rs | `100.00% <ø> (ø)` | |
| crates/stages/src/sets.rs | `0.00% <0.00%> (ø)` | |
| crates/storage/db/src/abstraction/cursor.rs | `98.24% <ø> (ø)` | |
| crates/storage/db/src/tables/codecs/compact.rs | `86.95% <ø> (ø)` | |
| crates/storage/db/src/tables/mod.rs | `0.00% <ø> (ø)` | |
| crates/stages/src/stages/merkle.rs | `88.02% <89.28%> (+88.02%)` | ⬆️ |
| crates/stages/src/trie/mod.rs | `95.87% <95.87%> (ø)` | |

... and 71 more


rakita · 2023-01-24T11:16:08Z

crates/stages/src/trie/mod.rs

+        let mut db = MemoryDB::<KeccakHasher, HashKey<KeccakHasher>, Vec<u8>>::from_null_node(
+            RLPNodeCodec::<KeccakHasher>::empty_node(),
+            RLPNodeCodec::<KeccakHasher>::empty_node().to_vec(),
+        );


Is this going to keep everything in RAM? If so, it will not be usable for the latest state.

No. This was only temporary until I finished adding the bindings between the trie and our database.

rakita · 2023-01-24T11:22:40Z

crates/stages/src/trie/mod.rs

+}
+
+#[derive(Debug, Default, Clone)]
+struct RLPNodeCodec<H: Hasher>(PhantomData<H>);


It would be good to have a comment here. What is the expectation for this type: is it only for encoding?


gakonst

looking good so far - some comments

gakonst · 2023-01-31T23:37:11Z

crates/stages/src/stages/merkle.rs

+            }
+            MerkleStage::Execution { clean_threshold } => *clean_threshold,
+            #[cfg(test)]
+            MerkleStage::Both { clean_threshold } => *clean_threshold,


Both seems to be doing the same as Execution, do we need it?

The test macros we use assume that the same stage is capable of both executing and unwinding. Both is used to allow this behaviour in tests, although it could be removed with some modifications to the stage test suite.

gakonst · 2023-01-31T23:40:08Z

crates/stages/src/stages/merkle.rs

@@ -35,33 +37,81 @@ pub const MERKLE_UNWIND: StageId = StageId("MerkleUnwind");
 #[derive(Debug)]
 pub enum MerkleStage {
    /// The execution portion of the hashing stage.
-    Execution,
+    Execution {


Impl Default here with clean_threshold = 1k or whatever you choose


You mean something like this?

    impl Default for MerkleStage {
        fn default() -> Self {
            Self::Execution { clean_threshold: 1000 }
        }
    }

Wouldn't that be a bit confusing, since we also have the Unwind variant? I was thinking of having a default_<variant>() for each one, maybe that'd be more explicit. wdyt?
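A minimal sketch of the per-variant constructor idea raised above. The `u64` type for `clean_threshold`, the names `default_execution` and `default_unwind`, and the constant `DEFAULT_CLEAN_THRESHOLD` are assumptions made for illustration; the thread does not show what was finally merged.

```rust
/// Simplified stand-in for the stage enum discussed in this thread.
#[derive(Debug, PartialEq)]
pub enum MerkleStage {
    /// The execution portion of the stage.
    Execution {
        /// Assumed to be a `u64`; the real field type is not shown here.
        clean_threshold: u64,
    },
    /// The unwind portion of the stage.
    Unwind,
}

impl MerkleStage {
    /// Hypothetical default, following the "1k or whatever you choose" remark.
    pub const DEFAULT_CLEAN_THRESHOLD: u64 = 1_000;

    /// Explicit default for the execution variant.
    pub fn default_execution() -> Self {
        Self::Execution { clean_threshold: Self::DEFAULT_CLEAN_THRESHOLD }
    }

    /// Explicit default for the unwind variant.
    pub fn default_unwind() -> Self {
        Self::Unwind
    }
}

fn main() {
    let exec = MerkleStage::default_execution();
    let unwind = MerkleStage::default_unwind();
    println!("{exec:?} / {unwind:?}");
}
```

Named constructors keep the call site explicit about which half of the stage is being built, which a single `Default` impl cannot express for an enum with more than one meaningful variant.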


gakonst · 2023-01-31T23:45:39Z

crates/stages/src/stages/merkle.rs

+            dbg!(loader
+                .update_root(tx, current_root, from_transition..to_transition)
+                .map_err(|e| StageError::Fatal(Box::new(e))))?


don't forget to remove this

gakonst · 2023-01-31T23:50:20Z

crates/stages/src/stages/merkle.rs

        info!(target: "sync::stages::merkle::exec", "Stage finished");
        Ok(ExecOutput { stage_progress: input.previous_stage_progress(), done: true })


Slightly confused here - is the expectation that this stage gets called once every block? Once every block range? How frequently are we verifying the state root?

As it is right now, the stage will build/update the trie and verify the current block's state root (i.e. the tip) only. Verifying each block's state root would be slower, but can be done.

gakonst · 2023-02-01T00:55:22Z

crates/stages/src/trie/mod.rs

+        if root == EMPTY_ROOT {
+            return Self::new(tx)
+        }
+        tx.get::<tables::AccountsTrie>(root)?.ok_or(TrieError::MissingRoot(root))?;


How does this work? We're instantiating with a new root but doing get?


It's actually not a new root but an existing one, and with the get we check that it exists. Maybe the name new_with_root is a bit confusing. wdyt about new_from_root?
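The behaviour described in this reply can be sketched roughly as follows. This is a simplified model, not the PR's code: `NodeTable` is a `HashMap` standing in for the `AccountsTrie` table, `Trie` is a hypothetical struct, and `EMPTY_ROOT` is a placeholder value rather than the real empty-trie hash.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];

/// Placeholder only; the real constant is the hash of an empty trie.
const EMPTY_ROOT: Hash = [0u8; 32];

#[derive(Debug, PartialEq)]
enum TrieError {
    MissingRoot(Hash),
}

/// Stand-in for the `AccountsTrie` table: node hash -> encoded node.
type NodeTable = HashMap<Hash, Vec<u8>>;

#[derive(Debug, PartialEq)]
struct Trie {
    root: Hash,
}

impl Trie {
    /// Starts from an empty trie.
    fn new() -> Self {
        Self { root: EMPTY_ROOT }
    }

    /// Resumes from an existing root, failing if the root node was never stored.
    fn from_root(table: &NodeTable, root: Hash) -> Result<Self, TrieError> {
        if root == EMPTY_ROOT {
            return Ok(Self::new());
        }
        // The lookup is only an existence check; the node itself is not used here.
        if !table.contains_key(&root) {
            return Err(TrieError::MissingRoot(root));
        }
        Ok(Self { root })
    }
}

fn main() {
    let mut table = NodeTable::new();
    let known = [1u8; 32];
    table.insert(known, vec![0xc0]);

    assert!(Trie::from_root(&table, known).is_ok());
    assert_eq!(Trie::from_root(&table, [2u8; 32]), Err(TrieError::MissingRoot([2u8; 32])));
    println!("root checks passed");
}
```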

gakonst · 2023-02-01T00:55:58Z

crates/stages/src/trie/mod.rs

+        let mut cursor = self.tx.cursor_dup_read::<tables::StoragesTrie>()?;
+        Ok(cursor.seek_by_key_subkey(self.key, H256::from_slice(key))?.map(|entry| entry.node))


Wonder if instantiating a cursor each time here is going to be expensive... we should bench

I agree. Should I add a benchmark on this PR or open an issue for later?

Let's do it after, when we get into performance benching

gakonst · 2023-02-01T00:57:31Z

crates/stages/src/trie/mod.rs

+        // let mut walker = storage_cursor.walk_dup(address, H256::zero())?;
+        let mut current = storage_cursor.seek_by_key_subkey(address, H256::zero())?;


what's the difference between the 2? what's the failing assertion?

walk_dup() instantiates a DupWalker that internally calls next_dup() on each next(), so both should be equivalent.
The assertion that fails is cASSERT(mc, root >= NUM_METAS); in crates/storage/libmdbx-rs/mdbx-sys/libmdbx/mdbx.c:18851, when calling DupWalker::next().
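To illustrate why the two approaches should visit the same entries, here is a small model using a `BTreeMap` as a stand-in for a dup-sorted table. `Cursor`, `collect_manual` and `collect_walker` are hypothetical names for this sketch; they are not the reth or libmdbx API, and the sketch says nothing about the failing assertion itself.

```rust
use std::collections::BTreeMap;

/// Stand-in for a dup-sorted table: entries ordered by (key, subkey).
type DupTable = BTreeMap<(u32, u32), u64>;

/// Hypothetical cursor over the stand-in table.
struct Cursor<'a> {
    table: &'a DupTable,
    pos: Option<(u32, u32)>,
}

impl<'a> Cursor<'a> {
    fn new(table: &'a DupTable) -> Self {
        Self { table, pos: None }
    }

    /// Positions at the first entry of `key` whose subkey is >= `subkey`.
    fn seek_by_key_subkey(&mut self, key: u32, subkey: u32) -> Option<(u32, u64)> {
        let found = self.table.range((key, subkey)..=(key, u32::MAX)).next();
        self.pos = found.map(|(k, _)| *k);
        found.map(|((_, sub), v)| (*sub, *v))
    }

    /// Advances to the next duplicate of the current key, if any.
    fn next_dup(&mut self) -> Option<(u32, u64)> {
        let (key, sub) = self.pos?;
        let start = sub.checked_add(1)?;
        let found = self.table.range((key, start)..=(key, u32::MAX)).next();
        self.pos = found.map(|(k, _)| *k);
        found.map(|((_, s), v)| (*s, *v))
    }
}

/// Manual loop: seek once, then call `next_dup` until it returns `None`.
fn collect_manual(table: &DupTable, key: u32) -> Vec<(u32, u64)> {
    let mut cursor = Cursor::new(table);
    let mut out = Vec::new();
    let mut current = cursor.seek_by_key_subkey(key, 0);
    while let Some(entry) = current {
        out.push(entry);
        current = cursor.next_dup();
    }
    out
}

/// Walker style: the same seek, with `next_dup` hidden behind an iterator.
fn collect_walker(table: &DupTable, key: u32) -> Vec<(u32, u64)> {
    let mut cursor = Cursor::new(table);
    let first = cursor.seek_by_key_subkey(key, 0);
    let mut started = false;
    std::iter::from_fn(move || {
        if started {
            cursor.next_dup()
        } else {
            started = true;
            first
        }
    })
    .collect()
}

fn main() {
    let table: DupTable = [((1, 0), 10), ((1, 5), 50), ((2, 1), 7)].into_iter().collect();
    assert_eq!(collect_manual(&table, 1), collect_walker(&table, 1));
    println!("{:?}", collect_manual(&table, 1));
}
```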

gakonst · 2023-02-01T00:59:15Z

crates/stages/src/trie/mod.rs

+        let accounts = [
+            (
+                Address::from(hex!("9fe4abd71ad081f091bd06dd1c16f7e92927561e")),
+                Account { nonce: 155, balance: U256::from(414241124), bytecode_hash: None },
+            ),
+            (
+                Address::from(hex!("f8a6edaad4a332e6e550d0915a7fd5300b0b12d1")),
+                Account { nonce: 3, balance: U256::from(78978), bytecode_hash: None },
+            ),
+        ];
+        for (address, account) in accounts {
+            tx.put::<tables::HashedAccount>(keccak256(address), account).unwrap();
+        }
+        let encoded_accounts = accounts.iter().map(|(k, v)| {
+            let mut out = Vec::new();
+            EthAccount::from(*v).encode(&mut out);
+            (k, out)
+        });
+        assert_eq!(
+            trie.calculate_root(&tx),
+            Ok(H256(sec_trie_root::<KeccakHasher, _, _, _>(encoded_accounts).0))
+        );


This equivalence check would benefit a lot from being in a fuzzed test, so that any edge cases get caught. Given we implement Arbitrary for Account/Address, this should be easy to do with a proptest that yields a Vec<(Address, Account)> (e.g.

reth/crates/primitives/src/hex_bytes.rs

Lines 361 to 363 in 0149bde

    #[test]
    fn arbitrary() {
        proptest::proptest!(|(bytes: Bytes)| {

)

You're doing the same pattern everywhere:

1. Generate N accounts
2. Put them in the DB
3. RLP encode them + hash them manually
4. Check equality

Same below for storage.

I added fuzzed tests for building the whole trie (multiple accounts with storage), that way we test the full functionality.
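The generate / store / recompute / compare pattern discussed here can be sketched without external crates. This is only a toy: it uses a hand-rolled xorshift generator instead of proptest, `u64` addresses instead of `Address`, and a `DefaultHasher` digest as a placeholder for the real trie root. All names below are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, Hash, PartialEq)]
struct Account {
    nonce: u64,
    balance: u64,
}

/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Toy digest over entries in key order; a placeholder for a real trie root.
fn digest<'a>(entries: impl Iterator<Item = (&'a u64, &'a Account)>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (address, account) in entries {
        address.hash(&mut hasher);
        account.hash(&mut hasher);
    }
    hasher.finish()
}

/// "System under test": put the accounts in an ordered store, then digest it.
fn root_via_store(accounts: &[(u64, Account)]) -> u64 {
    let mut store = BTreeMap::new();
    for (address, account) in accounts {
        store.insert(*address, *account);
    }
    digest(store.iter())
}

/// Independent reference: dedupe (last write wins), sort, then digest.
fn root_via_reference(accounts: &[(u64, Account)]) -> u64 {
    let mut unique: Vec<(u64, Account)> = Vec::new();
    for (address, account) in accounts {
        match unique.iter_mut().find(|(a, _)| a == address) {
            Some(entry) => entry.1 = *account,
            None => unique.push((*address, *account)),
        }
    }
    unique.sort_by_key(|(address, _)| *address);
    digest(unique.iter().map(|(a, acc)| (a, acc)))
}

/// Runs `cases` random inputs and reports whether both paths always agree.
fn check_random_cases(cases: usize, seed: u64) -> bool {
    let mut rng = XorShift(seed);
    (0..cases).all(|_| {
        let len = (rng.next() % 16) as usize;
        // Addresses are drawn from a small range so duplicates actually occur.
        let accounts: Vec<(u64, Account)> = (0..len)
            .map(|_| (rng.next() % 8, Account { nonce: rng.next(), balance: rng.next() }))
            .collect();
        root_via_store(&accounts) == root_via_reference(&accounts)
    })
}

fn main() {
    assert!(check_random_cases(200, 0x9E37_79B9_7F4A_7C15));
    println!("all random cases agree");
}
```

With proptest available, the loop in `check_random_cases` would normally be replaced by a generated `Vec` input, which also gives shrinking of failing cases.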

gakonst · 2023-02-01T01:03:24Z

crates/stages/src/trie/mod.rs

@@ -0,0 +1,581 @@
+use crate::Transaction;


The DB-backed cita_trie impl + loader look good, nice work on the tests. I'm worried about performance here slightly.

At the start we were using parity's trie-db, but we changed it because of root hash mismatching (see paritytech/trie#182), along with non-existent error handling in their database trait. The idea is to change the loader to use ours when it's finished (pending benchmarks, and optionally adding some way to more easily change implementations).

Co-authored-by: lambdaclass-user <[email protected]>

gakonst

good w/ me at a high level. defer to @rakita for final approval. thanks for taking this on, really valuable work.


rakita · 2023-02-03T10:35:57Z

crates/common/rlp/src/encode.rs

@@ -313,6 +313,7 @@ mod ethereum_types_support {

    fixed_revm_uint_impl!(RU128, 16);
    fixed_revm_uint_impl!(RU256, 32);
+    impl_max_encoded_len!(RU256, { length_of_length(32) + 32 });


@joshieDo is this okay? I am not familiar with what it should do.
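For context on the expression `length_of_length(32) + 32`: under the RLP rules, a byte string shorter than 56 bytes is prefixed by a single header byte, so a 32-byte integer needs at most 33 bytes. The sketch below reimplements that rule from the RLP encoding rules; the function bodies are assumptions for illustration and may differ from the crate's actual `length_of_length` helper.

```rust
/// Length of the RLP header for a string payload of `payload_len` bytes.
/// Written from the RLP rules, not copied from the crate.
fn length_of_length(payload_len: usize) -> usize {
    if payload_len < 56 {
        // Short strings use a single prefix byte (0x80 + length).
        1
    } else {
        // Long strings use one prefix byte plus the big-endian bytes of the length.
        let len_bits = (usize::BITS - payload_len.leading_zeros()) as usize;
        1 + (len_bits + 7) / 8
    }
}

/// Upper bound on the encoded size of an unsigned integer `byte_width` bytes wide.
fn max_encoded_len(byte_width: usize) -> usize {
    length_of_length(byte_width) + byte_width
}

fn main() {
    // A 256-bit integer: 1 header byte + 32 payload bytes.
    assert_eq!(max_encoded_len(32), 33);
    println!("max encoded length for 32 bytes: {}", max_encoded_len(32));
}
```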

rakita

lgtm, nice work!


gakonst · 2023-02-04T02:50:49Z

Thank you for taking this on @MegaRedHand! Let's see how performance is once we get there..

Co-authored-by: lambdaclass-user <[email protected]> Co-authored-by: Francisco Krause Arnim <[email protected]>

MegaRedHand and others added 9 commits January 23, 2023 17:12

Add DBTrieLoader skeleton

7e0410e

Co-authored-by: lambdaclass-user <[email protected]>

Add storage root calculation

2a7fc36

Co-authored-by: lambdaclass-user <[email protected]>

Add test verifying mainnet genesis block

8a88280

Co-authored-by: lambdaclass-user <[email protected]>

Add custom NodeCodec skeleton

fbdb3ac

Co-authored-by: lambdaclass-user <[email protected]>

Implement custom NodeCodec

c57be05

Co-authored-by: lambdaclass-user <[email protected]>

Add tests using cita_trie

e5c9190

Co-authored-by: lambdaclass-user <[email protected]>

Remove allows and fix

423538c

Co-authored-by: lambdaclass-user <[email protected]>

Add root calculation to MerkleStage

8a1e509

Co-authored-by: lambdaclass-user <[email protected]>

Remove unwraps in trie

cce16f4

Co-authored-by: lambdaclass-user <[email protected]>

Add traces to stage

4f2eaa7

Co-authored-by: lambdaclass-user <[email protected]>

rakita reviewed Jan 24, 2023


onbjerg added C-enhancement New feature or request A-staged-sync Related to staged sync (pipelines and stages) labels Jan 24, 2023

MegaRedHand and others added 16 commits January 24, 2023 10:11

Add comments

31ca213

Co-authored-by: lambdaclass-user <[email protected]>

Replace unwrap with unreachable

88becea

Co-authored-by: lambdaclass-user <[email protected]>

Add intermediate hashes persistence

d11d4c5

Co-authored-by: lambdaclass-user <[email protected]>

Add incremental trie building

a6cdacb

Co-authored-by: lambdaclass-user <[email protected]>

Add node decoding

571d58e

Co-authored-by: lambdaclass-user <[email protected]>

Add stage tests

6011c4a

Co-authored-by: lambdaclass-user <[email protected]>

Add get_header_by_num to stages' Transaction

2ccd366

Co-authored-by: lambdaclass-user <[email protected]>

Fix compile error

68c89a9

Co-authored-by: lambdaclass-user <[email protected]>

Fix: was not adding accounts to HashedAccounts

5809342

Co-authored-by: lambdaclass-user <[email protected]>

Add check for genesis block

b68de75

Co-authored-by: lambdaclass-user <[email protected]>

Generate old trie on seed_execution

c6fe95f

Co-authored-by: lambdaclass-user <[email protected]>

Change parity's TrieDBMut for cita_trie

40b62ff

Co-authored-by: lambdaclass-user <[email protected]>

Gather changes before updating

b78a1a3

Co-authored-by: lambdaclass-user <[email protected]>

Remove all trie-db stuff

1a4c159

Co-authored-by: lambdaclass-user <[email protected]>

Add unwind, update stage type struct, add tests

09fbb0e

Modify test seeding

2c7fe54

Co-authored-by: lambdaclass-user <[email protected]>

rakita reviewed Jan 31, 2023


MegaRedHand and others added 2 commits January 31, 2023 13:07

Change walk for walk_range

39cbe91

Co-authored-by: lambdaclass-user <[email protected]>

gather_changes hashes content at the end

a41f050

Co-authored-by: lambdaclass-user <[email protected]>

rakita reviewed Jan 31, 2023


Remove transition id incrementing

8e3ba6e

Co-authored-by: lambdaclass-user <[email protected]>

gakonst requested changes Feb 1, 2023


MegaRedHand and others added 7 commits February 1, 2023 10:40

Use cita-trie's TrieError in our TrieError

ce419d1

Co-authored-by: lambdaclass-user <[email protected]>

Add comments and remove unwraps

e8efafb

Co-authored-by: lambdaclass-user <[email protected]>

Remove dbg!

4ae6f9e

Co-authored-by: lambdaclass-user <[email protected]>

Fix: was encoding values wrong

4d75437

Co-authored-by: lambdaclass-user <[email protected]>

Fix: hashing stages weren't hashing genesis accs

f1e2964

Co-authored-by: lambdaclass-user <[email protected]>

Appease clippy

8de7eab

Co-authored-by: lambdaclass-user <[email protected]>

Add fuzzing test for trie

401b8dd

Co-authored-by: lambdaclass-user <[email protected]>

gakonst approved these changes Feb 2, 2023


MegaRedHand and others added 4 commits February 2, 2023 10:54

Fix and update docs and comments

9fe2b4f

Co-authored-by: lambdaclass-user <[email protected]>

Add defaults for MerkleStage

acf7e48

Co-authored-by: lambdaclass-user <[email protected]>

Change new_with_root to from_root

77618d7

Co-authored-by: lambdaclass-user <[email protected]>

Fix storage value encoding

82dc1f3

Co-authored-by: lambdaclass-user <[email protected]>

MegaRedHand mentioned this pull request Feb 2, 2023

fix: missing hashed storage entries #1143

Merged

rakita reviewed Feb 3, 2023


rakita approved these changes Feb 3, 2023


MegaRedHand and others added 2 commits February 3, 2023 10:09

Fix: merkle execute fails caused unwind fails

ef2edfe

On merkle execute fail, the tree isn't commited to the database, and the unwind fails to find the root of the tree. Co-authored-by: lambdaclass-user <[email protected]>

Merge branch 'main' into merkle-stage

a240b6a

MegaRedHand force-pushed the merkle-stage branch from 481b60c to a240b6a Compare February 3, 2023 13:20

gakonst merged commit fd7dc11 into paradigmxyz:main Feb 4, 2023

MegaRedHand deleted the merkle-stage branch February 4, 2023 03:11

literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 6, 2023

feat(sync): MerkleStage (paradigmxyz#994)

37a9275

Co-authored-by: lambdaclass-user <[email protected]> Co-authored-by: Francisco Krause Arnim <[email protected]>

ensi321 pushed a commit to ensi321/reth that referenced this pull request Feb 7, 2023

feat(sync): MerkleStage (paradigmxyz#994)

11c0af9

Co-authored-by: lambdaclass-user <[email protected]> Co-authored-by: Francisco Krause Arnim <[email protected]>

MegaRedHand mentioned this pull request Feb 8, 2023

Add benchmarks for MerkleStage #1234

Closed
