This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Fast sync #8884

Merged
38 commits merged into master from a-storage-sync on Jun 22, 2021

Conversation

@arkpar (Member) commented May 22, 2021

Introduces fast sync mode, which works as follows (a rough sketch of the phases is included after the list):

  1. Download and finalize header chain.
  2. Pick a recent finalized block.
  3. Download chunks containing pieces of state with merkle proofs.
  4. Import whole state into the DB.
  5. Re-download all the blocks starting from the imported state and execute them.
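
A minimal sketch of these phases as a state machine, with hypothetical names chosen purely for orientation (the real types live in the sync code added by this PR):

// Hypothetical outline of the fast-sync phases listed above.
enum FastSyncPhase {
    // 1. Download and finalize the header chain (headers and justifications only).
    Headers,
    // 2-4. Pick a recent finalized block, fetch its state in proved chunks
    //      (resuming from the last received key), then import the whole state.
    State { target_root: Vec<u8>, next_key: Vec<u8> },
    // 5. Re-download blocks starting from the imported state and execute them.
    Blocks,
}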

Current limitations:

  1. Child tries are not currently supported.
  2. Once the header-only chain is long enough, it is impossible to catch up with regular sync starting from an old block or genesis. As far as I understand, BABE epochs become unavailable ("Could not fetch epoch .." error). This can probably be fixed by fetching epoch info from the state, or by skipping epoch verification for existing headers altogether.
  3. State is accumulated in memory. This should be fine for a while; the Polkadot state is about 250 MB at the moment. It should eventually be changed to a temporary disk DB though.

polkadot companion: paritytech/polkadot#3078

@arkpar arkpar added labels on May 22, 2021: A0-please_review (Pull request needs code review.), B5-clientnoteworthy, C1-low (PR touches the given topic and has a low impact on builders.), D3-trivial 🧸 (PR contains trivial changes in a runtime directory that do not require an audit.)
@arkpar arkpar requested review from andresilva, tomaka and bkchr May 22, 2021 11:44
@arkpar arkpar requested a review from sorpaas as a code owner May 22, 2021 11:44
@arkpar arkpar requested a review from cheme May 22, 2021 11:55
@burdges commented May 22, 2021

What does "chunks" mean here? You want to reconstruct the current state from only recent blocks? Is that possible? Are you talking about not a sync from nothing, but only a partial sync from a previously running node, which only grabs the parts of the blocks with state updates?

@arkpar (Member, Author) commented May 22, 2021

@burdges This implements a sync from scratch for the full state. Instead of executing all blocks and building the state locally, we just download a single full state, split into chunks. These are introduced at the network protocol level to prevent downloading too much unverified data.

The network protocol works like this: The syncing node requests a sequence of trie keys for a recent block X, starting from some key S, along with the combined trie proof for these keys. The serving node iterates over its trie and builds such a sequence, stopping once the proof size reaches a certain limit. After receiving a chunk, the syncing node verifies the proof and requests the next chunk with S set to the last received key of the previous chunk. After all keys and proofs are downloaded, we just set the result as the state for block X and continue syncing from there normally.
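
A minimal sketch of this request/response loop, using hypothetical message shapes and a caller-supplied proof verifier (these names are assumptions of mine; the actual protobuf schema and verification code live in client/network and the client code of this PR):

// Hypothetical message shapes for the state-sync protocol described above.
struct StateRequest {
    block_hash: Vec<u8>, // the recent block X whose state is being downloaded
    start_key: Vec<u8>,  // key S to resume iteration from (empty on the first request)
}

struct StateResponse {
    keys: Vec<Vec<u8>>,   // keys covered by this chunk, in iteration order
    values: Vec<Vec<u8>>, // the corresponding values
    proof: Vec<Vec<u8>>,  // combined trie proof for the returned range
    complete: bool,       // set once the serving node has reached the end of the trie
}

// Client-side loop: request chunks until the whole state for block X is covered.
fn download_state(
    mut send_request: impl FnMut(StateRequest) -> StateResponse,
    verify_range: impl Fn(&StateResponse, &[u8]) -> Result<(), String>,
    block_hash: Vec<u8>,
    state_root: &[u8],
) -> Result<Vec<(Vec<u8>, Vec<u8>)>, String> {
    let mut state = Vec::new();
    let mut next_key = Vec::new();
    loop {
        let response = send_request(StateRequest {
            block_hash: block_hash.clone(),
            start_key: next_key.clone(),
        });
        // Reject the chunk unless its proof checks out against the known state root.
        verify_range(&response, state_root)?;
        if let Some(last) = response.keys.last() {
            // The next request resumes from the last received key.
            next_key = last.clone();
        }
        state.extend(response.keys.iter().cloned().zip(response.values.iter().cloned()));
        if response.complete {
            return Ok(state);
        }
    }
}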

@burdges commented May 23, 2021

We do not care about what changed recently right? We're just iterating over the whole state.

I confused this with another approach using an optimization for partially synced nodes:

We first download the PoV data for a recent block B and update our previous state, checking all Merkle proofs. We also iterate over all Merkle proof copath elements c in the PoV block: if we knew c already in our previous state then we're done there, but if c is unfamiliar then we request a proof for c.

At this point, the server has collected the proof requests c, so it loops over ancestors B' = B-1, B-2, .., 0 of B, sends the proofs for any c requested in B', and aborts when no requested c remain. We then perform the same state update and request a new bunch of c.

We cannot, afaik, do this per se because nobody knows the old blocks or which state elements they changed. It adapts to whatever partial history we possess by sending all leaves under c whenever no more precise change history exists.

I kinda doubt we require such optimizations for the partially synced case though, so likely what you're doing here makes the most sense, but they exist fyi.

@burdges commented May 23, 2021

At some point, we should explore an erasure coded version of this fast sync, so that once per day/week parathreads pause state updates and produce blocks that scan through their whole state, much like what you're doing here but on-chain. All these blocks get erasure coded into the relay chain's availability store, so even if all parathread nodes go offline, another parathread node could start up and pick up from the previous erasure coded state.

I think your version here sounds best when the chain trusts its own nodes, like our relay chain. Yet, an erasure coded on-chain version like this helps run long lived system parathreads, upon whose continuation polkadot itself depends.

@arkpar (Member, Author) commented May 23, 2021

The use case here is to get a freshly started relay chain node up to the latest full state as quickly as possible. This is also required for storage chains, where nodes don't even keep the old blocks.

> We do not care about what changed recently right? We're just iterating over the whole state.

Right, although not at once. Iteration is split over multiple requests and peers, so no single peer has to iterate the whole state.
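
For a rough sense of scale (illustrative numbers only, not limits defined in this PR): with a ~250 MB state and a per-response budget of, say, 2 MiB, the download splits into on the order of 125 chunks, each of which can be requested from a different peer.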

> At some point, we should explore an erasure coded version of this fast sync, so that once per day/week parathreads pause state updates and produce blocks that scan through their whole state

This may be prohibitively slow for a large chain. E.g. on Ethereum iterating over the whole state trie takes hours.

For partially synced states another idea is to use some kind of set reconciliation protocol for the trie nodes.

@burdges commented May 23, 2021

> This may be prohibitively slow for a large chain. E.g. on Ethereum iterating over the whole state trie takes hours.

Fine. We mostly want this for parathreads that either operate in a massively sharded pool or else move specific relay chain functionality off the relay chain. We'd need to balance performance constraints when selecting pool characteristics or similar. Another day.. :)

> For partially synced states another idea is to use some kind of set reconciliation protocol for the trie nodes.

We'd scan down the tree from the root here I guess, doing set reconciliation at each level. I described a scheme above that basically does set reconciliation using block numbers, up until we've pruned the blocks. We can explore this stuff if you think we need faster partial sync than simply scanning the whole state. I donno..

@cheme (Contributor) left a comment

I did a first read of the PR. Nice to see it.

The most useful point from my review is that I think we should not attach the keys to the proofs, and should instead recover them from the proof when verifying the iteration (we save size and avoid crafted bad responses).

I was also wondering about flagging this as experimental, though I am not sure it is worth it. (I did not look for issues related to switching from block syncing to fast syncing or the other way around, or for possible side effects with pruning or other unexpected races, so I am not overly confident, but that's on my side.)

Resolved review threads on:
client/api/src/in_mem.rs
client/api/src/backend.rs
client/api/src/proof_provider.rs
client/cli/src/arg_enums.rs
client/consensus/babe/src/lib.rs (outdated)
let mut current_key = start.as_ref().to_vec();
let mut entries_size = 0;
let mut keys = Vec::new();
let proving_backend = proving_backend::ProvingBackend::<_, H>::new(trie_backend);
Contributor:

Could also use a trie iterator as in https://github.com/cheme/substrate/blob/cb792cf93e804e091f7a69b5bb1ac67fbfee3408/primitives/state-machine/src/lib.rs#L808 (actually next_storage_key does use a trie iterator internally).
But it seems correct to use it this way too (the proof would be identical).

entries_size += next_key.len();
keys.push(next_key.clone());
let proof_size = proving_backend.estimate_encoded_size();
if entries_size + proof_size > size_limit {
Contributor:

Note: if removing keys from the protocol, only proof_size is needed here.

client/network/src/state_request_handler.rs (resolved review thread)
@tomaka (Contributor) left a comment

👍 in general for doing this.

However, assuming that this PR is merged, and that nodes start using this to sync, then after a while the number of nodes that have block number 1 stored in their database will be lower and lower.

This isn't necessarily a problem in absolute terms, but if you were to try to sync from scratch you might have trouble finding a node that has block number 1 (and above). Right now we just assume that all full nodes know about all blocks, but in the future either we just don't care about people trying to sync the chain from scratch, or we need something a bit smarter.

client/network/src/schema/api.v1.proto (three resolved review threads, outdated)
@arkpar (Member, Author) commented May 25, 2021

> However, assuming that this PR is merged, and that nodes start using this to sync, then after a while the number of nodes that have block number 1 stored in their database will be lower and lower.

Ideally, after syncing the state, there should be a background process that downloads old block bodies. For now, I'll change this PR to request and store block bodies (but not execute them) along with block headers during the first phase of the sync. It will still be fast, although it will require more bandwidth.

But if this is combined with grandpa warp sync, where we don't even have all the headers, we'll either need a post-sync background download, or we'll have to rely on bootnodes or archive nodes to preserve full history.

I should also note that in the case of storage chains, this will be the only way to sync. Block 1 is supposed to be forgotten by design.

@arkpar arkpar removed the A0-please_review Pull request needs code review. label May 25, 2021
@cheme (Contributor) commented Jun 4, 2021

That branch still serves full proofs in state_request_handler.rs.

The changes I made are in read_proof_collection and verify_range_proof of client/service/src/client/client.rs, which should be called.
(I did a bad job with my commits, so the changes are not easy to spot (I had to sync).)

@arkpar (Member, Author) commented Jun 4, 2021

> @arkpar , in case you plan to do some size bench or other test, this branch https://github.com/cheme/substrate/tree/a-storage-sync-compact adds compact proof to your PR (not very clean and not really optimized, and not really tested but could be interesting).

223 MiB Compact vs 246 MiB Full.

Could you maybe make a follow-up PR after this is merged? This one already has too many changes.

@cheme (Contributor) commented Jun 4, 2021

Nice, thanks for testing. It depends on #8574, but after getting #8574 and this PR in, I will do the follow-up.

@arkpar (Member, Author) commented Jun 7, 2021

@bkchr @tomaka Awaiting your review.

@bkchr (Member) left a comment

Sorry for the huge delay :(

pub enum SyncMode {
Full,
Fast,
FastUnsafe,
Member:

While it doesn't support doc comments, you can just add some code comment explaining what this does.
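
For illustration, a sketch of the kind of explanatory comments being suggested here; the wording is mine rather than what landed in the PR, and the FastUnsafe description is my reading of the PR's "Unsafe sync mode" commit:

pub enum SyncMode {
    // Download and execute every block (the pre-existing behaviour).
    Full,
    // Download the header chain and a recent proved state snapshot, then
    // download block bodies without executing the historical blocks.
    Fast,
    // Like Fast, but import the downloaded state without proof verification.
    FastUnsafe,
}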

@@ -125,6 +126,10 @@ pub struct NetworkParams {
/// Join the IPFS network and serve transactions over bitswap protocol.
#[structopt(long)]
pub ipfs_server: bool,

/// Blockchain syncing mode.
Member:

Or add some docs here about the variants, so that the user can understand them.

@@ -1275,7 +1277,12 @@ impl<Block, Client, Inner> BlockImport<Block> for BabeBlockImport<Block, Client,
// early exit if block already in chain, otherwise the check for
// epoch changes will error when trying to re-import an epoch change
match self.client.status(BlockId::Hash(hash)) {
Ok(sp_blockchain::BlockStatus::InChain) => return Ok(ImportResult::AlreadyInChain),
Ok(sp_blockchain::BlockStatus::InChain) => {
// When re-importing existing block strip away finality information.
Member:

Finality information?

client/informant/src/display.rs (two resolved review threads, outdated)
storage: Storage,
) -> ClientResult<Block::Hash> {
if storage.top.keys().any(|k| well_known_keys::is_child_storage_key(&k)) {
return Err(sp_blockchain::Error::GenesisInvalid.into());
Member:

Wrong error?

Member:

Or should we rename this function? Below you also speak about genesis.

Member:

Ahh now I see, copy and paste :D So, yeah, we should remove the word "genesis" from this function.

client/db/src/lib.rs (resolved review thread, outdated)
backend.blockchain.update_meta(MetaUpdate {
hash: info.finalized_hash,
number: info.finalized_number,
is_best: info.finalized_hash == info.best_hash,
Member:

Why is the requirement for this that finalized_hash == best_hash?

Member (Author):

finalized is the block we are writing metadata for, so if it is also the best block, is_best is set to true.

client/db/src/lib.rs (two resolved review threads, outdated)
@arkpar (Member, Author) commented Jun 22, 2021

bot merge

@ghost commented Jun 22, 2021

Trying merge.

@ghost ghost merged commit 97338fc into master Jun 22, 2021
@ghost ghost deleted the a-storage-sync branch June 22, 2021 09:32
athei pushed a commit that referenced this pull request Jun 25, 2021
* State sync

* Importing state fixes

* Bugfixes

* Sync with proof

* Status reporting

* Unsafe sync mode

* Sync test

* Cleanup

* Apply suggestions from code review

Co-authored-by: cheme <[email protected]>
Co-authored-by: Pierre Krieger <[email protected]>

* set_genesis_storage

* Extract keys from range proof

* Detect iter completion

* Download and import bodies with fast sync

* Replaced meta updates tuple with a struct

* Fixed reverting finalized state

* Reverted timeout

* Typo

* Doc

* Doc

* Fixed light client test

* Fixed error handling

* Tweaks

* More UpdateMeta changes

* Rename convert_transaction

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>

* Code review suggestions

* Fixed count handling

Co-authored-by: cheme <[email protected]>
Co-authored-by: Pierre Krieger <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>