This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Fast sync #8884

Merged
38 commits merged into master from a-storage-sync on Jun 22, 2021

Conversation

@arkpar (Member) commented May 22, 2021

Introduces fast sync mode, which works as follows (a rough sketch of the phases is included after the list):

  1. Download and finalize header chain.
  2. Pick a recent finalized block.
  3. Download chunks containing pieces of state with merkle proofs.
  4. Import whole state into the DB.
  5. Re-download all the blocks starting from the imported state and execute them.
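
A minimal sketch of these phases as a state machine, with hypothetical names chosen purely for orientation (the real types live in the sync code added by this PR):

// Hypothetical outline of the fast-sync phases listed above.
enum FastSyncPhase {
    // 1. Download and finalize the header chain (headers and justifications only).
    Headers,
    // 2-4. Pick a recent finalized block, fetch its state in proved chunks
    //      (resuming from the last received key), then import the whole state.
    State { target_root: Vec<u8>, next_key: Vec<u8> },
    // 5. Re-download blocks starting from the imported state and execute them.
    Blocks,
}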

Current limitations:

  1. Child tries are not currently supported.
  2. Once the header-only chain is long enough, it is impossible to catch up with regular sync starting from an old block or genesis. As far as I understand, BABE epochs become unavailable ("Could not fetch epoch .." error). This can probably be fixed by fetching epoch info from the state, or by skipping epoch verification for existing headers altogether.
  3. State is accumulated in memory. This should be fine for a while; the Polkadot state is about 250 MB at the moment. It should eventually be changed to a temporary disk DB though.

polkadot companion: paritytech/polkadot#3078

@arkpar arkpar added labels on May 22, 2021: A0-please_review (Pull request needs code review.), B5-clientnoteworthy, C1-low (PR touches the given topic and has a low impact on builders.), D3-trivial 🧸 (PR contains trivial changes in a runtime directory that do not require an audit.)
@arkpar arkpar requested review from andresilva, tomaka and bkchr May 22, 2021 11:44
@arkpar arkpar requested a review from sorpaas as a code owner May 22, 2021 11:44
@arkpar arkpar requested a review from cheme May 22, 2021 11:55
@burdges commented May 22, 2021

What does "chunks" mean here? You want to reconstruct the current state from only recent blocks? Is that possible? Are you talking about not a sync from nothing, but only a partial sync from a previously running node, which only grabs the parts of the blocks with state updates?

@arkpar (Member, Author) commented May 22, 2021

@burdges This implements a sync from scratch for the full state. Instead of executing all blocks and building the state locally, we just download a single full state, split into chunks. These are introduced at the network protocol level to prevent downloading too much unverified data.

The network protocol works like this: The syncing node requests a sequence of trie keys for a recent block X, starting from some key S, along with the combined trie proof for these keys. The serving node iterates over its trie and builds such a sequence, stopping once the proof size reaches a certain limit. After receiving a chunk, the syncing node verifies the proof and requests the next chunk with S set to the last received key of the previous chunk. After all keys and proofs are downloaded, we just set the result as the state for block X and continue syncing from there normally.
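
A minimal sketch of this request/response loop, using hypothetical message shapes and a caller-supplied proof verifier (these names are assumptions of mine; the actual protobuf schema and verification code live in client/network and the client code of this PR):

// Hypothetical message shapes for the state-sync protocol described above.
struct StateRequest {
    block_hash: Vec<u8>, // the recent block X whose state is being downloaded
    start_key: Vec<u8>,  // key S to resume iteration from (empty on the first request)
}

struct StateResponse {
    keys: Vec<Vec<u8>>,   // keys covered by this chunk, in iteration order
    values: Vec<Vec<u8>>, // the corresponding values
    proof: Vec<Vec<u8>>,  // combined trie proof for the returned range
    complete: bool,       // set once the serving node has reached the end of the trie
}

// Client-side loop: request chunks until the whole state for block X is covered.
fn download_state(
    mut send_request: impl FnMut(StateRequest) -> StateResponse,
    verify_range: impl Fn(&StateResponse, &[u8]) -> Result<(), String>,
    block_hash: Vec<u8>,
    state_root: &[u8],
) -> Result<Vec<(Vec<u8>, Vec<u8>)>, String> {
    let mut state = Vec::new();
    let mut next_key = Vec::new();
    loop {
        let response = send_request(StateRequest {
            block_hash: block_hash.clone(),
            start_key: next_key.clone(),
        });
        // Reject the chunk unless its proof checks out against the known state root.
        verify_range(&response, state_root)?;
        if let Some(last) = response.keys.last() {
            // The next request resumes from the last received key.
            next_key = last.clone();
        }
        state.extend(response.keys.iter().cloned().zip(response.values.iter().cloned()));
        if response.complete {
            return Ok(state);
        }
    }
}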

@burdges commented May 23, 2021

We do not care about what changed recently right? We're just iterating over the whole state.

I confused this with another approach using an optimization for partially synced nodes:

We first download the PoV data for a recent block B and update our previous state, checking all Merkle proofs. We also iterate over all Merkle proof copath elements c in the PoV block: if we knew c already in our previous state then we're done there, but if c is unfamiliar then we request a proof for c.

At this point, the server has collected the proof requests c, so it loops over ancestors B' = B-1, B-2, .., 0 of B, sends the proofs for any c requested in B', and aborts when no requested c remain. We then perform the same state update and request a new bunch of c.

We cannot, afaik, do this per se because nobody knows the old blocks or which state elements they changed. It adapts to whatever partial history we possess by sending all leaves under c whenever no more precise change history exists.

I kinda doubt we require such optimizations for the partially synced case though, so likely what you're doing here makes the most sense, but they exist fyi.

@burdges commented May 23, 2021

At some point, we should explore an erasure coded version of this fast sync, so that once per day/week parathreads pause state updates and produce blocks that scan through their whole state, much like what you're doing here but on-chain. All these blocks get erasure coded into the relay chain's availability store, so even if all parathread nodes go offline, another parathread node could start up and pick up from the previous erasure coded state.

I think your version here sounds best when the chain trusts its own nodes, like our relay chain. Yet, an erasure coded on-chain version like this helps run long lived system parathreads, upon whose continuation polkadot itself depends.

@arkpar (Member, Author) commented May 23, 2021

The use case here is to get a freshly started relay chain node up to the latest full state as quickly as possible. This is also required for storage chains, where nodes don't even keep the old blocks.

> We do not care about what changed recently right? We're just iterating over the whole state.

Right, although not at once. Iteration is split over multiple requests and peers, so no single peer has to iterate the whole state.
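
For a rough sense of scale (illustrative numbers only, not limits defined in this PR): with a ~250 MB state and a per-response budget of, say, 2 MiB, the download splits into on the order of 125 chunks, each of which can be requested from a different peer.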

> At some point, we should explore an erasure coded version of this fast sync, so that once per day/week parathreads pause state updates and produce blocks that scan through their whole state

This may be prohibitively slow for a large chain. E.g. on Ethereum iterating over the whole state trie takes hours.

For partially synced states another idea is to use some kind of set reconciliation protocol for the trie nodes.

@burdges commented May 23, 2021

> This may be prohibitively slow for a large chain. E.g. on Ethereum iterating over the whole state trie takes hours.

Fine. We mostly want this for parathreads that either operate in a massively sharded pool or else move specific relay chain functionality off the relay chain. We'd need to balance performance constraints when selecting pool characteristics or similar. Another day.. :)

> For partially synced states another idea is to use some kind of set reconciliation protocol for the trie nodes.

We'd scan down the tree from the root here I guess, doing set reconciliation at each level. I described a scheme above that basically does set reconciliation using block numbers, up until we've pruned the blocks. We can explore this stuff if you think we need faster partial sync than simply scanning the whole state. I donno..

@cheme (Contributor) left a comment

I did a first read of the PR. Nice to see it.

The most useful point from my review is that I think we should not attach the keys to the proofs, and should instead recover them from the proof when verifying the iteration (we save size and avoid crafted bad responses).

I was also wondering about flagging this as experimental, though I am not sure it is worth it. (I did not look for issues related to switching from block syncing to fast syncing or the other way around, or for possible side effects with pruning or other unexpected races, so I am not overly confident, but that's on my side.)

Resolved review threads on:
client/api/src/in_mem.rs
client/api/src/backend.rs
client/api/src/proof_provider.rs
client/cli/src/arg_enums.rs
client/consensus/babe/src/lib.rs (outdated)
let mut current_key = start.as_ref().to_vec();
let mut entries_size = 0;
let mut keys = Vec::new();
let proving_backend = proving_backend::ProvingBackend::<_, H>::new(trie_backend);
Contributor:

Could also use a trie iterator as in https://github.com/cheme/substrate/blob/cb792cf93e804e091f7a69b5bb1ac67fbfee3408/primitives/state-machine/src/lib.rs#L808 (actually next_storage_key does use a trie iterator internally).
But it seems correct to use it this way too (the proof would be identical).

entries_size += next_key.len();
keys.push(next_key.clone());
let proof_size = proving_backend.estimate_encoded_size();
if entries_size + proof_size > size_limit {
Contributor:

Note: if removing keys from the protocol, only proof_size is needed here.

client/network/src/state_request_handler.rs (resolved review thread)
@tomaka (Contributor) left a comment

👍 in general for doing this.

However, assuming that this PR is merged, and that nodes start using this to sync, then after a while the number of nodes that have block number 1 stored in their database will be lower and lower.

This isn't necessarily a problem in absolute terms, but if you were to try to sync from scratch you might have trouble finding a node that has block number 1 (and above). Right now we just assume that all full nodes know about all blocks, but in the future either we just don't care about people trying to sync the chain from scratch, or we need something a bit smarter.

client/network/src/schema/api.v1.proto (three resolved review threads, outdated)
@arkpar (Member, Author) commented May 25, 2021

> However, assuming that this PR is merged, and that nodes start using this to sync, then after a while the number of nodes that have block number 1 stored in their database will be lower and lower.

Ideally, after syncing the state, there should be a background process that downloads old block bodies. For now, I'll change this PR to request and store block bodies (but not execute them) along with block headers during the first phase of the sync. It will still be fast, although it will require more bandwidth.

But if this is combined with grandpa warp sync, where we don't even have all the headers, we'll either need a post-sync background download, or we'll have to rely on bootnodes or archive nodes to preserve full history.

I should also note that in the case of storage chains, this will be the only way to sync. Block 1 is supposed to be forgotten by design.

@arkpar arkpar removed the A0-please_review Pull request needs code review. label May 25, 2021
@cheme (Contributor) commented Jun 4, 2021

That branch still serves full proofs in state_request_handler.rs.

The changes I made are in read_proof_collection and verify_range_proof of client/service/src/client/client.rs, which should be called.
(I did a bad job with my commits, so the changes are not easy to spot (I had to sync).)

@arkpar (Member, Author) commented Jun 4, 2021

> @arkpar , in case you plan to do some size bench or other test, this branch https://github.com/cheme/substrate/tree/a-storage-sync-compact adds compact proof to your PR (not very clean and not really optimized, and not really tested but could be interesting).

223 MiB Compact vs 246 MiB Full.

Could you maybe make a follow-up PR after this is merged? This one already has too many changes.

@cheme (Contributor) commented Jun 4, 2021

Nice, thanks for testing. It depends on #8574, but after getting #8574 and this PR in, I will do the follow-up.

@arkpar (Member, Author) commented Jun 7, 2021

@bkchr @tomaka Awaiting your review.

@bkchr (Member) left a comment

Sorry for the huge delay :(

pub enum SyncMode {
Full,
Fast,
FastUnsafe,
Member:

While it doesn't support doc comments, you can just add some code comment explaining what this does.
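
For illustration, a sketch of the kind of explanatory comments being suggested here; the wording is mine rather than what landed in the PR, and the FastUnsafe description is my reading of the PR's "Unsafe sync mode" commit:

pub enum SyncMode {
    // Download and execute every block (the pre-existing behaviour).
    Full,
    // Download the header chain and a recent proved state snapshot, then
    // download block bodies without executing the historical blocks.
    Fast,
    // Like Fast, but import the downloaded state without proof verification.
    FastUnsafe,
}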

@@ -125,6 +126,10 @@ pub struct NetworkParams {
/// Join the IPFS network and serve transactions over bitswap protocol.
#[structopt(long)]
pub ipfs_server: bool,

/// Blockchain syncing mode.
Member:

Or add some docs here about the variants, so that the user can understand them.

@@ -1275,7 +1277,12 @@ impl<Block, Client, Inner> BlockImport<Block> for BabeBlockImport<Block, Client,
// early exit if block already in chain, otherwise the check for
// epoch changes will error when trying to re-import an epoch change
match self.client.status(BlockId::Hash(hash)) {
Ok(sp_blockchain::BlockStatus::InChain) => return Ok(ImportResult::AlreadyInChain),
Ok(sp_blockchain::BlockStatus::InChain) => {
// When re-importing existing block strip away finality information.
Member:

Finality information?

client/informant/src/display.rs (two resolved review threads, outdated)
storage: Storage,
) -> ClientResult<Block::Hash> {
if storage.top.keys().any(|k| well_known_keys::is_child_storage_key(&k)) {
return Err(sp_blockchain::Error::GenesisInvalid.into());
Member:

Wrong error?

Member:

Or should we rename this function? Below you also speak about genesis.

Member:

Ahh now I see, copy and paste :D So, yeah, we should remove the word "genesis" from this function.

client/db/src/lib.rs (resolved review thread, outdated)
backend.blockchain.update_meta(MetaUpdate {
hash: info.finalized_hash,
number: info.finalized_number,
is_best: info.finalized_hash == info.best_hash,
Member:

Why is the requirement for this that finalized_hash == best_hash?

Member (Author):

finalized is the block we are writing metadata for, so if it is also the best block, is_best is set to true.

client/db/src/lib.rs (two resolved review threads, outdated)
@arkpar (Member, Author) commented Jun 22, 2021

bot merge

@ghost commented Jun 22, 2021

Trying merge.

@ghost ghost merged commit 97338fc into master Jun 22, 2021
@ghost ghost deleted the a-storage-sync branch June 22, 2021 09:32
athei pushed a commit that referenced this pull request Jun 25, 2021
* State sync

* Importing state fixes

* Bugfixes

* Sync with proof

* Status reporting

* Unsafe sync mode

* Sync test

* Cleanup

* Apply suggestions from code review

Co-authored-by: cheme <[email protected]>
Co-authored-by: Pierre Krieger <[email protected]>

* set_genesis_storage

* Extract keys from range proof

* Detect iter completion

* Download and import bodies with fast sync

* Replaced meta updates tuple with a struct

* Fixed reverting finalized state

* Reverted timeout

* Typo

* Doc

* Doc

* Fixed light client test

* Fixed error handling

* Tweaks

* More UpdateMeta changes

* Rename convert_transaction

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <[email protected]>

* Code review suggestions

* Fixed count handling

Co-authored-by: cheme <[email protected]>
Co-authored-by: Pierre Krieger <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>