This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Parachain Warp Sync #1619

Closed
bkchr opened this issue Sep 13, 2022 · 31 comments
Assignees
Labels
J0-enhancement An additional feature request.

Comments

@bkchr
Member

bkchr commented Sep 13, 2022

Currently we don't support warp sync in parachains. We will need to write some warp sync implementation for parachains. The idea is that we first warp sync the relay chain and then get the latest header of the parachain. Once we have this header, we can use StateSync on the parachain side to get the state.

Depending on how the relay chain node is integrated, we can do the relay chain warp sync differently:

  • In-process relay chain node: Do a normal warp sync and then wait for the state sync to finish to extract the parachain header. This process could be made faster by getting the latest parachain header through a storage proof from a relay chain node that already has the state, but this is more of an optimization step.
  • Out-of-process relay chain node: We would probably wait until the node is finished syncing and then extract the latest parachain header. While waiting we should probably print something like "Waiting for connected relay chain node to finish syncing".
@bkchr bkchr added the J0-enhancement An additional feature request. label Sep 13, 2022
@bkchr bkchr added this to SDK Node Sep 13, 2022
@bkchr bkchr moved this to Backlog 🗒 in SDK Node Sep 13, 2022
@samelamin
Contributor

@skunert Have you started work on this? I would love the chance to contribute if possible 😄

@skunert
Contributor

skunert commented Nov 3, 2022

I have not yet started working on this, feel free to give it a try if you want!

@samelamin
Contributor

Awesome, thanks @skunert.

I am on element if you want to have a chat about it. @samelamin:matrix.org

@samelamin
Contributor

@bkchr @skunert can you give me some guidance on where to start with this?

Looking at how the relay chain does warp sync, I am not sure we can replicate the same approach.

But from Basti's comments, I am not sure we are meant to replicate it anyway.

So what does an ideal implementation for this ticket look like? A new component similar to how it's done on substrate?

Is there any other implementation you would want me to look at to keep things consistent?

@bkchr
Member Author

bkchr commented Nov 3, 2022

The best is that you make yourself familiar with the grandpa warp sync implementation in Substrate. The idea in the end will probably be around making the warp sync implementation generic. Yeah, the best is really to read the code and try to come up with a proposal. We don't have any concrete implementation in our minds.

@samelamin
Contributor

Hey folks, I have been looking at the grandpa implementation of warp sync in Substrate and Polkadot, and I have a few questions.

When Basti says "extract the Parachain header." How do I do that? Is there an example I can reference of how we use it with StateSync?

How will it differ compared to the current fast sync?

perhaps @arkpar or @skunert can give me some pointers?

@bkchr
Member Author

bkchr commented Nov 8, 2022

When Basti says "extract the Parachain header." How do I do that?

```rust
async fn parachain_head_at(
    &self,
    at: PHash,
    para_id: ParaId,
) -> RelayChainResult<Option<Vec<u8>>> {
    self.persisted_validation_data(at, para_id, OccupiedCoreAssumption::TimedOut)
        .await
        .map(|s| s.map(|s| s.parent_head.0))
}
```

This is some code doing this.

Is there an example I can reference of how we use it with StateSync?

No.

How will it differ compared to the current fast sync?

Parachain warp sync will not really be "warp sync", i.e. it doesn't follow the validator set changes etc., because we don't need to do this for parachains. Parachains get their consensus from the relay chain. So, we need the relay chain to warp sync to the latest finalized block and then get the parachain header from that relay chain block. This parachain header is then fed into StateSync as the header of the block we want to sync the state for. This is basically fast sync, but we give it the block to sync the state for.

This will require some changes to the current sync code to make it possible to instruct it on which block to sync. How to integrate there is part of this issue. The best is that you make yourself familiar with the sync code on the surface to see how this could be plugged in there and then come back with a proposal on how you would like to do this. This proposal can already be "code" or just some high level explanation. Then we can discuss it here. You can always ask further questions in between :)
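The flow described above can be sketched as follows. Everything here is a hypothetical stand-in (the function names, the `Header` type, the return values), not the real Substrate/Cumulus APIs; it only illustrates the control flow: relay chain warp syncs, the para head is read from relay state, and that header becomes the state sync target.

```rust
// Illustrative sketch only; all names and types are stand-ins.

#[derive(Debug, Clone, PartialEq)]
struct Header(Vec<u8>); // stand-in for an encoded parachain header

// Stand-in: warp sync the relay chain and return its latest finalized block number.
fn relay_chain_warp_sync() -> u64 {
    42
}

// Stand-in: read the para head from relay chain state at that block.
fn parachain_head_at(_relay_block: u64) -> Option<Header> {
    Some(Header(vec![1, 2, 3]))
}

// Stand-in for StateSync: download the state for exactly this target header.
fn state_sync(target: Header) -> Header {
    target
}

// The overall "parachain warp sync": no warp proofs, just relay sync + state sync.
fn parachain_warp_sync() -> Option<Header> {
    let relay_best = relay_chain_warp_sync();
    let target = parachain_head_at(relay_best)?;
    Some(state_sync(target))
}

fn main() {
    let synced = parachain_warp_sync().expect("para head available");
    println!("synced state for header {:?}", synced);
}
```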

@samelamin
Contributor

samelamin commented Nov 8, 2022

Thank you so much Basti, this is a great help and gives me a better idea of where to start.

I'm really keen on contributing and building, because I feel like that's how I can give the most value to the ecosystem rather than just helping you design it. Plus it gets me even more familiar with Substrate and grandpa, which is a good thing in my book.

I'll try to get a test around it because I think that's the best way for me to understand the flow

@samelamin
Contributor

samelamin commented Nov 16, 2022

Hey guys I have been looking at this in depth over the past week and I have a few questions

I decided to split the work into 2 sections

  1. Identify when the relay chain data has finished syncing
  2. Extract the header of the parachain to feed into StateSync

The plan is to get 1 done first

Looking into grandpa, the sync works by defining a NetworkProvider:

https://github.com/paritytech/substrate/blob/master/client/finality-grandpa/src/warp_proof.rs#L254-L266

While the default implementation for this is a Warp Sync Provider

https://github.com/paritytech/substrate/blob/master/client/finality-grandpa/src/warp_proof.rs#L270-L286

The WarpSyncProvider then has 2 functions to generate and verify a proof. This uses WarpSyncProof:

https://github.com/paritytech/substrate/blob/master/client/finality-grandpa/src/warp_proof.rs#L77-L98

Based on my understanding and the helpful comments above, we don't need all of this, because we, in the Cumulus template node, will already know when the relay chain is synced via the relay_chain_interface function is_major_syncing:

async fn is_major_syncing(&self) -> RelayChainResult<bool>;

So here is what I was thinking of implementing

We could create a ParachainWarpProvider that takes in relay_chain_interface and waits for the relay chain to sync up; once it's done, it gets the parachain header and feeds it to StateSync.

My question is: should we be implementing it via WarpSyncProvider to keep it consistent with the Polkadot implementation? Do we need a WarpProof and generate/verify implementations?

Or have I gone down the wrong rabbit hole, and we should instead be building our own ChainSync that doesn't require a warp proof or a provider?

https://github.com/paritytech/substrate/blob/a7ba55d3c54b9957c242f659e02f5b5a0f47b3ff/client/network/sync/src/lib.rs#L396-L406

@bkchr
Member Author

bkchr commented Nov 18, 2022

Or have I gone down the wrong rabbit hole and we should instead be building our own ChainSync that doesn't require warp proof or a provider

No we don't want this.

My question is should we be implementing it via WarpSyncProvider to keep it consistent with the polkadot implementation? Do we need a WarpProof and generating/verifying implementations

We don't need to generate or verify any warp proofs as we already know what the best block is. You need to change the sync code to support this kind of "special" sync mode here. A sync mode that gets a header set and then starts state sync for this header.

@samelamin
Contributor

samelamin commented Nov 20, 2022

You need to change the sync code to support this kind of "special" sync mode here. A sync mode that gets a header set and then starts state sync for this header.

I get that, but Cumulus does not contain any sync code, so do you mean to change the Substrate sync client?

I would think we would need to change the WarpSyncProvider implementation so it supports this special sync mode. At the moment this trait has 3 members: generate, verify, and current_authorities. So perhaps this new sync mode needs to be added there, as a new member that takes in the header?

https://github.com/paritytech/substrate/blob/master/client/network/common/src/sync/warp.rs#L41-L58

Or perhaps there is a way of doing this only in Cumulus without changing the underlying Substrate code, but given this trait is what's then used when building the network...

https://github.com/paritytech/cumulus/blob/master/parachain-template/node/src/service.rs#L200-L211

Also, since we don't need to generate or verify proofs, my gut is telling me that we shouldn't even be implementing the current WarpSyncProvider and instead, we need a new generic GeneralWarpSyncProvider ?

@bkchr
Member Author

bkchr commented Nov 20, 2022

Also, since we don't need to generate or verify proofs, my gut is telling me that we shouldn't even be implementing the current WarpSyncProvider and instead, we need a new generic GeneralWarpSyncProvider ?

Yes, but we should probably call it something different. Maybe the easiest way is just to have some other kind of "sync implementation", alongside warp/fast/state.

@samelamin
Contributor

samelamin commented Nov 20, 2022

Agreed, I'll just go with GeneralWarpSyncProvider for now and then we can rename it later.

We can either create a base trait called GeneralWarpSyncProvider and then have the WarpSyncProvider inherit from it and include the 3 members as well as a new Sync Provider e.g. ParachainWarpSyncProvider

Or change BuildNetworkParams to accept a new type of sync which could be WarpSyncProvider or ParachainWarpSyncProvider and then the implementation change based on the type provided. i.e. a new parameter

In your opinion, which is the "cleaner" approach?

@bkchr
Member Author

bkchr commented Nov 20, 2022

We can either create a base trait called GeneralWarpSyncProvider and then have the WarpSyncProvider inherit from it and include the 3 members as well as a new Sync Provider e.g. ParachainWarpSyncProvider

I don't get what you mean here.

@bkchr
Member Author

bkchr commented Nov 20, 2022

I mean, in general we could probably write some sort of abstract WarpSyncProvider that works for us here and for grandpa. I don't exactly know what the current trait looks like and what would need to be changed. Maybe that is easy enough.

@samelamin
Contributor

samelamin commented Nov 20, 2022

We can either create a base trait called GeneralWarpSyncProvider and then have the WarpSyncProvider inherit from it and include the 3 members as well as a new Sync Provider e.g. ParachainWarpSyncProvider

I don't get what you mean here.

I mean have a base trait called SyncProvider, then have WarpSyncProvider inherit from it and include the 3 members it currently has. At the same time, we have a ParachainWarpSyncProvider that inherits from SyncProvider too, i.e.:

SyncProvider > WarpSyncProvider
SyncProvider > ParachainWarpSyncProvider

Then the changes we need to implement will be in BuildNetworkParams

https://github.com/paritytech/substrate/blob/master/client/service/src/builder.rs#L762-L770

Where the parameter we pass in will be a SyncProvider. If it's a WarpSyncProvider, do what it's currently doing, but handle things differently if it's a ParachainWarpSyncProvider.

@arkpar
Member

arkpar commented Nov 21, 2022

As far as I understand, it will basically be WarpSync that skips the first phase, proof downloading and verification. It just needs to get the target block, which can be passed to the constructor as an optional parameter.

The second issue is that the sync should start only once this target block becomes known. This could be handled at the ChainSync level, much like it currently waits for 3 peers before starting warp sync. It can poll a channel or simply wait for the target block to be set with a method call on the network service.

@samelamin
Contributor

It just needs to get the target block, which can be passed to the constructor as an optional parameter.

What do you mean by this? Which constructor?

I originally thought it should be part of ChainSync, but based on my understanding of Basti's comments I went down the custom WarpSyncProvider route. Perhaps I misunderstood?

Should we just be passing in an optional parameter to BuildNetworkParams and subsequently ChainSync?

@arkpar
Member

arkpar commented Nov 21, 2022

What do you mean by this? Which constructor?

WarpSync::new

WarpSyncProvider is really intended for providing and verifying warp proofs. To skip all that entirely, I don't really see the need to change it or invent other traits. It just needs to be made optional, and there needs to be a way to set the target sync block externally. The simplest way I can think of is that the parachain sync starts disabled or in some kind of "idle" mode. And once the block is known, a method is called on the network service, e.g. start_warp_sync(header), which creates the WarpSync object and starts the process.

@bkchr
Member Author

bkchr commented Nov 21, 2022

@samelamin here: https://github.com/paritytech/substrate/blob/59a74b18df148b9b4b460ba1d0322948f96d04d0/client/service/src/builder.rs#L733-L734

You could change this to some kind of enum:

```rust
enum WarpSyncParams<Block: BlockT> {
    WithProvider(Arc<dyn WarpSyncProvider<Block>>),
    WaitForTarget(futures::channel::oneshot::Receiver<Block::Header>),
    None,
}
```

This params enum would then be passed to the WarpSync::new function. On parachains we would set params to WaitForTarget. The channel sender would then be passed to some component that informs sync about the target block when we are aware of it, i.e. once we have asked the relay chain for the latest finalized parachain block, as already discussed.

The internals of WarpSync would then take care to handle the different ways of getting to the target block.
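A minimal runnable sketch of how such a params enum could drive the sync start. Note the stand-ins: std::sync::mpsc takes the place of futures::channel::oneshot, and plain Strings take the place of the real provider and header types; start_sync is a hypothetical function, not part of any real API.

```rust
use std::sync::mpsc;

// Sketch of the WarpSyncParams idea; all types simplified stand-ins.
enum WarpSyncParams {
    // Classic grandpa warp sync: a provider generates/verifies warp proofs.
    WithProvider(String),
    // Parachain case: wait for some component to send the target header.
    WaitForTarget(mpsc::Receiver<String>),
    // No warp sync at all.
    None,
}

fn start_sync(params: WarpSyncParams) -> String {
    match params {
        WarpSyncParams::WithProvider(p) => format!("warp proofs via {}", p),
        WarpSyncParams::WaitForTarget(rx) => {
            // Block until the target header arrives, then jump straight
            // to state sync for that block, skipping proofs entirely.
            let header = rx.recv().expect("target header sender dropped");
            format!("state sync for {}", header)
        }
        WarpSyncParams::None => "full sync".to_string(),
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Some component (e.g. the Cumulus side) sends the target header later.
    tx.send("para-head-0xabc".to_string()).unwrap();
    println!("{}", start_sync(WarpSyncParams::WaitForTarget(rx)));
}
```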

@samelamin
Contributor

Got it Basti thanks

This is what I am intending to do

  1. Change the warp_sync parameter to an enum to let BuildNetworkParams know this is a different type of sync

  2. Pass that enum over to ChainSync

  3. Add a start_warp_sync(header) function to the network service (since we will need time to identify the parachain header)

  4. On the Cumulus side, follow the relay chain using relay_chain_interface; once the relay chain is synced up (is_major_syncing should return false), extract the parachain header using parachain_head_at and call start_warp_sync(header)

  5. Call WarpSync and handle the syncing internally (will get more info on this once the above is built)
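Step 4 could be sketched roughly as below. This is a hypothetical toy, not the real RelayChainInterface: is_major_syncing and parachain_head_at are stand-in functions, and std::sync::mpsc stands in for the channel that would hand the header to the sync code.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Stand-in: pretend the relay chain finishes major sync after a few polls.
fn is_major_syncing(polls: &mut u32) -> bool {
    *polls += 1;
    *polls < 3
}

// Stand-in: pretend encoded para head read from relay chain state.
fn parachain_head_at() -> Vec<u8> {
    vec![0xaa, 0xbb]
}

// Poll until the relay chain is done syncing, then send the target header.
fn wait_and_send_target(tx: mpsc::Sender<Vec<u8>>) {
    let mut polls = 0;
    while is_major_syncing(&mut polls) {
        thread::sleep(Duration::from_millis(10));
    }
    tx.send(parachain_head_at()).expect("sync side dropped");
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || wait_and_send_target(tx));
    // The sync side blocks here until the target header is known.
    let header = rx.recv().unwrap();
    println!("target header: {:?}", header);
}
```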

Does the above seem like a good approach?

@bkchr
Member Author

bkchr commented Nov 21, 2022

3. Add a start_warp_sync(header) function to the network service ( since we will need time to identify the parachain header)

You don't need this. As I explained above, you would pass the channel for sending the header on initialization.

@altonen
Contributor

altonen commented Nov 21, 2022

I just wanted to give a heads up that I have one big PR coming, hopefully by the end of this week, that will move syncing out of sc-network; it will most likely heavily conflict with whatever changes are made here to Substrate's syncing/networking code.

@arkpar
Member

arkpar commented Nov 21, 2022

This is what I am intending to do

Looks good to me.

I'd prefer an explicit method rather than a channel, but that's not a strong opinion. Channel is also OK

@bkchr
Member Author

bkchr commented Nov 21, 2022

This is what I am intending to do

Looks good to me.

I'd prefer an explicit method rather than a channel, but that's not a strong opinion. Channel is also OK

We can also put a function into the sync handle. However, as @altonen already hinted, putting this function into NetworkService is wrong.

@samelamin
Contributor

samelamin commented Nov 21, 2022

Quick question on this that occurred to me while editing WarpSync:

Since we are passing this enum, doesn't that mean we will have to make WarpSyncProvider and the phase optional, because for the new sync type both warp_sync_provider and phase will be None?

for example

```rust
/// Warp sync state machine. Accumulates warp proofs and state.
pub struct WarpSync<B: BlockT, Client> {
    phase: Option<Phase<B, Client>>,
    client: Arc<Client>,
    warp_sync_provider: Option<Arc<dyn WarpSyncProvider<B>>>,
    total_proof_bytes: u64,
    future_header: Option<oneshot::Receiver<B::Hash>>,
}
```

Or should we be removing the warp_sync_provider and replace it with the enum since it includes the provider?

@bkchr
Member Author

bkchr commented Nov 22, 2022

@samelamin these are now really implementation details. Please bring up a PR with a good code design. For sure you will need to make the provider optional, but there are multiple ways to do that.

@arkpar
Member

arkpar commented Nov 22, 2022

Or should we be removing the warp_sync_provider and replace it with the enum since it includes the provider?

The simplest solution is to simply move it to the Phase::WarpProof variant, as it is not really used in other phases.

Phase can't be None, however. It's just that you skip Phase::WarpProof and start with Phase::TargetBlock if you don't need to download proofs.
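The suggestion above could look roughly like this: keep the provider inside the WarpProof variant and construct the state machine directly at TargetBlock when the header is already known. All types here are simplified stand-ins, not the real sc-network-sync definitions.

```rust
// Simplified sketch; provider and header types are stand-ins.
enum Phase {
    // Grandpa-style start: download and verify warp proofs via a provider.
    WarpProof { provider: String },
    // Parachain-style start: the target header is already known.
    TargetBlock { header: Vec<u8> },
    // Final phase: download the state for the target block.
    State,
}

struct WarpSync {
    phase: Phase,
}

impl WarpSync {
    // Classic warp sync: begin by downloading proofs.
    fn with_provider(provider: String) -> Self {
        Self { phase: Phase::WarpProof { provider } }
    }

    // Parachain case: skip proofs entirely, start at the target block.
    fn with_target(header: Vec<u8>) -> Self {
        Self { phase: Phase::TargetBlock { header } }
    }
}

fn main() {
    let s = WarpSync::with_target(vec![1]);
    let skipping = matches!(s.phase, Phase::TargetBlock { .. });
    println!("skipping warp proofs: {}", skipping);
}
```

This keeps a single WarpSync state machine for both cases; only the entry point differs.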

@samelamin
Contributor

Hello folks,

I submitted PRs for the first part of this issue, which is passing warp_sync_params to include either a WarpSyncProvider or a future block.

I will now work on the Cumulus part: wait for the relay chain to finish syncing and pass the header to the oneshot receiver on warp_sync_params.

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-release-analysis-v0-9-39/2277/1

@bkchr
Member Author

bkchr commented Apr 4, 2023

Finished! :)

Projects
Status: done

6 participants