Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EIP-4844: Remove BlobsSidecarsByRange Req-Resp #3087

Open
Inphi opened this issue Nov 8, 2022 · 15 comments
Open

EIP-4844: Remove BlobsSidecarsByRange Req-Resp #3087

Inphi opened this issue Nov 8, 2022 · 15 comments
Labels
Deneb was called: eip-4844

Comments

@Inphi
Copy link
Contributor

Inphi commented Nov 8, 2022

The original usecase for BlobsSidecarByRange RPC was to allow nodes sync blobs that missed the initial broadcast. We no longer need to do this with Coupled Beacon Blocks. Historical sync can continue to sync (eip-4844) blocks as it works today because they now contain blob sidecars.

There is one other use for BlobsSidecarByRange which is so light clients can retrieve blobs sidecars only that are older than MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS. This usecase is pretty niche though and such clients can always use the blocks RPC, even if that means potentially receiving dup blocks. Hence, I propose that we remove the RPC altogether and keep the spec minimal.

@terencechain

@terencechain
Copy link
Contributor

We still need a way to historical sync or backtrack sync "missing" blobs with a weak subjectivity state. Are you implying that we add a BeaconBlockAndBlobsSidecarByRange RPC?

@tbenr
Copy link
Contributor

tbenr commented Nov 8, 2022

We could make BeaconBlockAndBlobsSidecarByRange to return a ZERO BlobsSidecar if range includes slots prior to MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS, Instead of failing the request and retry with BeaconBlocksByRange.

@Inphi
Copy link
Contributor Author

Inphi commented Nov 8, 2022

Ah I see. I misread the spec, It was my understanding that the BeaconBlocksByRange RPC would return coupled beacon block after the upgrade.
But also we should still consider removing the RPC and like you say add a new RPC that returns both.

@realbigsean
Copy link
Contributor

realbigsean commented Nov 8, 2022

We could make BeaconBlockAndBlobsSidecarByRange to return a ZERO BlobsSidecar if range includes slots prior to MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS, Instead of failing the request and retry with BeaconBlocksByRange.

We've separately been coming to a similar conclusion in lighthouse.

I think we were initially planning to have a separate BeaconBlockByRange and BlobsSidecarByRange request. But @divagant-martian made the point that if we only ever want a block by itself or a block and block together, we should structure our requests to reflect that, both a BeaconBlockByRange and a BeaconBlockAndBlobsSidecarByRange. One for use within the range we're required to have blobs and one for use outside of that range. Making a BlobsSidecarByRange doesn't really make any sense if there are no scenarios where we want just a blob.

@terencechain
Copy link
Contributor

@realbigsean I agree. I think we should just have BeaconBlockByRange and BeaconBlockAndBlobsSidecarByRange

@protolambda
Copy link
Contributor

We definitely need some form of historical sync req-resp method. The BlobsSidecarByRange would allow for blobs to be fetched in parallel to the beacon-blocks. Fusing the methods together BeaconBlockAndBlobsSidecarByRange would simplify the sync, but limit optimizations.

@Inphi Inphi changed the title EIP-4844: Remove BlobsSidecarsByRange RPC EIP-4844: Remove BlobsSidecarsByRange Req-Resp Nov 8, 2022
@tbenr
Copy link
Contributor

tbenr commented Nov 8, 2022

The problem with having only BeaconBlockAndBlobsSidecarByRange is the (edge?) case in which the peer have early pruned the Blobs and responds with block and ZERO BlobsSidecar. So you fall in the decoupled case in which you have to fill up the blobs for known blocks. Retrieving them using BlobsSidecarByRange opens to potential block-blobs inconsistency so the best way to do that would be by BlobsSidecarByRoot (but we might be outside the "recent chain" to which we want to limit the usage of the byRoot option, as mentioned in the today's call).

A naive option would be to discard the BeaconBlockAndBlobsSidecarByRange responses containing ZERO Blobs where you actually expect blobs, and redoing BeaconBlockAndBlobsSidecarByRange towards other peers. This is a meaningful waste of bandwidth only when there are many dishonest validators.

@djrtwo
Copy link
Contributor

djrtwo commented Nov 9, 2022

I think coupling on historic sync adds more technical debt than keeping them together due to having to rework something that is tried and true (block sync) into something that looks very different than what we expect the future to hold

Reasons being:

  1. disrupting block (no blobs) sync in a manner that does not look like what we expect the future to hold
  2. having to deal with the edge case of alternate pruning depth between Blocks and Blobs as native to block sync
  3. likely changing block sync range strategies due to the much larger blobs attached to them (e.g. getting 100 at a time might not be tenable to request to a peer)

That said, the protocol technical debt is not huge. I suspect the debt to manifest more in the engineering so will step down if the general engineering consensus is counter to my view.

@realbigsean
Copy link
Contributor

I don't think it's possible to leave blob syncing untouched within the blob prune depth because block validation is now dependent on blobs, so we have to fit blobs into the picture somehow. With decoupling, we're introducing scenarios where a blob could show up before or after a blob, so you'd have to cache blocks and blobs and probably figure out a way to re-request either if either doesn't show up. And then you have to figure out how to attribute faults if any.

More generally, decoupling introduces asynchrony (which could be good, but is just more complicated to engineer and changes block sync code more) and the possibility of inconsistency. We could address these two things in a decoupled approach by:

  1. always making both requests at once
  2. always making both requests to the same peer
  3. if one fails, both fail

But this starts to look very much like the coupled approach, and this logic would be enabled/disabled at different prune depths, and also may mean we may require different sync strategies at different depths, so has similar drawbacks to coupling. The main difference would be two parallel streams of data vs one. And this could end up being worth it for the sake of more efficient download but we haven't seen download being the bottleneck elsewhere in sync.

Generally what I think implementation would look like in Lighthouse -

Coupling:

  • prior to the blob prune depth it remains the same
  • after the prune depth, we use a new request that uses the same processing logic, but might use a different batch size (we make existing processing generic)

Decoupling:

  • prior to the blob prune depth it remains the same
  • after, we add a new request and figure out how to merge it with with the existing block sync
    • Either via caching, and figuring out re-processing and fault attribution logic (this would require the most tinkering in existing block sync, perhaps making a new implementation so as to not disturb what we have)
    • or coupling requests tightly, and potentially adjusting batch size

So either way, the new logic looks like an appendage that we can later remove, and we will have to keep existing sync - the second to me is a cleaner separation, so unless we find the gain from parallel download to be meaningful for coupling would still be my vote.

I'm curious what the Prysm implementation looks like currently? Maybe @terencechain or @Inphi know? And also interested to hear what other client teams think about what their implementation might look like

@realbigsean
Copy link
Contributor

Also I don't think prune depth edge cases will be a problem, so long as in implementation nodes serve blobs with some margin past the prune depth ( maybe a couple of epochs, maybe a couple of hours). And only request at the prune depth.

@AgeManning
Copy link
Contributor

As @realbigsean has said, I think for Lighthouse, the coupling approach would be by far the least engineering effort solution.

We could leave our sync strategies almost entirely untouched and generalise our processing of the blocks and blobs.

If we uniformly agree of the pruning depth amongst nodes and consider nodes that prune early malicious, our sync code could also find and score these peers appropriately. It would be harder to do in the uncoupled case.

We do lose some potential optimisation in parallel downloading of blobs, but our sync automatically load balances across peers and it is adjustable such that instead of downloading many blocks from one peer, we can download fewer blocks and blobs from each peer, in effect still parallelising the downloading, just in block + blob bundles. As Sean pointed out, currently the bottleneck is not the bandwidth usually, its processing. This may change with blobs, for lower bandwidth peers, but at the moment our network is quite idle in sync waiting for blocks to be processed.

This may not be the case for other client teams who may prefer the decoupled case. Just speaking specifically for lighthouse.

@tbenr
Copy link
Contributor

tbenr commented Nov 11, 2022

I agree that the simplest approach is to have the coupled version, which would lead to a simpler implementation. We don't need to have any concept of "optimistic" head wrt blobs. Any failure has a clear scoring attribution.

I still think that at the boundary, when we expect to start getting blobs, there is an edge case to handle. I'm thinking about being permissive when requesting blobs for the farmost epoch if we are close to the epoch transition that would allow that epoch to be pruned (this might turn out to not request blobs for that epoch at all, since by the time we reach the tip of the chain those blobs will be probably already prunable).

If we decide for the decoupled version, I think the first implementation would try to couple in the syncing process by asking a batch of blocks and blobs in parallel to the same peer, than make some simple consistency checks and then try to import the batch. If fails we consider it invalid downscoring the peer and will try again with another peer.

A more advanced approach would be to keep block sync as it is, let it import optimistically and then have a separate blobs sidecar fetcher that handle the download an kzg verification independently and mark nodes as valid. Verification failure handling and retries would be tricky here.

I'm curious to hear from Prysm too. Are they trying to couple early or do they implemented an optimistic approach?

@dgcoffman dgcoffman mentioned this issue Nov 14, 2022
25 tasks
@hwwhww hwwhww added the Deneb was called: eip-4844 label Nov 15, 2022
@terencechain
Copy link
Contributor

Apology for the delayed response. Prysm will aim to "couple" even if the protocol specification has them decoupled. Under decoupling, Prysm will need to figure out how to merge block and blob sync under one process, and this can be done either by advanced caching or greatly coupling the request next to each other. This is not ideal due to potential differences in batch sizes between block and blob requests and peering scoring attributes.

Decoupling is more complicated to implement within Prysm code base due to asynchrony and inconsistency. It's also unclear how much optimization there is by decoupling them and if we are falling down a premature optimization path

@dapplion
Copy link
Member

In Lodestar implementation sync and block processing is agnostic on dealing with either blocks or block+blobs. So the byRange call is coupled effectively during range sync. I won't mind coupling these methods but don't see the problem with leaving them un-coupled either. Implementations can choose to logically couple them but the reverse is not possible.

@arnetheduck
Copy link
Contributor

I'll leave the same comment as in #3139 (comment) - I believe that while it's simpler to implement coupling right now, this is a case of simplicity becoming simplistic instead - with growing data amounts, we will want to optimize data access and coupling introduces an artificial barrier.

For the long term, I believe uncoupled-everything has more merit, even if makes the initial implementation a bit more tricky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deneb was called: eip-4844
Projects
None yet
Development

No branches or pull requests

12 participants