EIP-4844: Remove BlobsSidecarsByRange Req-Resp #3087
We still need a way to historical-sync or backtrack-sync "missing" blobs when starting from a weak subjectivity state. Are you implying that we add a …
We could make …
Ah, I see. I misread the spec; it was my understanding that the …
We've separately been coming to a similar conclusion in Lighthouse. I think we were initially planning to have a separate …
@realbigsean I agree. I think we should just have …
We definitely need some form of historical sync req-resp method. The …
The problem with having only … A naive option would be to discard the …
I think coupling on historic sync adds more technical debt than keeping them separate, due to having to rework something that is tried and true (block sync) into something that looks very different from what we expect the future to hold. Reasons being: …
That said, the protocol technical debt is not huge. I suspect the debt would manifest more in the engineering, so I will step down if the general engineering consensus is counter to my view.
I don't think it's possible to leave blob syncing untouched within the blob prune depth, because block validation is now dependent on blobs, so we have to fit blobs into the picture somehow. With decoupling, we're introducing scenarios where a blob could show up before or after its block, so you'd have to cache blocks and blobs and probably figure out a way to re-request either if one doesn't show up. And then you have to figure out how to attribute faults, if any. More generally, decoupling introduces asynchrony (which could be good, but is just more complicated to engineer and changes block sync code more) and the possibility of inconsistency. We could address these two things in a decoupled approach by: …
But this starts to look very much like the coupled approach, and this logic would be enabled/disabled at different prune depths, and may also mean we require different sync strategies at different depths, so it has similar drawbacks to coupling. The main difference would be two parallel streams of data vs one, and this could end up being worth it for the sake of more efficient download, but we haven't seen download being the bottleneck elsewhere in sync. Generally, here is what I think the implementation would look like in Lighthouse - Coupling: …
Decoupling: …
So either way, the new logic looks like an appendage that we can later remove, and we will have to keep existing sync - the second, to me, is a cleaner separation, so unless we find the gain from parallel download to be meaningful, coupling would still be my vote. I'm curious what the Prysm implementation looks like currently - maybe @terencechain or @Inphi know? I'm also interested to hear what other client teams think their implementation might look like.
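To make the coupled shape concrete, here is a minimal spec-style sketch. The containers are simplified stand-ins rather than the actual SSZ definitions, and the import helper is a placeholder, not any client's API.

```python
from dataclasses import dataclass
from typing import Sequence

# Simplified stand-ins for SSZ containers; field sets are illustrative only.
@dataclass
class SignedBeaconBlock:
    slot: int
    block_root: bytes

@dataclass
class BlobsSidecar:
    beacon_block_slot: int
    beacon_block_root: bytes

@dataclass
class SignedBeaconBlockAndBlobsSidecar:
    """One chunk of a coupled *ByRange response: block and sidecar travel together."""
    beacon_block: SignedBeaconBlock
    blobs_sidecar: BlobsSidecar

def import_coupled_batch(chunks: Sequence[SignedBeaconBlockAndBlobsSidecar]) -> None:
    """Coupled import: no cross-stream pairing is needed, and any failure is
    attributable to the single peer that served this batch."""
    for chunk in chunks:
        # Consistency is structural: the sidecar arrives attached to its block.
        assert chunk.blobs_sidecar.beacon_block_root == chunk.beacon_block.block_root
        # verify_kzg_and_import(chunk)  # placeholder for the actual verification and import
```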
Also, I don't think prune-depth edge cases will be a problem, so long as, in implementation, nodes serve blobs with some margin past the prune depth (maybe a couple of epochs, maybe a couple of hours) and only request at the prune depth.
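A small sketch of that margin idea: MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS is the spec constant referenced in this issue (value shown for illustration), while the margin and helper names are invented here.

```python
MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS = 4096  # spec constant; value illustrative, see the config
SERVE_MARGIN_EPOCHS = 2                        # illustrative margin, not a spec parameter

def min_epoch_to_serve(current_epoch: int) -> int:
    """Servers keep and serve blobs a little past the required retention window..."""
    return max(0, current_epoch - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS - SERVE_MARGIN_EPOCHS)

def min_epoch_to_request(current_epoch: int) -> int:
    """...while requesters never ask beyond the advertised prune depth, so honest
    peers near the boundary are not blamed for having already pruned."""
    return max(0, current_epoch - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS)
```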
As @realbigsean has said, I think for Lighthouse the coupling approach would be by far the least engineering effort solution. We could leave our sync strategies almost entirely untouched and generalise our processing of the blocks and blobs. If we uniformly agree on the pruning depth amongst nodes and consider nodes that prune early malicious, our sync code could also find and score these peers appropriately; it would be harder to do in the uncoupled case. We do lose some potential optimisation in parallel downloading of blobs, but our sync automatically load balances across peers and is adjustable, such that instead of downloading many blocks from one peer, we can download fewer blocks and blobs from each peer, in effect still parallelising the downloading, just in block + blob bundles. As Sean pointed out, the bottleneck is currently usually not the bandwidth, it's processing. This may change with blobs for lower-bandwidth peers, but at the moment our network is quite idle during sync, waiting for blocks to be processed. This may not be the case for other client teams, who may prefer the decoupled case; I'm speaking specifically for Lighthouse.
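As a rough illustration of that load-balancing point (all sizes below are assumptions, not measurements): keeping the bytes per request roughly constant means simply asking each peer for fewer slots once blobs are bundled with blocks.

```python
def slots_per_request(target_bytes: int, avg_block_bytes: int, avg_blobs_bytes: int, coupled: bool) -> int:
    """Illustrative sizing: keep roughly the same bytes per peer request, so when blocks
    and blobs are bundled we ask each peer for fewer slots and spread work over more peers."""
    per_slot = avg_block_bytes + (avg_blobs_bytes if coupled else 0)
    return max(1, target_bytes // per_slot)

# e.g. 2 MiB requests, ~100 KiB blocks, ~512 KiB of blobs per slot:
# uncoupled -> ~20 slots per request, coupled -> ~3 slots per request.
```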
I agree that the simplest approach is the coupled version, which would lead to a simpler implementation. We don't need to have any concept of an "optimistic" head wrt blobs, and any failure has a clear scoring attribution.

I still think that at the boundary, when we expect to start getting blobs, there is an edge case to handle. I'm thinking about being permissive when requesting blobs for the farthest-back epoch if we are close to the epoch transition that would allow that epoch to be pruned (this might turn out to mean not requesting blobs for that epoch at all, since by the time we reach the tip of the chain those blobs will probably already be prunable).

If we decide for the decoupled version, I think the first implementation would try to couple in the syncing process anyway, by asking for a batch of blocks and blobs in parallel from the same peer, then making some simple consistency checks and trying to import the batch. If it fails, we consider it invalid, downscore the peer, and try again with another peer. A more advanced approach would be to keep block sync as it is, let it import optimistically, and then have a separate blobs sidecar fetcher that handles the download and KZG verification independently and marks nodes as valid. Verification failure handling and retries would be tricky here.

I'm curious to hear from Prysm too - are they trying to couple early, or did they implement an optimistic approach?
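To make that first decoupled cut concrete, here is a rough sketch under stated assumptions: the peer methods, field names, and the import helper are hypothetical, not any client's or the spec's actual API.

```python
def sync_batch_decoupled(peer, start_slot: int, count: int) -> bool:
    """First-cut decoupled sync sketch: ask one peer for both streams, cross-check
    them, and attribute any mismatch to that peer. All names are hypothetical."""
    blocks = peer.beacon_blocks_by_range(start_slot, count)      # hypothetical peer API
    sidecars = peer.blobs_sidecars_by_range(start_slot, count)   # hypothetical peer API

    sidecars_by_root = {s.beacon_block_root: s for s in sidecars}
    for block in blocks:
        sidecar = sidecars_by_root.get(block.block_root)
        if sidecar is None or sidecar.beacon_block_slot != block.slot:
            peer.downscore()  # the same peer served both streams, so the fault is attributable
            return False      # caller retries the whole batch with another peer
    # verify_and_import(blocks, sidecars)  # placeholder: KZG verification + import
    return True
```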
Apologies for the delayed response. Prysm will aim to "couple" even if the protocol specification has them decoupled. Under decoupling, Prysm will need to figure out how to merge block and blob sync under one process, and this can be done either with advanced caching or by issuing the two requests tightly coupled to each other. This is not ideal due to potential differences in batch sizes between block and blob requests, and due to peer scoring attribution. Decoupling is more complicated to implement within the Prysm code base due to asynchrony and inconsistency. It's also unclear how much optimization there is in decoupling them, and whether we are heading down a premature-optimization path.
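As an illustration of the caching route mentioned above, here is a minimal sketch (not Prysm code; all names invented) of a pairing cache for a decoupled design, where whichever component arrives first waits for its counterpart.

```python
class PendingComponentsCache:
    """Sketch of a decoupled pairing cache: whichever of block or sidecar arrives
    first waits here, keyed by block root, until its counterpart shows up."""

    def __init__(self):
        self.blocks = {}    # block_root -> block
        self.sidecars = {}  # block_root -> sidecar

    def add_block(self, root, block):
        self.blocks[root] = block
        return self._take_if_complete(root)

    def add_sidecar(self, root, sidecar):
        self.sidecars[root] = sidecar
        return self._take_if_complete(root)

    def _take_if_complete(self, root):
        if root in self.blocks and root in self.sidecars:
            # Both halves present: hand them back for verification and import.
            return self.blocks.pop(root), self.sidecars.pop(root)
        # Still waiting; the caller may re-request the missing half after a timeout.
        return None
```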
In Lodestar's implementation, sync and block processing are agnostic to dealing with either blocks or block+blobs, so the byRange calls are effectively coupled during range sync. I wouldn't mind coupling these methods, but I don't see a problem with leaving them uncoupled either. Implementations can choose to logically couple them, but the reverse is not possible.
I'll leave the same comment as in #3139 (comment) - I believe that while it's simpler to implement coupling right now, this is a case of simplicity becoming simplistic instead - with growing data amounts, we will want to optimize data access, and coupling introduces an artificial barrier. For the long term, I believe uncoupled-everything has more merit, even if it makes the initial implementation a bit more tricky.
The original use case for the BlobsSidecarByRange RPC was to allow nodes to sync blobs that missed the initial broadcast. We no longer need to do this with coupled beacon blocks: historical sync can continue to sync (EIP-4844) blocks as it works today, because they now contain blob sidecars. There is one other use for BlobsSidecarByRange, which is so that light clients can retrieve only blob sidecars that are older than MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS. This use case is pretty niche though, and such clients can always use the blocks RPC, even if that means potentially receiving duplicate blocks. Hence, I propose that we remove the RPC altogether and keep the spec minimal. @terencechain
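For the niche use case above, a hedged sketch of the proposed workaround: fetch coupled blocks and keep only the sidecars, accepting the duplicate block bytes. The coupled RPC name here is hypothetical.

```python
def fetch_historical_sidecars(peer, start_slot: int, count: int):
    """Sketch of the proposed workaround: request coupled blocks and discard the block
    payloads, keeping only the blob sidecars. Duplicate block bytes are the accepted cost."""
    chunks = peer.beacon_block_and_blobs_sidecar_by_range(start_slot, count)  # hypothetical coupled RPC
    return [chunk.blobs_sidecar for chunk in chunks]
```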