Consider old finalized data available? #3141
Conversation
Wouldn't More like:
Would this work?
The issue with this is that you would have to trust what your peers tell you for the finalized epoch. So you can use this during sync as a best-guess, but if you get to the head and realize your peers were lying to you, now you need to reprocess the segment of the chain you thought was finalized but now know was not actually finalized. Because these blocks now have an additional validity condition (is data available).
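A minimal sketch of the reprocessing described above, in Python with hypothetical names (`Block`, `blocks_needing_recheck`) rather than any client's actual API: blocks imported without the data-availability check on the strength of a peer-reported finalized epoch must be re-checked once the node learns the real finalized epoch is older.

```python
from dataclasses import dataclass

@dataclass
class Block:
    epoch: int

def blocks_needing_recheck(imported: list[Block],
                           claimed_finalized_epoch: int,
                           actual_finalized_epoch: int) -> list[Block]:
    """Blocks that were imported without blob checks because peers claimed they
    were finalized, but that now also need the extra
    `blob_is_available_and_valid` validity condition."""
    if actual_finalized_epoch >= claimed_finalized_epoch:
        # Peers were honest (or conservative): nothing to reprocess.
        return []
    return [b for b in imported
            if actual_finalized_epoch < b.epoch <= claimed_finalized_epoch]
```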
This is the same trust assumption as today, no? We are trusting peers to tell us the finalized epoch and to supply us with valid blocks up to that point. So the only difference is that we are now trusting peers to supply us with valid blocks and blobs.
Well, today we're able to tell whether a block a peer gives us is valid regardless of what they tell us the finalized epoch is, right? The difference now is that if
@realbigsean & I were talking about this proposal earlier and we thought there might be additional considerations that need to be discussed.

BACKGROUND
```
fn is_block_valid(block) {
    if block.is_finalized() {
        # finalized blocks DO NOT have any data availability considerations
        return block.is_valid_under_pre_4844_rules()
    } else {
        # unfinalized blocks DO have data availability considerations
        return block.is_valid_under_pre_4844_rules() && block.blob_is_available_and_valid()
    }
}
```

THE PROBLEM

Newly syncing nodes will only need to verify the pre-4844 block conditions until they come to an unfinalized block. At that point, they will need to also verify the blob validity conditions. But these nodes have no way of knowing whether or not a block is finalized until they import it (and other blocks built on top of it presumably). This is where my understanding gets a bit fuzzy. I recall @paulhauner saying something about fork-choice poisoning attacks in the
Nodes that are synced to the head of the chain are not susceptible to this attack because they will always enforce the blob validity conditions, but newly syncing nodes are. This seems very analogous to the problem in the original optimistic sync spec where we didn't want nodes to optimistically import the transition block until we were sure it was justified, again something we couldn't know before importing it. In the merge transition block situation we used
But in
Why do we need to provide a guarantee that un-finalized blobs older than

My understanding is that we (will) set
If you have to import a new chain not yet seen before, how do you know that those blocks' blobs were ever available at any point? To provide full-node safety guarantees you must ensure yourself that the blobs have been available at some point. Or that's the assumption I'm working with here. Otherwise you may deadlock L2s into a bad, unavailable chain? If the assumption above is correct, then you must download those blobs from p2p to prove to yourself the data is available. So the network must retain and serve those blocks to allow any peer to converge on that chain.
With this PR, we use the assumption that if >2/3rds of validators have attested to a chain then all payloads must have been available at some point. Without this PR, we use the assumption that the canonical chain does not contain a consecutive streak of
In both scenarios we are never actually verifying that the entirety of the chain had available blobs. Rather, we're making assumptions about the availability of old blobs based on the behavior of other validators. I agree that with this PR we're operating under more reliable assumptions than without. However, this PR does bring with it theoretically unbounded blob storage and mandated optimistic sync.

Ultimately, the question here is whether or not L2s are comfortable under the assumption that there will always be a better chain to out-compete a malicious chain of

So, in summary, I think the trade-off here is whether L2s would rather (a) wait for everyone to implement blob-optimistic sync or (b) live with the malicious chain assumption. (We must also consider whether or not opt sync and unbounded blob storage are safe/feasible for the protocol, but it would be useful to know what 4844 users are expecting.)
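For concreteness, the two assumptions being contrasted might look roughly like this (Python sketch; the constant value of 4096 epochs ≈ 18 days and the function names are illustrative assumptions, not spec text):

```python
MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS = 4096  # assumed value, roughly 18 days of epochs

def assumed_available_with_pr(block_epoch: int, finalized_epoch: int) -> bool:
    # With this PR: if >2/3 of validators have attested to (finalized) the chain,
    # assume its blobs were available at some point.
    return block_epoch <= finalized_epoch

def assumed_available_without_pr(block_epoch: int, current_epoch: int) -> bool:
    # Without this PR: only blocks older than the fixed retention window skip the
    # DA check, i.e. we assume the canonical chain never carries a long streak
    # of unavailable blocks.
    return block_epoch < current_epoch - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS
```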
As an implementer I want optimistic sync as far away from me as possible. @protolambda can you comment on L2 needs?
I agree with this generally. The worry would be if there is some sort of sustained fork, and you have to re-org to a different but unavailable blobs chain. That said, for there to be a chain of such depth, there have to be validators (and likely users) on that chain and thus the data would have been "made available" and is unlikely to have disappeared. I think the failure mode is
The thing about 18 day time horizons of a full partition (chain A is hidden from chain B) is two-fold: (1) on this order, the non-hidden chain (depending on the split) is in the realm of finalizing, and (2) 18 days is on the order of "we can fix any problem manually in this length of time".

My gut is that due to these two considerations, having the pruning window simply be 18 days rather than the greater of (18_days, time_since_latest_finalized) provides essentially the same guarantee we expect/require.

As for the consideration here to just consider anything past finalized as available, this strictly puts more power in the hands of a malicious majority validator set by putting a much, much tighter bound on the online assumptions for full nodes. In the event that a node is offline for > 2 epochs and a malicious validator set finalizes unavailable data, the node could be tricked into following an unavailable chain. Similarly, this truncates the onlinedness requirement for any sort of L2 policing node or node otherwise trying to get the data, due to the p2p layer also truncating the serving.

The DA pruning period being on the order of the (desired) WS period length, the leak-to-majority period, the "we can fix anything in this time frame" period, and the max optimistic roll-up fraud proof period (planned today) ensures that we don't introduce tighter onlinedness requirements to fully verify the chain than we have today.

I'd be a strong no on making this as tight as latest-finalized.
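A small sketch of the two pruning horizons being weighed here (Python; again the constant value and function names are illustrative assumptions):

```python
MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS = 4096  # roughly 18 days of epochs

def prune_before_fixed(current_epoch: int) -> int:
    """Fixed horizon: prune blobs older than ~18 days regardless of finality."""
    return current_epoch - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS

def prune_before_finality_aware(current_epoch: int, finalized_epoch: int) -> int:
    """Greater-of horizon: keep everything within ~18 days OR since the latest
    finalized epoch, so blob storage grows without bound during non-finality."""
    return min(current_epoch - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS,
               finalized_epoch)
```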
Just want to note, I wasn't actually suggesting that we consider anything finalized as available. I should've been more explicit about this, but I was operating under the assumption that the forkchoice poisoning issues I brought up would only occur in the context of a chain that hadn't finalized within the DA pruning period. In such a case, we are already enforcing that blobs are valid and available for the entire DA pruning period. Hence why I said:
Isn't the most realistic failure mode that 51% of nodes are lazily validating blobs? In this case the chain is split by any proposer who withholds a blob, and a lazy validator has no incentive to rejoin the correct chain; they just wait 18 days and their chain becomes correct. Tying blob availability checks to finalization would give lazy validators an incentive to join the correct chain.
Validators can also lazily validate the chain; do blobs weaken the assumption that much? As for optimistic rollups, I can speak for Arbitrum. What matters most is that the blob retention period is greater than the fraud-proof challenge period. The fraud-proof challenge period is designed to be long enough so L2 validators can participate in challenges under the censorship threat model. The challenge period is 7 days and the blob retention period is 18 days, so I think we'll be fine. The ultimate question is: should we be taking
If block validity changes after 18 days (which it would by having the |
Because users also don't follow unavailable chains: if a lazy validator is on an unavailable chain because they were lazy, users/explorers/infra/exchanges/etc that do proper DA checks won't be on such a chain. That is, Ethereum (and the community it supports) would not be in this false reality. Which is quite the incentive for a lazy validator to get back onto the actually available chain -- before social intervention or the available chain finalizing due to the inactivity leak.

EDIT: To be clear, places where a validator can be lazy are security issues and can/should be patched. If an attacker % plus lazy % is greater than 50% that's definitely bad -- even worse at 2/3. But fully validating nodes cannot be tricked and would not follow such chains. So the direct incentive to not be lazy is not there, but the second-order incentive that you will not be on the actual users' chain is there. Proof-of-custody and proof-of-execution are both very important security upgrades to prioritize in the next few years.
An alternative here is to prune at the 18-day depth and to not consider blocks past that depth as available (unless you previously validated them yourself). This would avoid an automatic chain re-org in the attack scenarios we discussed, and would instead require social intervention if you really wanted to jump back. Such a path
This essentially puts us in a de facto local finalization at the prune depth. This is something we kind of accept today with the quadratic leak. One thing to consider is that if we kept this as the logic, greatly reducing the prune depth would change the "de facto finality period", which I wouldn't be comfortable making much larger than the quadratic leak-to-majority period.
@djrtwo's suggestion is sensible with the current 18-day depth. Can we accept the approach as sufficient for now, and revisit the topic once there's a need / interest to shorten the depth? Essentially kicking the can to the future with the goal of not complicating the early-2023 version of eip-4844.
I am echoing @djrtwo. In the edge case where the latest finalized checkpoint is more than 18 days old, one can use a light version of social consensus to bootstrap a node, i.e. ask friends, or the EF, or whomever one trusts, to get a state from within the 18-day period and bootstrap a node with it. This assumes that the trusted party has observed no DA issues for that [now - x_days; now - 18_days] period. All CL clients currently support bootstrapping with an arbitrary state.
Yes, so the relationship with the quadratic inactivity leak does make me generally much more on board with just having

As far as whether to consider unfinalized blocks older than
I want to echo that a semi-major UX degradation here is that if you are offline for the prune window period, you wouldn't be able to sync to head anymore without manual intervention, due to being behind where you can evaluate DA from the p2p network. It also has implications if a node wants to sync from genesis without providing a recent finalized root.

This isn't a crazy departure from the security model -- at those depths you are at risk of long-range attacks without bringing in a recent piece of data from the network out of band, but currently, I would imagine most or all clients would still sync to the head (and usually be fine). So it doesn't change the security model but does change the practical UX if DA at such depths is enforced strictly.

There are maybe spectrums of enforcement to balance the UX -- e.g. don't reorg to a chain you can't check DA of, but you're allowed to extend the chain you already know of until you get into the DA window.
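A minimal sketch of that middle-ground enforcement, assuming hypothetical flags rather than real fork-choice code:

```python
from dataclasses import dataclass

@dataclass
class CandidateChain:
    extends_known_chain: bool  # builds on the chain we already validated ourselves
    da_checkable: bool         # its blobs are still retrievable, so DA can be verified

def may_follow(chain: CandidateChain) -> bool:
    # Keep extending the chain we already know (even past the DA window),
    # but never re-org onto a chain whose data availability we cannot check.
    return chain.extends_known_chain or chain.da_checkable
```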
This sounds like exactly how it should work. Users should not sync from genesis without a recent root of trust, and users should not have their nodes blindly follow the validators after being offline for an extended period of time. I think if we want to address this, we should do so in a way that doesn't compromise individual node operator security, via things like encouraging trusted root source lists that can automatically be compared against in these situations, with the system failing (until the user intervenes) if they ever disagree.
Were there any more thoughts on anchoring the retention period in finality instead? The only downside was that a few more weeks of data must be kept in the case of non-finality, i.e. up to 3.
Technically non-finality can occur until all ETH has been burned. We certainly hope that the real-world worst-case finalization failure gets resolved after the inactivity leak. I don't have a strong argument against attaching data availability to finality, but I think it should be made clear and well understood that we are changing the upper bound on disk utilization from a hard limit to a soft/economic limit.
I also think this solution is the best.
The downside to this is that a finalized chain can re-org an unfinalized chain, even if it has unavailable data. Consider the following scenario:
closing in favor of #3169
This issue was closed with the conclusion that finalization should be ignored when considering blobs. ethereum#3141
When a node range-syncs it can't know the head state's finalized checkpoint. When performing range sync, there are two scenarios:
Network is finalizing
According to the current p2p spec, blobs older than MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS are pruned, thus un-available via p2p. The node must consider data_is_available == true to sync to the head and not deadlock.

Network is not finalizing for longer than MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS
A full node can't guarantee that a block's blobsSidecar was available for un-finalized blocks. It must request the blobsSidecar from the network, which it can, since it's available according to the current p2p spec. However, this conflicts with the above scenario.
Optimistic sync again?
A node may consider its peers' status finalized checkpoint as correct, then range-sync up to that point and request blobs from there on, since those are un-finalized. If a peer responds with no blobs for an epoch, the node can't differentiate between the peer withholding data (dishonest) and that epoch being finalized (honest).
To be sure that a blob at epoch < clock_epoch - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS is available, the node has to check that no epoch has been finalized up until the current chain tip. To assert that condition, all blocks up to the head must be processed and imported. However, the head is unsafe until all un-finalized blobsSidecars are imported.

The node could optimistically import all blocks until a past epoch is finalized, then mark those blocks as is_data_available == true if older than MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS, else attempt to request the blobsSidecar for each block.

Suggested sync sequence:
To sum up: if 3. is true, the head is unsafe until all blobs are proven to be available.
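A rough sketch of that availability decision for a range-syncing node (Python; the constant value and the action labels are illustrative assumptions, not spec or client code):

```python
MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS = 4096  # roughly 18 days of epochs

def availability_action(block_epoch: int, clock_epoch: int, finalized_epoch: int) -> str:
    """What a range-syncing node does for a block's blobs once it has learned
    the real finalized checkpoint: 'assume_available' or 'fetch_and_verify'."""
    beyond_retention = (
        block_epoch < clock_epoch - MIN_EPOCHS_FOR_BLOBS_SIDECARS_REQUESTS
    )
    if beyond_retention and block_epoch <= finalized_epoch:
        # Blobs are pruned network-wide and the epoch is finalized:
        # mark is_data_available == true without a check.
        return "assume_available"
    # Otherwise the blobsSidecar should still be served on p2p, so request it
    # and verify it before marking the block available.
    return "fetch_and_verify"
```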
Side-point: for nodes to be able to efficiently request that range, should they use by_root requests, or extend the serving range for by_range requests?