-
Notifications
You must be signed in to change notification settings - Fork 56
RFC: ReverseSync - fetching historical data #224
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Is there an alternative name for this feature that can be used? "Filling block" suggests to me filling it with txs, in which case backfilling doesn't make sense. Perhaps |
@ValarDragon I think the initial coining of the term was to mean filling the gap between genesis and the nodes current base with blocks although I can also see how one might think it may mean going back and filling prior blocks with more txs. I'm not attached to the naming so I don't mind renaming this as: "fetch historical blockdata" if everyone else is in accordance If we want something a little shorter and in alignment with the x-sync nomenclature then we could also call it "reversesync" |
I think reversesync is a great name as well |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
2605476
to
a140f48
Compare
rfc/004-reverse-sync.md
Outdated
parameters as: | ||
|
||
```go | ||
max_historical_height = max(state.InitialHeight, state.LastBlockHeight - state.ConsensusParams.EvidenceAgeHeight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be min
? I'm not sure what state.InitialHeight
is, but what if that height is past the minimum bound for processing evidence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the max just means we won't try to verify below the initial height (as no heights below initial height exist). It's merely a technicality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also find naming confusing here. I guess you are trying to identify max block such that both block height and block time satisfy evidence age parameters as you have written below. It might be easier to understand if you move property from line 69 up front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so I initially wasn't sure with which perspective to apply this. It might be intuitive to think of the minimum amount of headers that a node needs. But as this measure is in terms of height, I tended towards the highest height that nodes can prune to. I can change the nomenclature if we can think of a more intuitive way to describe it
rfc/004-reverse-sync.md
Outdated
|
||
In addition, Tendermint currently sends messages for entire blocks only. This | ||
would be inconvenient if we only wanted to acquire the header (although there | ||
may be benefits in saving the entire block). Hence we may want to consider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I think this is a valid point, but would reside in the blockchain reactor, no? If so, how do we tell the blockchain reactor to just get headers only during reverse syncing. Seems like it'll add to the complexity.
Perhaps we just keep the logic as-is and get the entire block, even though we only need the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can decide this at a later date. Personally I feel splitting out the components of the block as individual messages makes sense for a few reasons:
- Headers and Commits represent a fraction of the size of an entire block - we don't want to be sending large amount of data/load across the network that isn't actually used.
- Division of the block aligns with the structure of the blockstore which stores commits, headers and the rest of the block in different compartments.
- It aligns with the abstraction between verification and execution (at least in a delayed execution model) where headers and commits are used predominantly for verification and the txs in a block are used for execution
- It aligns with the use case of state sync and light clients. If we want to integrate these components into the p2p layer, where these components only require headers and commits and not the entire block, then it makes sense that we divide these messages.
rfc/004-reverse-sync.md
Outdated
The aforementioned data is already available via the following RPC endpoints: | ||
`/blockchain` or `/commit` for `Header`'s' and `/validators` for | ||
`ValidatorSet`'s'. It may be the best option to fetch and verify these resources | ||
over RPC. Statesync already requires a list of RPC endpoints, so these could be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should really consider Erik's light client reactor (P2P option below) - we already have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah if we went with extending the p2p layer then I think Erik's approach would be the strongest candidate when it comes to the actual implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there documentation on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's this closed PR tendermint/tendermint#4508
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's about all the documentation we have on that
For the protocols I recently worked on, for
having a complete record (up to the trusting period in the past) of the headers is sufficient. Are there cases where we need the complete blocks? In case we only need the headers, I agree that conceptually the problem is similar to backwards verification in the lightclient. More generally, I can imagine that several design decisions in the past are based on the assumption that full nodes have a complete history. Did someone do a systematic review of the protocols where such an implicit assumption may be used? |
rfc/004-reverse-sync.md
Outdated
introduced some challenges of its own which were covered and subsequently | ||
tackled with [RFC-001](https://github.com/tendermint/spec/blob/master/rfc/001-block-retention.md). | ||
The RFC allowed applications to set a block retention height; an upper bound on | ||
what blocks would be pruned. However nodes who state sync past this upper bound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand what you want to say here. Can you be a bit more precise about the motivating use case?
rfc/004-reverse-sync.md
Outdated
tackled with [RFC-001](https://github.com/tendermint/spec/blob/master/rfc/001-block-retention.md). | ||
The RFC allowed applications to set a block retention height; an upper bound on | ||
what blocks would be pruned. However nodes who state sync past this upper bound | ||
(which is necessary as snapshots must be saved within the trusting period for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct saying that retention height and unbonding period are not correlated at all currently, i.e., we don't ensure that honest full nodes keep blocks during the unbonding period?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the crux of the problem
rfc/004-reverse-sync.md
Outdated
|
||
For now, we focus purely on the case of a state syncing node, whom after | ||
syncing to a height will need to verify historical data in order to be capable | ||
of processing new blocks. We can denote the earliest height that the node will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this. Why you need to verify historical data (what do you mean by historical data here?) to be able to process new blocks? Do you mean here that before participating in consensus to order new blocks, you need to ensure you have enough historical data to be able to verify evidence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
By historical data I mean Header
and ValidatorSet
. I explain this later on but for now I was intentionally trying to keep this high level to start with. I can add exactly what data is needed at the start if you think this is a better structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned that somewhere else. In general we should do a closer review about where in the current protocol we assume full nodes to have all past blocks. For instance, in Fastsync, I can also ask peers for blocks. If they don't have historical data they cannot respond. They might be removed by the syncing node as peer, although they are correct. I suspect this implicit assumption might be around in several places.
rfc/004-reverse-sync.md
Outdated
`A.Height <= max_historical_height` and `A.Time <= max_historical_time`. | ||
|
||
Upon successfully reverse syncing, a node can now safely continue. As this | ||
feature is only used as part of state sync, one can think of this as merely an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ensure that this property always hold, i.e., that retentionHeight does not lead to removing blocks needed to verify evidence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we don't, but this RFC would introduce one.
rfc/004-reverse-sync.md
Outdated
extension to it. | ||
|
||
In the future we may want to extend this functionality to allow nodes to fetch | ||
historical blocks for reasons of accountability or data accessibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fork accountability (if this is what you mean by accountability) operates in the same security model, so the same lower bound for needed block applies also in that case. What do you mean by data accessibility here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the original motivations for ReverseSync was to take a node that had truncated history and to be able to turn it into an archive node. This is a node that has all the blocks from genesis height.
When pruning and state sync was introduced, a chief concern was that it incentivises nodes to keep only what they need. The most alarming result to this is that if every node truncated their history then all data below that height would be lost forever. New nodes would not be able to fast sync from genesis and no transactional records during that era would exist. Accountability and data accessibility refer to this problem - that the network should have a healthy amount of data available.
rfc/004-reverse-sync.md
Outdated
|
||
ReverseSync is used to fetch and verify the following data structures: | ||
- `Header` | ||
- `ValidatorSet` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure to understand this. Given a trusted block B, you can use backward verification to verify previous blocks by following hash links in the past. Why you need to check valsets explicitly? Is it because of evidence handling which does not require access to a complete block? However, you still need more than just valset for evidence handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the val sets has nothing to do with the backwards verification algorithm itself, it's more because we want validator sets as a necessary data structure when verifying evidence. Thus when a node sends us a validator set we need to check that we can trust it.
An earlier revision of this RFC proposed to retrieve the complete block in order to give nodes the option to go from having a truncated history (i.e. when starting from state sync) to having full history. This idea was abandoned and I simplified the RFC to look purely at the problem where a node that has state synced doesn't have all the headers and validator sets within the unbonding period. We need both headers and validator sets. Headers are used for verification and are used as the trusted header in the event of
When pruning and starting at an initial height was introduced we tried as far as the implementation went to sweep the entire code for assumptions that were now invalid. I believe we got most of them but I still see some cases where the assumption that a node has full block history is made. |
rfc/004-reverse-sync.md
Outdated
for fast sync that also extends to ReverseSync is termination. ReverseSync will | ||
finish it's task when one of the following conditions have been met: | ||
|
||
1. It reaches a block `A` where `A.Height <= max_historical_height` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we use for as an upper bound of the unbonding period window, now or bft time of the state synced state? In the former case, we have a moving target for termination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bft time of the state synced state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work!I believe there are still few open points that should be clarified/agreed upon before we can merge this. More particularly, it would be good if there is more clarity what exactly we want to achieve with reversesync (my understanding is supporting evidence handling is the main use case), and how we want to do it can be left for ADR.
I just did another pass through of the evidence logic. It seems that we will also need the
This is the only use case at the moment. Prior we thought about adding it as a tool to turn a node with truncated history to become an archive node with full history but we decided to simplify the proposal. (See #224 (comment)). Given that we narrowed the scope of this RFC, then perhaps we don't need to call it something like reverse sync as it will just be an additional component of state sync |
I have polished up this RFC to include some of the discussions held at the dev call a few weeks back where we decided to use the p2p layer and thus introduce two new messages. I will look to merge this by the end of the day in case anyone wants to look over it one last time |
This RFC addresses the need for Tendermint nodes to be able to fetch, verify and persist historical blocks and state data
Rendered