This repository has been archived by the owner on Feb 23, 2022. It is now read-only.

RFC: ReverseSync - fetching historical data #224

Merged 16 commits into master on Apr 19, 2021

Conversation

@cmwaters (Contributor) commented Nov 27, 2020

This RFC addresses the need for Tendermint nodes to be able to fetch, verify, and persist historical blocks and state data.


@cmwaters cmwaters marked this pull request as ready for review November 30, 2020 13:36
@cmwaters cmwaters self-assigned this Dec 3, 2020
@erikgrinaker (Contributor) left a comment:

👍

@ValarDragon (Contributor):

Is there an alternative name for this feature that can be used? "Filling block" suggests to me filling it with txs, in which case backfilling doesn't make sense.

Perhaps fetch historical blockdata?

@cmwaters (author) commented Dec 9, 2020

@ValarDragon I think the term was originally coined to mean filling the gap between genesis and the node's current base with blocks, although I can also see how one might think it means going back and filling prior blocks with more txs. I'm not attached to the naming, so I don't mind renaming this to "fetch historical blockdata" if everyone else agrees.

If we want something a little shorter and in alignment with the x-sync nomenclature, we could also call it "reversesync".

@ValarDragon (Contributor):

I think reversesync is a great name as well

@github-actions bot commented Jan 9, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Jan 9, 2021
@tessr tessr removed the Stale label Jan 9, 2021
@github-actions bot commented Feb 9, 2021 with the same stale notice.

@github-actions github-actions bot added the Stale label Feb 9, 2021
@cmwaters cmwaters removed the Stale label Feb 9, 2021
@cmwaters cmwaters marked this pull request as draft February 17, 2021 11:22
parameters as:

```go
max_historical_height = max(state.InitialHeight, state.LastBlockHeight - state.ConsensusParams.EvidenceAgeHeight)
```
Contributor:

Shouldn't this be min? I'm not sure what state.InitialHeight is, but what if that height is past the minimum bound for processing evidence?

@cmwaters (author):

The max just means we won't try to verify below the initial height (no heights below the initial height exist). It's merely a technicality.
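As an illustrative sketch of the clamping discussed here (the function and parameter names are hypothetical stand-ins for the state fields in the excerpt above, not the actual Tendermint implementation):

```go
package main

import "fmt"

// maxHistoricalHeight clamps the evidence window to the chain's initial
// height: a node never needs headers from before the chain started.
// The parameters stand in for state.InitialHeight, state.LastBlockHeight,
// and state.ConsensusParams.EvidenceAgeHeight.
func maxHistoricalHeight(initialHeight, lastBlockHeight, evidenceAgeHeight int64) int64 {
	h := lastBlockHeight - evidenceAgeHeight
	if initialHeight > h {
		return initialHeight
	}
	return h
}

func main() {
	// Chain started at height 1, tip at 100, evidence valid for 50 blocks:
	// headers must be retained back to height 50.
	fmt.Println(maxHistoricalHeight(1, 100, 50)) // 50
	// Chain started at height 60: clamp to 60, as earlier heights don't exist.
	fmt.Println(maxHistoricalHeight(60, 100, 50)) // 60
}
```

With min instead of max, a chain whose initial height falls inside the evidence window would request heights that were never produced; the max is what makes this a technicality rather than a safety concern.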

Contributor:

I also find the naming confusing here. I guess you are trying to identify the maximum height such that both the block height and block time satisfy the evidence age parameters, as you have written below. It might be easier to understand if you move the property from line 69 up front.

@cmwaters (author):

Yeah, so I initially wasn't sure from which perspective to apply this. It might be more intuitive to think of the minimum set of headers that a node needs, but as this measure is in terms of height, I tended towards the highest height that nodes can prune to. I can change the nomenclature if we can think of a more intuitive way to describe it.


In addition, Tendermint currently sends messages for entire blocks only. This
would be inconvenient if we only wanted to acquire the header (although there
may be benefits in saving the entire block). Hence we may want to consider
Contributor:

Yup, I think this is a valid point, but it would reside in the blockchain reactor, no? If so, how do we tell the blockchain reactor to get only headers during reverse syncing? Seems like it'll add to the complexity.

Perhaps we just keep the logic as-is and get the entire block, even though we only need the header.

@cmwaters (author) commented Mar 9, 2021:

Yeah, we can decide this at a later date. Personally I feel splitting out the components of the block as individual messages makes sense for a few reasons:

  • Headers and Commits represent a fraction of the size of an entire block - we don't want to be sending large amounts of data across the network that aren't actually used.
  • Division of the block aligns with the structure of the blockstore, which stores commits, headers, and the rest of the block in different compartments.
  • It aligns with the abstraction between verification and execution (at least in a delayed execution model), where headers and commits are used predominantly for verification and the txs in a block are used for execution.
  • It aligns with the use case of state sync and light clients. If we want to integrate these components into the p2p layer, where they only require headers and commits and not the entire block, then it makes sense to divide these messages.
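As a rough sketch of this splitting idea, with hypothetical message names (illustrative stand-ins only, not the RFC's final schema):

```go
package main

import "fmt"

// Toy stand-ins: real Tendermint types are protobuf-defined and far richer.
type Header struct{ Height int64 }
type Commit struct{ Height int64 }
type Block struct {
	Header Header
	Commit Commit
	Txs    [][]byte // the bulk of a block's size
}

// Hypothetical per-component request messages: a reverse-syncing peer asks
// only for what verification needs, rather than the whole block.
type HeaderRequest struct{ Height int64 }
type CommitRequest struct{ Height int64 }

// serveHeader answers a HeaderRequest from a toy blockstore without
// shipping the block's txs across the wire.
func serveHeader(store map[int64]*Block, req HeaderRequest) (Header, bool) {
	b, ok := store[req.Height]
	if !ok {
		return Header{}, false
	}
	return b.Header, true
}

func main() {
	store := map[int64]*Block{
		42: {Header: Header{Height: 42}, Commit: Commit{Height: 42},
			Txs: [][]byte{make([]byte, 1<<20)}}, // 1 MiB of tx data stays home
	}
	h, ok := serveHeader(store, HeaderRequest{Height: 42})
	fmt.Println(ok, h.Height) // true 42
}
```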

The aforementioned data is already available via the following RPC endpoints:
`/blockchain` or `/commit` for `Header`s and `/validators` for
`ValidatorSet`s. It may be the best option to fetch and verify these resources
over RPC. Statesync already requires a list of RPC endpoints, so these could be
@melekes (Contributor) commented Mar 9, 2021:

I think we should really consider Erik's light client reactor (P2P option below) - we already have it.

@cmwaters (author):

Yeah if we went with extending the p2p layer then I think Erik's approach would be the strongest candidate when it comes to the actual implementation.

Contributor:

Is there documentation on that?

@cmwaters (author):

There's this closed PR tendermint/tendermint#4508

@cmwaters (author):

That's about all the documentation we have on that

@josef-widder (Contributor):

For the protocols I recently worked on, for

  • evidence handling
  • handling light node requests for headers

having a complete record (up to the trusting period in the past) of the headers is sufficient. Are there cases where we need the complete blocks?

In case we only need the headers, I agree that conceptually the problem is similar to backwards verification in the light client.

More generally, I can imagine that several design decisions in the past are based on the assumption that full nodes have a complete history. Did someone do a systematic review of the protocols where such an implicit assumption may be used?
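The backwards verification referred to above can be sketched as follows; the `Header` type and string hashing are toy stand-ins (real Tendermint headers are Merkle-hashed over many fields), so only the hash-link idea carries over:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Toy header: only the fields needed to demonstrate the hash link.
type Header struct {
	Height      int64
	LastBlockID string // hash of the header at Height-1
}

// Hash is a simplified commitment over the header's fields.
func (h Header) Hash() string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d|%s", h.Height, h.LastBlockID)))
	return hex.EncodeToString(sum[:])
}

// verifyBackwards checks an older candidate header against a trusted newer
// one by following the hash link, as in light-client backwards verification.
func verifyBackwards(trusted, candidate Header) bool {
	return candidate.Height == trusted.Height-1 &&
		trusted.LastBlockID == candidate.Hash()
}

func main() {
	h9 := Header{Height: 9, LastBlockID: "hash-of-height-8"}
	h10 := Header{Height: 10, LastBlockID: h9.Hash()}
	fmt.Println(verifyBackwards(h10, h9)) // true: the hash link matches
}
```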

introduced some challenges of its own, which were covered and subsequently
tackled with [RFC-001](https://github.com/tendermint/spec/blob/master/rfc/001-block-retention.md).
The RFC allowed applications to set a block retention height: an upper bound on
which blocks may be pruned. However, nodes that state sync past this upper bound
Contributor:

I don't really understand what you want to say here. Can you be a bit more precise about the motivating use case?

tackled with [RFC-001](https://github.com/tendermint/spec/blob/master/rfc/001-block-retention.md).
The RFC allowed applications to set a block retention height: an upper bound on
which blocks may be pruned. However, nodes that state sync past this upper bound
(which is necessary, as snapshots must be saved within the trusting period for
Contributor:

Is it correct saying that retention height and unbonding period are not correlated at all currently, i.e., we don't ensure that honest full nodes keep blocks during the unbonding period?

@cmwaters (author):

Yes, this is the crux of the problem


For now, we focus purely on the case of a state syncing node, who, after
syncing to a height, will need to verify historical data in order to be capable
of processing new blocks. We can denote the earliest height that the node will
@milosevic (Contributor) commented Mar 22, 2021:

I don't get this. Why do you need to verify historical data (what do you mean by historical data here?) to be able to process new blocks? Do you mean that before participating in consensus to order new blocks, you need to ensure you have enough historical data to be able to verify evidence?

@cmwaters (author):

Yup.

By historical data I mean Header and ValidatorSet. I explain this later on, but for now I was intentionally trying to keep this high level. I can state exactly what data is needed at the start if you think that is a better structure.

Contributor:

I mentioned that somewhere else. In general, we should do a closer review of where the current protocol assumes that full nodes have all past blocks. For instance, in Fastsync I can also ask peers for blocks. If they don't have historical data they cannot respond, and they might be removed as peers by the syncing node even though they are correct. I suspect this implicit assumption is present in several places.

`A.Height <= max_historical_height` and `A.Time <= max_historical_time`.

Upon successfully reverse syncing, a node can now safely continue. As this
feature is only used as part of state sync, one can think of this as merely an
Contributor:

Do we ensure that this property always holds, i.e., that retentionHeight does not lead to removing blocks needed to verify evidence?

@cmwaters (author):

Currently we don't, but this RFC would introduce one.

extension to it.

In the future we may want to extend this functionality to allow nodes to fetch
historical blocks for reasons of accountability or data accessibility.
Contributor:

Fork accountability (if this is what you mean by accountability) operates in the same security model, so the same lower bound on needed blocks also applies in that case. What do you mean by data accessibility here?

@cmwaters (author):

One of the original motivations for ReverseSync was to take a node with truncated history and turn it into an archive node: a node that has all the blocks from the genesis height.

When pruning and state sync were introduced, a chief concern was that they incentivise nodes to keep only what they need. The most alarming consequence is that if every node truncated their history, all data below that height would be lost forever. New nodes would not be able to fast sync from genesis, and no transactional records from that era would exist. Accountability and data accessibility refer to this problem - the network should keep a healthy amount of data available.


ReverseSync is used to fetch and verify the following data structures:
- `Header`
- `ValidatorSet`
@milosevic (Contributor) commented Mar 22, 2021:

I am not sure I understand this. Given a trusted block B, you can use backwards verification to verify previous blocks by following the hash links into the past. Why do you need to check valsets explicitly? Is it because of evidence handling, which does not require access to a complete block? However, you still need more than just the valset for evidence handling.

@cmwaters (author):

Checking the val sets has nothing to do with the backwards verification algorithm itself; it's more that we need validator sets as a data structure when verifying evidence. Thus, when a node sends us a validator set, we need to check that we can trust it.

@cmwaters (author):

> having a complete record (up to the trusting period in the past) of the headers is sufficient. Are there cases where we need the complete blocks?

An earlier revision of this RFC proposed retrieving the complete block in order to give nodes the option of going from a truncated history (i.e. when starting from state sync) to a full history. This idea was abandoned, and I simplified the RFC to look purely at the problem of a state-synced node not having all the headers and validator sets within the unbonding period.

We need both headers and validator sets. Headers are used for verification and are used as the trusted header in the event of LightClientAttackEvidence and validator sets are used to confirm that the malicious validator is still bonded (within the evidence time).

> More generally, I can imagine that several design decisions in the past are based on the assumption that full nodes have a complete history. Did someone do a systematic review of the protocols where such an implicit assumption may be used?

When pruning and starting at an initial height were introduced, we tried, as far as the implementation went, to sweep the entire code for assumptions that were now invalid. I believe we got most of them, but I still see some places where the assumption of full block history is made.

for fast sync that also extends to ReverseSync is termination. ReverseSync will
finish its task when one of the following conditions has been met:

1. It reaches a block `A` where `A.Height <= max_historical_height` and
Contributor:

What do we use as the upper bound of the unbonding period window: the current time, or the BFT time of the state-synced state? In the former case we have a moving target for termination.

@cmwaters (author):

The BFT time of the state-synced state.

@milosevic (Contributor) left a comment:

Good work! I believe there are still a few open points that should be clarified and agreed upon before we can merge this. In particular, it would be good to have more clarity on what exactly we want to achieve with reversesync (my understanding is that supporting evidence handling is the main use case); how we want to do it can be left for an ADR.

@cmwaters (author):

I just did another pass through the evidence logic. It seems that we will also need the Commit. This means that reverse sync needs to be able to fetch and verify:

  • Header
  • Commit
  • Validator Set

> it would be good if there is more clarity what exactly we want to achieve with reversesync (my understanding is supporting evidence handling is the main use case)

This is the only use case at the moment.

Previously we thought about adding it as a tool to turn a node with truncated history into an archive node with full history, but we decided to simplify the proposal. (See #224 (comment).)

Given that we have narrowed the scope of this RFC, perhaps we don't need to call it something like reverse sync, as it will just be an additional component of state sync.
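A minimal sketch of how the three fetched structures cross-check each other, assuming toy types and a one-validator-one-vote power model (the real verification is considerably more involved):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Toy stand-ins for the three structures reverse sync must fetch; real
// Tendermint types are richer and use Merkle hashing.
type ValidatorSet struct{ Validators []string }

// Hash is a simplified commitment over the validator set.
func (vs ValidatorSet) Hash() string {
	sum := sha256.Sum256([]byte(fmt.Sprint(vs.Validators)))
	return hex.EncodeToString(sum[:])
}

type Header struct {
	Height         int64
	ValidatorsHash string // commitment to the validator set for this height
}

type Commit struct {
	Height     int64
	Signatures int // toy model: number of validators that signed
}

// verifyLightBlock cross-checks the triple: the validator set must match the
// header's commitment, and the commit must be for the same height with
// signatures from more than 2/3 of the validators (one validator == one
// unit of voting power in this toy model).
func verifyLightBlock(h Header, c Commit, vs ValidatorSet) error {
	if vs.Hash() != h.ValidatorsHash {
		return fmt.Errorf("validator set does not match header commitment")
	}
	if c.Height != h.Height {
		return fmt.Errorf("commit height %d != header height %d", c.Height, h.Height)
	}
	if 3*c.Signatures <= 2*len(vs.Validators) {
		return fmt.Errorf("insufficient signatures")
	}
	return nil
}

func main() {
	vs := ValidatorSet{Validators: []string{"a", "b", "c"}}
	h := Header{Height: 7, ValidatorsHash: vs.Hash()}
	c := Commit{Height: 7, Signatures: 3}
	fmt.Println(verifyLightBlock(h, c, vs)) // <nil>
}
```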

@cmwaters (author):

I have polished up this RFC to include some of the discussions from the dev call a few weeks back, where we decided to use the p2p layer and thus introduce two new messages.

I will look to merge this by the end of the day in case anyone wants to look over it one last time

@cmwaters cmwaters merged commit b39af91 into master Apr 19, 2021
@cmwaters cmwaters deleted the callum/backfill-rfc branch April 19, 2021 15:02