Commit a140f48: simplify to focus on the case of state sync
cmwaters committed Mar 3, 2021 · 1 parent 6bf966b
Showing 1 changed file (rfc/004-reverse-sync.md) with 106 additions and 126 deletions.

## Changelog

- 2021-03-03: Simplify proposal to the state sync case.
- 2021-02-17: Add notes on asynchronicity of processes.
- 2020-12-10: Rename backfill blocks to reverse sync.
- 2020-11-25: Initial draft.
then eventually switch to consensus may not have the block and validator
history to verify evidence causing them to panic if they see 2/3 commit on what
the node believes to be an invalid block.

Thus, this RFC sets out to instil a minimum block history invariant amongst
honest nodes.

## Proposal

A backfill mechanism can simply be defined as an algorithm for fetching,
verifying, and storing headers and validator sets of a height prior to the
current base of the node's blockchain. In matching the terminology used for
other data-retrieving protocols (e.g. fast sync and state sync), we
call this method **ReverseSync**.

We will define the mechanism in four sections:
- Usage
- Verification
- Design
- Termination

### Usage

For now, we focus purely on the case of a state syncing node, which after
syncing to a height will need to verify an adequate amount of previous data in
order to be capable of processing new blocks and advancing state. We can call
these thresholds `max_historical_height` and `max_historical_time` (the
earliest height and time the node must have history back to). Both height and
time are necessary as this maps to the BFT time used for evidence expiration.
After acquiring `State`, we calculate these parameters as:

```go
max_historical_height = max(state.InitialHeight, state.LastBlockHeight - state.ConsensusParams.EvidenceAgeHeight)
max_historical_time = max(GenesisTime, state.LastBlockTime.Sub(state.ConsensusParams.EvidenceAgeTime))
```
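For example, a node whose state-synced `LastBlockHeight` is 1000 with an
evidence age of 100 blocks (and an initial height of 1) would need to backfill
at least back to height `max(1, 1000 - 100) = 900`, and analogously back to
`LastBlockTime` minus the evidence age duration.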

Before starting either fast sync or consensus, we then run the following
synchronous process:

```go
func ReverseSync(
	max_historical_height int64,
	max_historical_time time.Time,
) error
```

This process fetches and verifies blocks until it reaches a block `A` where
`A.Height <= max_historical_height` and `A.Time <= max_historical_time`.
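A minimal sketch of this loop, assuming hypothetical helpers `fetchHeader`,
`fetchValidatorSet`, and `store` for peer retrieval and persistence (parameter
names mirror `max_historical_height`/`max_historical_time`; the two
verification checks are spelled out in the next section):

```go
// Sketch only: backfill headers and validator sets, walking
// backwards from the verified header at the blockchain base.
func ReverseSync(maxHistoricalHeight int64, maxHistoricalTime time.Time) error {
	trusted := &blockStore.LoadBlockMeta(blockStore.Base()).Header

	// Stop once trusted.Height <= maxHistoricalHeight AND
	// trusted.Time <= maxHistoricalTime (block A in the text).
	for trusted.Height > maxHistoricalHeight || trusted.Time.After(maxHistoricalTime) {
		height := trusted.Height - 1

		header, err := fetchHeader(height) // hypothetical peer request
		if err != nil {
			return err
		}
		// The trusted header commits to the hash of the header below it.
		if !bytes.Equal(trusted.LastBlockID.Hash, header.Hash()) {
			return fmt.Errorf("header at height %d failed verification", height)
		}

		vals, err := fetchValidatorSet(height) // hypothetical peer request
		if err != nil {
			return err
		}
		// The header commits to the hash of its own validator set.
		if !bytes.Equal(header.ValidatorsHash, vals.Hash()) {
			return fmt.Errorf("validator set at height %d failed verification", height)
		}

		store(header, vals) // hypothetical persistence
		trusted = header
	}
	return nil
}
```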

Upon successfully reverse syncing, a node can now safely continue. As this
feature is only used as part of state sync, one can think of it as merely an
extension to it.

### Verification

ReverseSync is used to fetch and verify the following data structures:
- `Header`
- `ValidatorSet`

Nodes will need to verify these prior blocks. This can be achieved by first
retrieving the header at the base height from the block store. The node then
checks that the `LastBlockID` corresponds with the hash calculated from the
header in the new block directly below:

```go
header[height].LastBlockID == hash(header[height-1])
```

The node can therefore trust the new header. It then fetches the validator set
and validates it as such:

```go
header[height-1].ValidatorsHash == hash(validatorSet[height-1])
```
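Put together, a single backfill step verifies a fetched header and validator
set pair against the trusted header directly above it. A sketch, using the
real `types` package but hypothetical wiring:

```go
// Sketch: verify a newly fetched header/validator-set pair against
// the trusted header one height above it.
func verifyBackfill(trusted, header *types.Header, vals *types.ValidatorSet) error {
	if !bytes.Equal(trusted.LastBlockID.Hash, header.Hash()) {
		return errors.New("header hash does not match trusted LastBlockID")
	}
	if !bytes.Equal(header.ValidatorsHash, vals.Hash()) {
		return errors.New("validator set hash does not match header")
	}
	return nil
}
```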

### Design

This section provides a high-level overview of the more important
characteristics of the design, saving the finer details for a subsequent ADR.
It is divided into two approaches that could be used to achieve the backfill
mechanism, based on the two available networking protocols: RPC and P2P.

#### RPC

The aforementioned data is already available via the following RPC endpoints:
`/blockchain` or `/commit` for `Header`s and `/validators` for
`ValidatorSet`s. It may be the best option to fetch and verify these resources
over RPC. State sync already requires a list of RPC endpoints, so these could
be reused.
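For illustration, a sketch of the RPC route using the Go client (signatures as
in Tendermint v0.34; treat the wiring here as an assumption rather than a
spec):

```go
import (
	"context"

	rpchttp "github.com/tendermint/tendermint/rpc/client/http"
	"github.com/tendermint/tendermint/types"
)

// Sketch: fetch the header (via /commit) and validator set (via
// /validators) for a height from one of the state sync RPC servers.
func fetchViaRPC(ctx context.Context, addr string, height int64) (*types.Header, *types.ValidatorSet, error) {
	c, err := rpchttp.New(addr, "/websocket")
	if err != nil {
		return nil, nil, err
	}
	commit, err := c.Commit(ctx, &height) // contains the signed header
	if err != nil {
		return nil, nil, err
	}
	res, err := c.Validators(ctx, &height, nil, nil)
	if err != nil {
		return nil, nil, err
	}
	return commit.Header, types.NewValidatorSet(res.Validators), nil
}
```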

#### P2P

An approach more congruent with the current infrastructure would be to request
and send the respective resources over the p2p protocol. This would mean the
addition of two new messages.

```proto
message ValidatorSetRequest {
  uint64 height = 1;
}
```

```proto
message ValidatorSetResponse {
  uint64 height = 1;
  ValidatorSet validator_set = 2;
}
```

This would be sent across a new channel, most likely within the blockchain
reactor (one can view ReverseSync as a process similar to FastSync, and thus
it should most likely live in the same domain).
In addition, Tendermint currently sends messages for entire blocks only. This
would be inconvenient if we only wanted to acquire the header (although there
may be benefits in saving the entire block). Hence we may want to consider
increasing the granularity of the requested resources by separating out
components such as the header, commit, and data (which includes evidence).
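A sketch of the responding side, assuming a hypothetical `ssproto` package
containing the messages above and a state store lookup:

```go
// Sketch: serve a ValidatorSetRequest from the local state store.
func handleValidatorSetRequest(req *ssproto.ValidatorSetRequest) (*ssproto.ValidatorSetResponse, error) {
	vals, err := stateStore.LoadValidators(int64(req.Height)) // hypothetical lookup
	if err != nil {
		return nil, err
	}
	valsProto, err := vals.ToProto()
	if err != nil {
		return nil, err
	}
	return &ssproto.ValidatorSetResponse{
		Height:       req.Height,
		ValidatorSet: valsProto,
	}, nil
}
```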

The P2P path may also enable P2P-networked light clients and a state sync that
doesn't need to rely on RPC, as the necessary data can now be transmitted
through P2P.

### Termination

ReverseSync draws a lot of parallels with fast sync. An important consideration
for fast sync that also extends to ReverseSync is termination. ReverseSync will
finish its task when one of the following conditions has been met:

1. It reaches a block `A` where `A.Height <= max_historical_height` and
   `A.Time <= max_historical_time`.
2. None of its peers report having the block at the height below the
   process's current block.
3. A global timeout expires.
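Expressed as a predicate (a sketch; how conditions 2 and 3 are tracked is left
to the implementation):

```go
// Sketch: ReverseSync terminates when any one of the three
// conditions above holds.
func finished(h *types.Header, noPeerHasParent, timedOut bool) bool {
	reachedTarget := h.Height <= maxHistoricalHeight &&
		!h.Time.After(maxHistoricalTime) // condition 1
	return reachedTarget || noPeerHasParent || timedOut // conditions 2, 3
}
```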

This implies that we can't guarantee adequate history and thus the term
"invariant" can't be used in the strictest sense. In the case that the first
condition isn't met, the node will optimistically attempt to continue with
either fast sync or consensus.

## Alternative Solutions

The need for a minimum block history invariant stems purely from the need to
validate evidence (although there may be some application-relevant needs as
well). Because of this, an alternative could be to simply trust whatever the
2/3+ majority has agreed upon and, in the case where a node is at the head of
the blockchain, simply abstain from voting.

Taking one step further back, some of the problems listed in the introduction
could be resolved by other means. Full nodes (and even validators, to a certain
extent) could simply bypass validation if they didn't have the necessary
history, trusting purely in consensus. Node operators running nodes with
truncated history could start up a separate full node if they wanted complete
history. As it stands, if 2/3+ vote on evidence that a node can't verify, the
node will halt, in the same manner as if 2/3+ vote on a header that the node
sees as invalid (perhaps due to a different app hash).

## Status

Proposed

## Consequences

### Positive

- Ensures a minimum block history invariant for honest nodes.

### Negative

- State sync will be slower as more processing is required.

### Neutral

- By having validator sets served through p2p, this would make it easier to
extend p2p support to light clients and state sync.
- In the future, it may also be possible to extend this feature to allow nodes
  to freely fetch and verify prior blocks.

## References

- [RFC-001: Block retention](https://github.com/tendermint/spec/blob/master/rfc/001-block-retention.md)
- [Original issue](https://github.com/tendermint/tendermint/issues/4629)

