Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Remove checkpoint alignment requirements and enable historic state pruning #4610

Closed
wants to merge 18 commits into from

Conversation

michaelsproul
Copy link
Member

Issue Addressed

Closes #3210
Closes #3211

Proposed Changes

  • Checkpoint sync from the latest finalized state regardless of its alignment.
  • Add the block_root to the database's split point. This is only added to the in-memory split in order to avoid a schema migration. See load_split.
  • Add a new method to the DB called get_advanced_state, which looks up a state by block root, with a state_root as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state from the freezer DB, even if it was required for block/attestation processing, which was suboptimal.
  • Replace several state look-ups in block and attestation processing with get_advanced_state so that they can't hit the split block's unadvanced state.
  • Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless --reconstruct-historic-states is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

Additional Info

Needs further testing. I want to stress-test the pruned database under Hydra.

The get_advanced_state method is intended to become more relevant over time: tree-states includes an identically named method that returns advanced states from its in-memory cache.

michaelsproul and others added 10 commits July 17, 2023 14:48
Squashed commit of the following:

commit 4dd2130
Author: realbigsean <[email protected]>
Date:   Tue Jun 13 13:11:48 2023 -0400

    fix test

commit 36309d3
Author: realbigsean <[email protected]>
Date:   Tue Jun 13 12:26:27 2023 -0400

    query for block by `state.latest_block_header.slot`

commit 1ddd335
Author: realbigsean <[email protected]>
Date:   Mon Jun 12 13:09:49 2023 -0400

    get state first and query by most recent block root

commit 5fda708
Author: realbigsean <[email protected]>
Date:   Fri Jun 9 14:55:02 2023 -0400

    query for checkpoint state by slot rather than state root (teku doesn't serve by state root)

commit da25512
Author: realbigsean <[email protected]>
Date:   Fri Jun 9 13:00:42 2023 -0400

    add import

commit 28576c9
Author: realbigsean <[email protected]>
Date:   Fri Jun 9 12:03:59 2023 -0400

    checkpoint sync without alignment
@michaelsproul michaelsproul added enhancement New feature or request work-in-progress PR is a work-in-progress database backwards-incompat Backwards-incompatible API change labels Aug 11, 2023
@michaelsproul michaelsproul added ready-for-review The code is ready for review v4.4.1 ETA August 2023 and removed work-in-progress PR is a work-in-progress labels Aug 18, 2023
@michaelsproul
Copy link
Member Author

This is ready for review now. Due to the delicate nature of the split alignment change I've been running this change under Hydra to check that there are no ERROR logs reachable or other unexpected failures/crashes. So far so good after 18h of fuzzing.

The fuzzer is available on the hydra-split-alignment branch on my fork.

@paulhauner paulhauner self-requested a review August 18, 2023 00:59
Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, although I do find it hard to reason that we're not going to run into trouble by not having the finalized state in some circumstances.

I imagine testnets will help us identify issues.

Happy to merge, although I had one little comment/suggestion.

(P.S. I did this review entirely from VSCode on my local machine, hopefully there's nothing weird about it!)

///
/// Presently this is only used to avoid loading the un-advanced split state, but in future will
/// be expanded to return states from an in-memory cache.
pub fn get_advanced_state(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should make it clearer (perhaps in the function name) that this will only return hot state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, updated.

@michaelsproul michaelsproul removed the ready-for-review The code is ready for review label Aug 21, 2023
@michaelsproul michaelsproul added the ready-for-merge This PR is ready to merge. label Aug 21, 2023
@michaelsproul
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request Aug 21, 2023
…uning (#4610)

## Issue Addressed

Closes #3210
Closes #3211

## Proposed Changes

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

## Additional Info

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <[email protected]>
@bors
Copy link

bors bot commented Aug 21, 2023

Build failed (retrying...):

bors bot pushed a commit that referenced this pull request Aug 21, 2023
…uning (#4610)

## Issue Addressed

Closes #3210
Closes #3211

## Proposed Changes

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

## Additional Info

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <[email protected]>
@bors
Copy link

bors bot commented Aug 21, 2023

Build failed:

@michaelsproul
Copy link
Member Author

bors retry

bors bot pushed a commit that referenced this pull request Aug 21, 2023
…uning (#4610)

## Issue Addressed

Closes #3210
Closes #3211

## Proposed Changes

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

## Additional Info

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <[email protected]>
@bors
Copy link

bors bot commented Aug 21, 2023

@bors bors bot changed the title Remove checkpoint alignment requirements and enable historic state pruning [Merged by Bors] - Remove checkpoint alignment requirements and enable historic state pruning Aug 21, 2023
@bors bors bot closed this Aug 21, 2023
@michaelsproul michaelsproul deleted the split-alignment branch August 21, 2023 07:40
bors bot pushed a commit that referenced this pull request Aug 28, 2023
## Issue Addressed

Fix a bug in the storage of the linear block roots array in the freezer DB. Previously this array was always written as part of state storage (or block backfill). With state pruning enabled by #4610, these states were no longer being written and as a result neither were the block roots.

The impact is quite low, we would just log an error when trying to forwards-iterate the block roots, which for validating nodes only happens when they try to look up blocks for peers:

> Aug 25 03:42:36.980 ERRO Missing chunk in forwards iterator      chunk index: 49726, service: freezer_db

Any node checkpoint synced off `unstable` is affected and has a corrupt database. If you see the log above, you need to re-sync with the fix. Nodes that haven't checkpoint synced recently should _not_ be corrupted, even if they ran the buggy version.

## Proposed Changes

- Use a `ChunkWriter` to write the block roots when states are not being stored.
- Tweak the usage of `get_latest_restore_point` so that it doesn't return a nonsense value when state pruning is enabled.
- Tweak the guarantee on the block roots array so that block roots are assumed available up to the split slot (exclusive). This is a bit nicer than relying on anything to do with the latest restore point, which is a nonsensical concept when there aren't any restore points.

## Additional Info

I'm looking forward to deleting the chunked vector code for good when we merge tree-states :grin:
jxs pushed a commit to jxs/lighthouse that referenced this pull request Aug 28, 2023
## Issue Addressed

Fix a bug in the storage of the linear block roots array in the freezer DB. Previously this array was always written as part of state storage (or block backfill). With state pruning enabled by sigp#4610, these states were no longer being written and as a result neither were the block roots.

The impact is quite low, we would just log an error when trying to forwards-iterate the block roots, which for validating nodes only happens when they try to look up blocks for peers:

> Aug 25 03:42:36.980 ERRO Missing chunk in forwards iterator      chunk index: 49726, service: freezer_db

Any node checkpoint synced off `unstable` is affected and has a corrupt database. If you see the log above, you need to re-sync with the fix. Nodes that haven't checkpoint synced recently should _not_ be corrupted, even if they ran the buggy version.

## Proposed Changes

- Use a `ChunkWriter` to write the block roots when states are not being stored.
- Tweak the usage of `get_latest_restore_point` so that it doesn't return a nonsense value when state pruning is enabled.
- Tweak the guarantee on the block roots array so that block roots are assumed available up to the split slot (exclusive). This is a bit nicer than relying on anything to do with the latest restore point, which is a nonsensical concept when there aren't any restore points.

## Additional Info

I'm looking forward to deleting the chunked vector code for good when we merge tree-states :grin:
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
…uning (sigp#4610)

Closes sigp#3210
Closes sigp#3211

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <[email protected]>
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
Fix a bug in the storage of the linear block roots array in the freezer DB. Previously this array was always written as part of state storage (or block backfill). With state pruning enabled by sigp#4610, these states were no longer being written and as a result neither were the block roots.

The impact is quite low, we would just log an error when trying to forwards-iterate the block roots, which for validating nodes only happens when they try to look up blocks for peers:

> Aug 25 03:42:36.980 ERRO Missing chunk in forwards iterator      chunk index: 49726, service: freezer_db

Any node checkpoint synced off `unstable` is affected and has a corrupt database. If you see the log above, you need to re-sync with the fix. Nodes that haven't checkpoint synced recently should _not_ be corrupted, even if they ran the buggy version.

- Use a `ChunkWriter` to write the block roots when states are not being stored.
- Tweak the usage of `get_latest_restore_point` so that it doesn't return a nonsense value when state pruning is enabled.
- Tweak the guarantee on the block roots array so that block roots are assumed available up to the split slot (exclusive). This is a bit nicer than relying on anything to do with the latest restore point, which is a nonsensical concept when there aren't any restore points.

I'm looking forward to deleting the chunked vector code for good when we merge tree-states :grin:
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
…uning (sigp#4610)

Closes sigp#3210
Closes sigp#3211

- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.

Needs further testing. I want to stress-test the pruned database under Hydra.

The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.

Co-authored-by: realbigsean <[email protected]>
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
Fix a bug in the storage of the linear block roots array in the freezer DB. Previously this array was always written as part of state storage (or block backfill). With state pruning enabled by sigp#4610, these states were no longer being written and as a result neither were the block roots.

The impact is quite low, we would just log an error when trying to forwards-iterate the block roots, which for validating nodes only happens when they try to look up blocks for peers:

> Aug 25 03:42:36.980 ERRO Missing chunk in forwards iterator      chunk index: 49726, service: freezer_db

Any node checkpoint synced off `unstable` is affected and has a corrupt database. If you see the log above, you need to re-sync with the fix. Nodes that haven't checkpoint synced recently should _not_ be corrupted, even if they ran the buggy version.

- Use a `ChunkWriter` to write the block roots when states are not being stored.
- Tweak the usage of `get_latest_restore_point` so that it doesn't return a nonsense value when state pruning is enabled.
- Tweak the guarantee on the block roots array so that block roots are assumed available up to the split slot (exclusive). This is a bit nicer than relying on anything to do with the latest restore point, which is a nonsensical concept when there aren't any restore points.

I'm looking forward to deleting the chunked vector code for good when we merge tree-states :grin:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards-incompat Backwards-incompatible API change database enhancement New feature or request ready-for-merge This PR is ready to merge. v4.4.1 ETA August 2023
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants