Lighthouse sends genesis block twice in libp2p BlocksByRange response #4943

Closed
etan-status opened this issue Nov 21, 2023 · 3 comments
Labels: bug, database

Comments

@etan-status

Description

When requesting blocks 0 ..< 32 on Holesky from Nimbus, Lighthouse seems to respond with a duplicate Genesis block.

Version

libp2p identify message: agent_version=Lighthouse/v4.5.0-441fc16/x86_64-linux

Present Behaviour

Lighthouse responds with slots [0, 0, 2, 4, 6, 9, 11, 13, 14, 15, 16, 18, 20, 21, 22, 23, 24, 26, 27, 28, 30]

Expected Behaviour

Lighthouse only includes slot 0 once in the response.

Steps to resolve

https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#beaconblocksbyrange

Clients MUST respond with blocks that are consistent from a single chain within the context of the request. This applies to any step value. In particular when step == 1, each parent_root MUST match the hash_tree_root of the preceding block.
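
A requesting client can check the quoted rule locally. The following is a minimal sketch, not code from Nimbus or Lighthouse: `Block` and `check_range_response` are hypothetical names, and `root` stands in for the block's `hash_tree_root`. It assumes a request with step == 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    # Hypothetical, simplified stand-in for a signed beacon block.
    slot: int
    root: bytes         # stands in for hash_tree_root(block)
    parent_root: bytes


def check_range_response(blocks: List[Block], start_slot: int, count: int) -> List[str]:
    """Return a list of problems found; an empty list means the response looks consistent."""
    problems = []
    for i, block in enumerate(blocks):
        # Every block must fall inside the requested slot range.
        if not (start_slot <= block.slot < start_slot + count):
            problems.append(f"slot {block.slot} is outside the requested range")
        if i == 0:
            continue
        prev = blocks[i - 1]
        # Slots must strictly increase, so a repeated slot is an error.
        if block.slot <= prev.slot:
            problems.append(f"slot {block.slot} does not increase after slot {prev.slot}")
        # With step == 1, each block must build on the preceding block.
        if block.parent_root != prev.root:
            problems.append(f"block at slot {block.slot} does not build on the preceding block")
    return problems
```

Fed a response that starts with the genesis block twice, this check reports both the repeated slot and the broken parent link; a response with slots 0 and 2 from one chain passes.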

@jimmygchen jimmygchen self-assigned this Nov 22, 2023
@jimmygchen
Member

@etan-status Thanks for raising this! I'll look into this.

@jimmygchen jimmygchen added the bug label Nov 22, 2023
@michaelsproul
Member

I think I know what's happening here.

We had a database bug (#4817) which caused the block root at slot 1 not to be stored on Holesky in the case where the node checkpoint synced and backfilled. We fixed that bug in #4820 in order to fix state reconstruction; however, we didn't realise at the time that it would manifest as incorrect BlocksByRange responses. The reason for the incorrect responses is that we store block roots in chunks of 128 at a time. These chunks are default-initialised to 0x0, so the block root for slot 1 on buggy Holesky nodes is 0x0. Our database also knows how to resolve the 0x0 root to the genesis block, so when serving a BlocksByRange request we:

  • slot 0: look up the actual genesis block root and get the genesis block
  • slot 1: look up the 0x0 genesis block alias and get the genesis block

The 0x0 isn't caught by the de-duplication that we apply because it's not equal to the actual genesis block root.
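
The mechanism can be reproduced with a toy model. This is only a sketch under stated assumptions, not Lighthouse's actual database code: `ToyStore`, `blocks_by_range` and `GENESIS_ROOT` are made-up names, the de-duplication is simplified to "skip a root equal to the previous one", and slot 1 is assumed to be a skipped slot whose correct root repeats the genesis root (which is what the reported response suggests).

```python
from typing import Dict, List, Optional

CHUNK_SIZE = 128             # roots per chunk, as described above
ZERO_ROOT = bytes(32)        # the 0x0 default value
GENESIS_ROOT = b"\x11" * 32  # made-up root, for illustration only


class ToyStore:
    def __init__(self) -> None:
        self.chunks: Dict[int, List[bytes]] = {}
        self.blocks: Dict[bytes, str] = {GENESIS_ROOT: "genesis"}

    def set_block_root(self, slot: int, root: bytes) -> None:
        # A new chunk starts out filled with the zero root.
        chunk = self.chunks.setdefault(slot // CHUNK_SIZE, [ZERO_ROOT] * CHUNK_SIZE)
        chunk[slot % CHUNK_SIZE] = root

    def get_block_root(self, slot: int) -> Optional[bytes]:
        chunk = self.chunks.get(slot // CHUNK_SIZE)
        return None if chunk is None else chunk[slot % CHUNK_SIZE]

    def get_block(self, root: bytes) -> Optional[str]:
        # The zero root is treated as an alias for the genesis block.
        if root == ZERO_ROOT:
            root = GENESIS_ROOT
        return self.blocks.get(root)


def blocks_by_range(store: ToyStore, start: int, count: int) -> List[str]:
    out: List[str] = []
    last_root = None
    for slot in range(start, start + count):
        root = store.get_block_root(slot)
        # Simplified de-duplication: a skipped slot repeats the previous root.
        if root is None or root == last_root:
            continue
        last_root = root
        block = store.get_block(root)
        if block is not None:
            out.append(block)
    return out
```

If slot 1 is never written, its entry stays at the zero root, which differs from `GENESIS_ROOT` and so passes the de-duplication check, yet still resolves to the genesis block. Writing the genesis root at slot 1 makes the duplicate disappear.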

I think an appropriate fix would be to run a little function at startup on v18 Holesky databases to store the correct block root at slot 1. We didn't implement this initially because we thought the corruption was only relevant to archive nodes, and they were failing loudly.
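
As a rough illustration of such a startup repair (a sketch only; it does not reflect what the actual patch does), the function below overwrites a zeroed slot-1 entry in an in-memory list of block roots. `repair_slot_one` and its parameters are hypothetical, and the caller is assumed to already know the correct root for slot 1.

```python
from typing import List

ZERO_ROOT = bytes(32)  # the 0x0 default value left behind by the bug


def repair_slot_one(block_roots: List[bytes], correct_root: bytes) -> bool:
    """Store the correct root at slot 1 if it is still zeroed.

    Returns True when a change was made. A healthy database is left alone,
    so running this at every startup is safe.
    """
    if len(block_roots) < 2:
        return False  # nothing stored for slot 1
    if block_roots[1] != ZERO_ROOT:
        return False  # already correct or already repaired
    block_roots[1] = correct_root
    return True
```

Making the repair idempotent means it does not need a separate flag to record whether it has already run.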

TL;DR: some disgusting database junk that only happened on Holesky, requires a patch to fix

@michaelsproul
Member

Fixed in #4985. This will be in Lighthouse v4.6.0. We'll encourage users to update so it fixes the network-wide behaviour.
