Tracking: State root mismatch #7559

Closed
12 tasks done
Rjected opened this issue Apr 10, 2024 · 5 comments · Fixed by #7753
Labels
A-trie Related to Merkle Patricia Trie implementation C-bug An unexpected or incorrect behavior C-tracking-issue An issue that collects information about a broad development initiative M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity

Comments

@Rjected
Member

Rjected commented Apr 10, 2024

Here I'm aggregating all issues related to the merkle root mismatches that have been seen on nodes recently. We're investigating and would like to triage all the different environments in which this has occurred, as well as when this has occurred for users.

The issues, in order of when they were posted:

Issues

  1. A-staged-sync A-trie C-bug
    onbjerg
  2. A-staged-sync A-trie C-bug S-stale
  3. A-trie C-bug
  4. A-trie C-bug
  5. A-trie C-bug
  6. A-trie C-bug
  7. A-trie C-bug
  8. A-trie C-bug
  9. A-trie C-bug
  10. A-trie C-bug
  11. A-trie C-bug S-stale
  12. A-trie C-bug
@Rjected Rjected added C-bug An unexpected or incorrect behavior C-tracking-issue An issue that collects information about a broad development initiative A-trie Related to Merkle Patricia Trie implementation labels Apr 10, 2024
@shekhirin shekhirin changed the title State root mismatch tracking thread Tracking: State root mismatch Apr 12, 2024
@shekhirin shekhirin pinned this issue Apr 12, 2024
@ajsutton

I hit a couple of cases where reth incorrectly rejected blocks on Sepolia. Logs for the first case, which was with beta.5:
first-bad-block.txt

After updating to a local build of 0.2.0-beta.5-dev (041e29347), it synced past that block fine and then hit a second bad block. This one persisted across restarts:
bad-block-after-restart.txt

db stats:
reth-checksum.txt
stats.txt

The node is currently resyncing after dropping the merkle stage so hopefully it will get back to tracking the chain correctly again.

@pistomat
Contributor

pistomat commented Apr 13, 2024

Another one on Holesky #7619

@DaniPopes DaniPopes added the M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity label Apr 15, 2024
@ChrisTorresLugo

I ran into this issue twice on mainnet at different block heights using the most recent beta release. The first time around, I dropped the DB and resynced. The resynced node tracked the tip for a few days and crashed last night with the same error. Happy to provide logs or any other info that could help with triaging and fixing this issue.

@lgaroche

lgaroche commented Apr 18, 2024

Also ran into the error. It was just finishing a mainnet resync from scratch (to try and solve the same issue).

CL client: Nimbus beacon node v24.3.0-dc19b0-stateofus

reth Version: 0.2.0-beta.4
Commit SHA: c04dbe6e9
Build Timestamp: 2024-04-03T09:42:35.297053503Z
Build Features: jemalloc
Build Profile: maxperf

Reth config: reth.txt
Reth debug logs: debug.log
Reth DB stats: stats.txt
Reth static-files get: 19680870.json
Reth static-files get: 19680679.json

First, there's an error:

2024-04-18T14:37:04.385454Z ERROR sync::pipeline: Stage encountered a validation error: mismatched block state root: got 0x276e23ed2e3afb46ec505768e5c83b8d562ba8364df0a15d2d21d161c5b316a8, expected 0x33a256aa3cbf5c5e0fb9cf66d6b8787b636d1cfa341355a32c6d3ad13966727f stage=MerkleExecute bad_block=19680870

Then it starts unwinding the Merkle stage, and it stops after encountering another error:

Error: stage encountered an error in block #19680679: validation error: mismatched block state root: got 0xd8ba0da43cd4de825401e6abf39b450c71b18b6b8655f206f2d24cf8a4aae978, expected 0xc9110dc6774d5067206141ea44b5d64abd9467939001cf25489660cc4815a968

Caused by:
   0: validation error: mismatched block state root: got 0xd8ba0da43cd4de825401e6abf39b450c71b18b6b8655f206f2d24cf8a4aae978, expected 0xc9110dc6774d5067206141ea44b5d64abd9467939001cf25489660cc4815a968
   1: mismatched block state root: got 0xd8ba0da43cd4de825401e6abf39b450c71b18b6b8655f206f2d24cf8a4aae978, expected 0xc9110dc6774d5067206141ea44b5d64abd9467939001cf25489660cc4815a968

Location:
    /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/task/poll.rs:255:39
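
For anyone collecting reports like the one above, here is a minimal sketch of pulling the block number and the two state roots out of such a log line. It is not part of reth: the function name `parse_mismatch` and both patterns are assumptions based only on the message formats quoted in this thread (`bad_block=N` in the pipeline log, `in block #N` in the final error).

```python
import re

# Matches the "mismatched block state root" message as quoted in this thread.
# State roots are 32 bytes, so each hash is 64 hex characters.
MISMATCH_RE = re.compile(
    r"mismatched block state root: "
    r"got (0x[0-9a-fA-F]{64}), expected (0x[0-9a-fA-F]{64})"
)

# The block number shows up as "bad_block=N" or as "in block #N".
BLOCK_RE = re.compile(r"(?:bad_block=|in block #)(\d+)")


def parse_mismatch(line):
    """Return (block, got, expected) for a mismatch line, or None.

    `block` is None when the line carries no block number, as in the
    "Caused by" lines of the error chain.
    """
    mismatch = MISMATCH_RE.search(line)
    if mismatch is None:
        return None
    block_match = BLOCK_RE.search(line)
    block = int(block_match.group(1)) if block_match else None
    return block, mismatch.group(1), mismatch.group(2)
```

Running this over a debug log and grouping by block number makes it easier to see whether several nodes fail at the same height or at unrelated ones.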

@jsvisa
Contributor

jsvisa commented Apr 22, 2024

I worked around the issue temporarily by rewinding the chain to a known-good block header from before the block reported in the error. Anyone encountering a similar problem can apply the same workaround.

In my case, the error log from reth looked like this:

2024-04-22T08:25:31.853884Z ERROR shutting down due to error
Error: stage encountered an error in block #19589998: validation error: mismatched block state root: got 0x3fe60a526f9f8408dac7d9484a52b54bc730cbf40bb946b61551cb0fbca848ff, expected 0x7b7a0926f706b4878d7974a064ace50b879c225dd5dbdf0bb9d7eee4d288ad06

Caused by:
   0: validation error: mismatched block state root: got 0x3fe60a526f9f8408dac7d9484a52b54bc730cbf40bb946b61551cb0fbca848ff, expected 0x7b7a0926f706b4878d7974a064ace50b879c225dd5dbdf0bb9d7eee4d288ad06
   1: mismatched block state root: got 0x3fe60a526f9f8408dac7d9484a52b54bc730cbf40bb946b61551cb0fbca848ff, expected 0x7b7a0926f706b4878d7974a064ace50b879c225dd5dbdf0bb9d7eee4d288ad06

It reported that block 19589998 is invalid, so I ran a rewind with the commands below.

First I tried to rewind to a block just before it, e.g. 19589990:

/path/to/reth stage rewind --datadir data --to-block 19589990

but it failed, for a reason similar to the one above (mismatched block state root). So I tried rewinding to a slightly older block:

/path/to/reth stage rewind --datadir data --to-block 19589000

After a while the rewind succeeded. After restarting reth, it could resync from 19589000 again.
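
The "try a block just before, then step further back" procedure can be sketched as a small helper that lists candidate rewind targets. This is only an illustration: `rewind_targets` and its `granularities` parameter are hypothetical names, not part of reth, and the helper only computes block numbers. It does not run the rewind command quoted above, whose exact form I have not checked against other reth versions.

```python
def rewind_targets(bad_block, granularities=(10, 1000, 100_000)):
    """Return candidate rewind targets strictly below bad_block, nearest first.

    Each target is the largest multiple of a granularity that is still
    below the failing block, so every candidate reaches further back.
    """
    if bad_block < 1:
        raise ValueError("bad_block must be at least 1")
    targets = []
    for granularity in granularities:
        # Subtract 1 first so a bad_block that is itself a multiple of the
        # granularity still yields a target below it.
        target = (bad_block - 1) // granularity * granularity
        if target not in targets:
            targets.append(target)
    return targets


# For the block from the log above this yields the two targets tried in
# this comment, followed by one further-back fallback.
print(rewind_targets(19589998))
```

Try the targets in order and stop at the first one for which the rewind completes without a state root mismatch.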

@onbjerg onbjerg unpinned this issue Apr 24, 2024
7 participants