Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tree): disable cached trie updates for chains with >1 block #7753

Merged
merged 3 commits into from
Apr 21, 2024

Conversation

rkrasiuk
Copy link
Member

Description

Closes #7559.

Disabled trie updates caching for chains longer than 1 block.

@rkrasiuk rkrasiuk added C-bug An unexpected or incorrect behavior A-engine Related to the engine implementation A-trie Related to Merkle Patricia Trie implementation labels Apr 19, 2024
@github-actions github-actions bot added C-tracking-issue An issue that collects information about a broad development initiative M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity and removed A-engine Related to the engine implementation labels Apr 19, 2024
@rkrasiuk rkrasiuk added A-engine Related to the engine implementation and removed C-tracking-issue An issue that collects information about a broad development initiative labels Apr 19, 2024
@onbjerg onbjerg removed the M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity label Apr 19, 2024
@onbjerg
Copy link
Collaborator

onbjerg commented Apr 19, 2024

does it also close all issues linked in the tracking issue?

@rkrasiuk
Copy link
Member Author

rkrasiuk commented Apr 19, 2024

@onbjerg potentially yes, was too lazy to link. @Rjected we need to go back to them and validate that users experienced long chain commits

@winksaville
Copy link
Contributor

I fetched this disable-trie-updates-for-long-chains for this PR and added my panic change resulting and a created disable-updates-for-long-chains-plus-panic-on-InsertBlockError on my fork.

I built and installed the code and I'm running on my holesky node, Ethel:

kendall@ethel 24-04-20T03:07:29.666Z:~
$ ~/bin/reth.7753.disable-updates-for-long-chains-plus-panic-on-InsertBlockError --version
reth Version: 0.2.0-beta.5-dev
Commit SHA: 559b5cf3e
Build Timestamp: 2024-04-19T20:01:05.395319306Z
Build Features: jemalloc
Build Profile: release
kendall@ethel 24-04-20T06:08:53.073Z:~

Here is the service file that runs the code:

kendall@ethel 24-04-20T06:08:53.073Z:~
$ cat /etc/systemd/system/reth.7753.disable-updates-for-long-chains-plus-panic-on-InsertBlockError.service 
# The ethereum service (part of systemd)
# file: /etc/systemd/system/eth1.service

[Unit]
Description     = reth eth2 service
Requires        = network-online.target
After           = network-online.target

[Service]
User            = kendall
ExecStartPre    = /bin/sleep 2
Environment     = "RUST_BACKTRACE=1"
ExecStart       = \
  /home/kendall/bin/reth.7753.disable-updates-for-long-chains-plus-panic-on-InsertBlockError node \
  --full \
  --chain holesky \
  --instance 1 \
  --max-outbound-peers 16 \
  --max-inbound-peers 16 \
  --http \
  --http.addr 0.0.0.0 \
  --ws \
  --http.api admin,eth,web3,net \
  --datadir /home/kendall/eth2-data/reth/holesky \
  --authrpc.addr 127.0.0.1 \
  --authrpc.port 8551 \
  --authrpc.jwtsecret /home/kendall/eth2-data/secrets/jwtsecret \
  --log.file.format terminal \
  --log.file.filter debug \
  --log.file.directory /home/kendall/eth2-data/reth \
  --log.file.max-size 200 \
  --log.file.max-files 5 \
  --log.journald \
  --log.journald.filter error \
  --color auto \
  --metrics 0.0.0.0:9001 \

Restart         = no
RestartSec      = 3
TimeoutSec      = 300

[Install]
WantedBy    = multi-user.target
kendall@ethel 24-04-20T06:10:48.968Z:~

My first step was to stop the EL and CL's on Ethel, I compiled the code on Ethel and copied it from ~/.cargo/bin to
~/bin. I also copied the "current datadir" so that if there was a problem (there was) we'd have the "original". I then started reth, becon-chain and lighthouse and something wasn't quite right I wasn't seeing the journactl logs, I quickly figured out I didn't have my terminal setup to properly stream the logs from
reth.7753.disable-updates-for-long-chains-plus-panic-on-InsertBlockError. I fixed my terminal and the restarted
beacon-chain and validator.

The problem is, beacon-chain was never able to successfully attest to any blocks in 5hrs of running. So I stopped
reth, beacon-chain and validator. I've updated two files the datadir and a sha256sum file. The datadir contains the
db as well as all of the "debug" logs.

As I said I do have a copy of the "original" database with logs but I don't have time to upload it now,
I can provide it when I wake up in the morning if it's needed. Maybe the problem was my stopping CLs
so quickly after I started (I don't believe I stopped reth but it's not impossible.

@winksaville
Copy link
Contributor

I fetched this disable-trie-updates-for-long-chains for this PR and added my panic change resulting and a created disable-updates-for-long-chains-plus-panic-on-InsertBlockError on my fork.

I've updated my branch to the latest changes here. I reverted my datadir back to what it was before trying this the first time. Now my node is syncing again, TXS @rkrasiuk !

Apr 20 09:54:21 ethel lighthouse[65015]: Apr 20 16:54:21.481 INFO Previous epoch attestation(s) missing   validators: ["1624364", "1624363"], epoch: 46169, service: val_mon, service: beacon
Apr 20 10:00:45 ethel lighthouse[65015]: Apr 20 17:00:45.475 INFO Previous epoch attestation(s) success   validators: ["1624364", "1624363"], epoch: 46170, service: val_mon, service: beacon
Apr 20 10:07:15 ethel lighthouse[65015]: Apr 20 17:07:15.458 INFO Previous epoch attestation(s) success   validators: ["1624364", "1624363"], epoch: 46170, service: val_mon, service: beacon

@rkrasiuk rkrasiuk added this pull request to the merge queue Apr 21, 2024
Merged via the queue into main with commit 223dde2 Apr 21, 2024
27 checks passed
@rkrasiuk rkrasiuk deleted the rkrasiuk/disable-trie-updates-for-long-chains branch April 21, 2024 09:33
rkrasiuk added a commit that referenced this pull request Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-engine Related to the engine implementation A-trie Related to Merkle Patricia Trie implementation C-bug An unexpected or incorrect behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking: State root mismatch
5 participants