Fast syncing on a digitalocean droplet with a block storage volume results in a rocksdb exception #591

Closed
e-nikolov opened this issue Mar 25, 2020 · 8 comments
Labels: bug (Something isn't working), P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc.))

Comments

@e-nikolov

When fast syncing Ethereum mainnet with the data stored on a DigitalOcean block storage volume, at some point I always start getting this error repeatedly:

2020-03-25 22:49:22.359+00:00 | EthScheduler-Services-1 (importBlock) | ERROR | PipelineChainDownloader | Chain download failed. Restarting after short delay.
java.util.concurrent.CompletionException: org.hyperledger.besu.plugin.services.exception.StorageException: org.rocksdb.RocksDBException: block checksum mismatch: expected 826185092, got 2532181494  in /opt/besu/data/database/030185.sst offset 63492071 size 7574
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
        at org.hyperledger.besu.services.pipeline.Pipeline.abort(Pipeline.java:152) ~[besu-pipeline-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:134) ~[besu-pipeline-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.hyperledger.besu.plugin.services.exception.StorageException: org.rocksdb.RocksDBException: block checksum mismatch: expected 826185092, got 2532181494  in /opt/besu/data/database/030185.sst offset 63492071 size 7574
        at org.hyperledger.besu.plugin.services.storage.rocksdb.segmented.RocksDBColumnarKeyValueStorage$RocksDbTransaction.commit(RocksDBColumnarKeyValueStorage.java:272) ~[besu-plugin-rocksdb-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageTransactionTransitionValidatorDecorator.commit(SegmentedKeyValueStorageTransactionTransitionValidatorDecorator.java:49) ~[besu-kvstore-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageAdapter$1.commit(SegmentedKeyValueStorageAdapter.java:85) ~[besu-kvstore-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.storage.keyvalue.KeyValueStoragePrefixedKeyBlockchainStorage$Updater.commit(KeyValueStoragePrefixedKeyBlockchainStorage.java:189) ~[besu-core-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.appendBlockHelper(DefaultBlockchain.java:250) ~[besu-core-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.appendBlock(DefaultBlockchain.java:229) ~[besu-core-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.mainnet.MainnetBlockImporter.fastImportBlock(MainnetBlockImporter.java:73) ~[besu-core-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.importBlock(FastImportBlocksStep.java:66) ~[besu-eth-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.accept(FastImportBlocksStep.java:51) ~[besu-eth-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.accept(FastImportBlocksStep.java:30) ~[besu-eth-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.pipeline.CompleterStage.run(CompleterStage.java:37) ~[besu-pipeline-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:130) ~[besu-pipeline-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        ... 5 more
Caused by: org.rocksdb.RocksDBException: block checksum mismatch: expected 826185092, got 2532181494  in /opt/besu/data/database/030185.sst offset 63492071 size 7574
        at org.rocksdb.Transaction.commit(Native Method) ~[rocksdbjni-6.4.6.jar:?]
        at org.rocksdb.Transaction.commit(Transaction.java:206) ~[rocksdbjni-6.4.6.jar:?]
        at org.hyperledger.besu.plugin.services.storage.rocksdb.segmented.RocksDBColumnarKeyValueStorage$RocksDbTransaction.commit(RocksDBColumnarKeyValueStorage.java:270) ~[besu-plugin-rocksdb-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageTransactionTransitionValidatorDecorator.commit(SegmentedKeyValueStorageTransactionTransitionValidatorDecorator.java:49) ~[besu-kvstore-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageAdapter$1.commit(SegmentedKeyValueStorageAdapter.java:85) ~[besu-kvstore-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.storage.keyvalue.KeyValueStoragePrefixedKeyBlockchainStorage$Updater.commit(KeyValueStoragePrefixedKeyBlockchainStorage.java:189) ~[besu-core-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.appendBlockHelper(DefaultBlockchain.java:250) ~[besu-core-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.chain.DefaultBlockchain.appendBlock(DefaultBlockchain.java:229) ~[besu-core-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.mainnet.MainnetBlockImporter.fastImportBlock(MainnetBlockImporter.java:73) ~[besu-core-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.importBlock(FastImportBlocksStep.java:66) ~[besu-eth-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.accept(FastImportBlocksStep.java:51) ~[besu-eth-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.ethereum.eth.sync.fastsync.FastImportBlocksStep.accept(FastImportBlocksStep.java:30) ~[besu-eth-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.pipeline.CompleterStage.run(CompleterStage.java:37) ~[besu-pipeline-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:130) ~[besu-pipeline-1.4.1-SNAPSHOT.jar:1.4.1-dev-83ee5fa3]
        ... 5 more

I tried with pruning both enabled and disabled, and it made no difference. I haven't seen this error when full syncing, or when fast syncing on the DigitalOcean machine's local SSD instead of a block storage volume.
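For anyone comparing failures across sync attempts, the RocksDB message already names the SST file, offset and block size involved. A minimal sketch of pulling those fields out of a log line follows. The class name `ChecksumMismatch` is hypothetical (it is not part of Besu or RocksDB), and the pattern is modelled only on the message quoted above, so other RocksDB versions may format it differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: holds the fields of a "block checksum mismatch" message. */
final class ChecksumMismatch {
  // Pattern derived only from the log line in this issue (assumption).
  private static final Pattern PATTERN = Pattern.compile(
      "block checksum mismatch: expected (\\d+), got (\\d+)\\s+in (\\S+) offset (\\d+) size (\\d+)");

  final long expected; // checksum stored alongside the block
  final long actual;   // checksum computed from the bytes that were read
  final String sstFile;
  final long offset;
  final long size;

  private ChecksumMismatch(long expected, long actual, String sstFile, long offset, long size) {
    this.expected = expected;
    this.actual = actual;
    this.sstFile = sstFile;
    this.offset = offset;
    this.size = size;
  }

  /** Returns the parsed fields, or empty if the line does not contain the message. */
  static Optional<ChecksumMismatch> parse(String line) {
    Matcher m = PATTERN.matcher(line);
    if (!m.find()) {
      return Optional.empty();
    }
    return Optional.of(new ChecksumMismatch(
        Long.parseLong(m.group(1)),
        Long.parseLong(m.group(2)),
        m.group(3),
        Long.parseLong(m.group(4)),
        Long.parseLong(m.group(5))));
  }

  public static void main(String[] args) {
    String line = "org.rocksdb.RocksDBException: block checksum mismatch: expected 826185092, "
        + "got 2532181494  in /opt/besu/data/database/030185.sst offset 63492071 size 7574";
    parse(line).ifPresent(m ->
        System.out.println(m.sstFile + " @" + m.offset + " (" + m.size + " bytes)"));
  }
}
```

Collecting the file and offset from each attempt would show whether the corruption always lands in the same place or moves around between runs.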

Versions (Add all that apply)

  • Software version: [besu --version]:
    besu/v1.4.1-dev-83ee5fa3/linux-x86_64/oracle_openjdk-java-11

  • Java version: [java -version]:
    openjdk version "11.0.2" 2019-01-15
    OpenJDK Runtime Environment (build 11.0.2+9-Debian-3bpo91)
    OpenJDK 64-Bit Server VM (build 11.0.2+9-Debian-3bpo91, mixed mode, sharing)

  • OS Name & Version: [cat /etc/*release]
    PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
    NAME="Debian GNU/Linux"
    VERSION_ID="9"
    VERSION="9 (stretch)"
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"

  • Kernel Version: [uname -a]
    Linux besu-mainnet-fast-prune-0 4.19.0-0.bpo.6-amd64 #1 SMP Debian 4.19.67-2+deb10u2~bpo9+1 (2019-11-12) x86_64 GNU/Linux

@timbeiko timbeiko added the bug Something isn't working label Mar 26, 2020
@timbeiko
Contributor

Hi @e-nikolov,

Unfortunately, this is a known issue with RocksDB (see https://github.com/facebook/rocksdb/search?q=block+checksum+mismatch&type=Issues). The only workaround we can offer is for you to delete your database and start the sync again. If you did not delete your DB between sync attempts, that would explain why you hit the failure multiple times.
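The "delete your database and start the sync again" workaround can be sketched as below. This assumes the node is stopped first and that the database lives at the path passed on the command line (the log above shows `/opt/besu/data/database`). The class `DatabaseReset` is a hypothetical helper, not something Besu ships.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper: removes a database directory so the next sync starts from scratch. */
final class DatabaseReset {

  /** Deletes dir and everything below it; does nothing if dir does not exist. */
  static void deleteRecursively(Path dir) {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    if (args.length != 1) {
      // Require an explicit path so nothing is deleted by accident.
      System.err.println("usage: DatabaseReset <path-to-database-directory>");
      return;
    }
    Path database = Path.of(args[0]);
    deleteRecursively(database);
    System.out.println("Removed " + database);
  }
}
```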

@e-nikolov
Author

I do clear my database between attempts. I've synced from scratch about 10 times, and the strange thing is that I consistently hit this issue only when using fast sync on block storage.

It hasn't happened with full sync on block storage or with fast sync on the local SSD.

@timbeiko
Contributor

Ok, after checking with some of our engineers, it seems there are a few things happening here:

  • This looks like the RocksDB compaction issue described here
  • The reason it happens during fast sync and not full sync is that fast sync triggers more compactions, which increases the likelihood of the issue above. It looks like fast sync causes more compactions than Digital Ocean droplets can effectively handle I/O-wise.
  • The cause is probably Digital Ocean's drivers, which means you may be able to "fix" the issue with a full restart. Similar cases on AWS have been fixed that way.
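To make the error itself concrete: a block checksum mismatch means the checksum stored with a block no longer matches the bytes read back from disk. The sketch below illustrates the idea with `java.util.zip.CRC32C`. It is a simplified illustration, not RocksDB's on-disk format. As far as I know RocksDB of this era defaults to a CRC32C-based checksum, but it stores it in its own way. The class `BlockChecksumDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

/** Hypothetical demo: detect a corrupted block by comparing checksums. */
final class BlockChecksumDemo {

  /** Computes a CRC32C checksum over the whole block. */
  static long checksum(byte[] block) {
    CRC32C crc = new CRC32C();
    crc.update(block, 0, block.length);
    return crc.getValue();
  }

  /** Returns true when the stored checksum still matches the block contents. */
  static boolean verify(byte[] block, long storedChecksum) {
    return checksum(block) == storedChecksum;
  }

  public static void main(String[] args) {
    byte[] block = "example block contents".getBytes(StandardCharsets.UTF_8);
    long stored = checksum(block); // written alongside the block

    // Simulate the storage layer returning a block with one flipped bit.
    byte[] readBack = block.clone();
    readBack[3] ^= 0x01;

    System.out.println("intact block verifies:    " + verify(block, stored));
    System.out.println("corrupted block verifies: " + verify(readBack, stored));
  }
}
```

Each compaction rewrites SST files, so a workload with more compactions gives a faulty I/O path more chances to persist a bad block that a later read then rejects. That is consistent with the explanation above, though it does not by itself show where the corruption comes from.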

@timbeiko timbeiko added this to the Chupacabra Sprint 61 milestone Mar 30, 2020
@timbeiko
Contributor

timbeiko commented Apr 6, 2020

@e-nikolov update on this issue: we are currently looking into better tuning RocksDB and/or making Besu compatible with other DBs as medium-term fixes for this issue. Will keep updating here as we have more data.

@timbeiko timbeiko removed this from the Chupacabra Sprint 61 milestone Apr 8, 2020
@timbeiko timbeiko added the P2 High (ex: Degrading performance issues, unexpected behavior of core features (DevP2P, syncing, etc)) label May 13, 2020
@timbeiko timbeiko self-assigned this May 20, 2020
@nysxah

nysxah commented Nov 3, 2020

Hello, we are interested in adopting Besu as a replacement for Parity 2.5.13 on DigitalOcean with block storage. Is fast sync (pruning, non-archive) on DO still an issue?

@timbeiko
Contributor

@nysxah we believe it still is, but are looking into reproducing and fixing the issue shortly. I will tag you when we have a potential fix.

@nysxah

nysxah commented Jun 1, 2021

Hi, could you please comment on whether this issue has been fixed in Besu when running on AWS or DigitalOcean?

@danbartlett

Still an issue on DigitalOcean as of last week

siladu pushed a commit to siladu/besu that referenced this issue Oct 28, 2024