fix(db): Fix write stalls in RocksDB (again) #265
Conversation
Codecov Report

Attention: additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #265      +/-   ##
==========================================
- Coverage   34.40%   33.86%   -0.54%
==========================================
  Files         519      522       +3
  Lines       28086    27958     -128
==========================================
- Hits         9662     9468     -194
- Misses      18424    18490      +66
```

☔ View full report in Codecov by Sentry.
On the stage env (if I read the RocksDB logs correctly), the latest write stalls were caused by a compaction that ran just a bit too long (~2 s) to be covered by retries. Since the interval before the last retry is now ~1.9 s, that alone could fix the problem. One thing we could try is to check whether writes are stopped immediately after DB initialization and, if they are, wait until writes become unstuck (perhaps with an upper cap like 10 s), folding this time into RocksDB initialization.
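To make the proposed initialization wait concrete, here is a minimal Rust sketch of that idea. `is_write_stalled` is a hypothetical helper standing in for a check of RocksDB's stall state; it is not an actual zksync-era or rocksdb-crate API.

```rust
use std::time::{Duration, Instant};

/// Minimal sketch of the wait proposed above: if writes are stalled right
/// after DB initialization, poll until they become unstuck, giving up after
/// an upper cap (~10 s) so the wait is folded into initialization.
/// `is_write_stalled` is a hypothetical helper, not a real API.
fn wait_for_unstalled_writes(is_write_stalled: impl Fn() -> bool) -> bool {
    const UPPER_CAP: Duration = Duration::from_secs(10);
    const POLL_INTERVAL: Duration = Duration::from_millis(50);

    let started_at = Instant::now();
    while is_write_stalled() {
        if started_at.elapsed() >= UPPER_CAP {
            return false; // still stalled after the cap; report failure
        }
        std::thread::sleep(POLL_INTERVAL);
    }
    true // writes are (or became) unstuck
}
```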
Commit 5a23e95
# What ❔

[A previous fix](#265) didn't really work judging by Merkle tree behavior on the stage env. This PR makes the initialization timeout configurable (and increases the default value from 10s to 30s; 30s is approximately equal to the compaction duration) and slightly increases the number of retries on stalled writes.

## Why ❔

Having write stalls leads to panics and is obviously bad.

## Checklist

- [x] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [x] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zk fmt` and `zk lint`.
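As an illustration of the configuration change described in that follow-up PR, here is a hedged sketch of what such options could look like. The struct and field names are assumptions for illustration, not the actual zksync-era config types.

```rust
use std::time::Duration;

/// Hypothetical illustration of the configurable initialization timeout
/// described in the linked PR; names and values are assumptions, not the
/// actual zksync-era config structs.
#[derive(Debug, Clone)]
pub struct RocksDbStallOptions {
    /// How long to wait for the DB to accept writes during initialization.
    /// The linked PR raises the default from 10s to 30s (roughly one
    /// compaction run on the stage env).
    pub stalled_writes_init_timeout: Duration,
    /// Number of retries on stalled writes before giving up (panicking).
    pub stalled_writes_retries: usize,
}

impl Default for RocksDbStallOptions {
    fn default() -> Self {
        Self {
            stalled_writes_init_timeout: Duration::from_secs(30),
            stalled_writes_retries: 10, // illustrative count
        }
    }
}
```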
🤖 I have created a release *beep* *boop*

## [16.1.0](core-v16.0.2...core-v16.1.0) (2023-10-24)

### Features

* Add new commitments ([#219](#219)) ([a19256e](a19256e))
* arm64 zk-environment rust Docker images and other ([#296](#296)) ([33174aa](33174aa))
* **config:** Extract everything not related to the env config from zksync_config crate ([#245](#245)) ([42c64e9](42c64e9))
* **eth-watch:** process governor upgrades ([#247](#247)) ([d250294](d250294))
* **merkle tree:** Expose Merkle tree API ([#209](#209)) ([4010c7e](4010c7e))
* **merkle tree:** Snapshot recovery for Merkle tree ([#163](#163)) ([9e20703](9e20703))
* **multivm:** Remove lifetime from multivm ([#218](#218)) ([7eda27c](7eda27c))
* Remove fee_ticker and token_trading_volume fetcher modules ([#262](#262)) ([44f7179](44f7179))
* **reorg_detector:** compare miniblock hashes for reorg detection ([#236](#236)) ([2c930b2](2c930b2))
* Rewrite server binary to use `vise` metrics ([#120](#120)) ([26ee1fb](26ee1fb))
* **types:** introduce state diff record type and compression ([#194](#194)) ([ccf753c](ccf753c))
* **vm:** Improve tracer trait ([#121](#121)) ([ff60138](ff60138))
* **vm:** Move all vm versions to the one crate ([#249](#249)) ([e3fb489](e3fb489))

### Bug Fixes

* **crypto:** update snark-vk to be used in server and update args for proof wrapping ([#240](#240)) ([4a5c54c](4a5c54c))
* **db:** Fix write stalls in RocksDB ([#250](#250)) ([650124c](650124c))
* **db:** Fix write stalls in RocksDB (again) ([#265](#265)) ([7b23ab0](7b23ab0))
* **db:** Fix write stalls in RocksDB (for real this time) ([#292](#292)) ([0f15919](0f15919))
* Fix `TxStage` string representation ([#255](#255)) ([246b5a0](246b5a0))
* fix typos ([#226](#226)) ([feb8a6c](feb8a6c))
* **witness-generator:** Witness generator oracle with cached storage refunds ([#274](#274)) ([8928a41](8928a41))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
What ❔

RocksDB write stalls are still happening, this time for a different reason. Previously, they were caused by too many immutable memtables; this time, by too many level-0 SST files. This PR (list reconstructed from the discussion above):

- makes stalled-write retries cover a longer period, so that they outlast the ~2 s compactions observed on the stage env (the interval before the last retry is now ~1.9 s);
- waits during RocksDB initialization for writes to become unstuck if they are stopped right away, with an upper cap (~10 s) folded into the initialization time.
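For context on the two stall causes named above, here is a sketch of the RocksDB knobs that govern them, using the `rocksdb` Rust crate; the trigger values are illustrative assumptions, not the ones this PR sets.

```rust
use rocksdb::Options;

/// Illustrative sketch of the RocksDB options behind both stall causes;
/// the concrete values here are assumptions, not the PR's actual settings.
fn stall_related_options() -> Options {
    let mut options = Options::default();
    // Cause of the earlier stalls: too many immutable memtables queued
    // for flushing. Raising the buffer count gives flushes more headroom.
    options.set_max_write_buffer_number(4);
    // Cause of the current stalls: too many level-0 SST files awaiting
    // compaction. These triggers control when writes are slowed, then
    // stopped outright (the stall observed here).
    options.set_level_zero_slowdown_writes_trigger(20);
    options.set_level_zero_stop_writes_trigger(36);
    options
}
```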
Why ❔
Having write stalls leads to panics and is obviously bad.
Checklist

- [x] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [x] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zk fmt` and `zk lint`.