Re-try zstd compression in sled, after the data format has stabilised #776
I think this is a reasonable choice to make, and it's low-cost if we decide to change our mind later (we already don't commit to a stable on-disk format).
I'll go make a PR and do some testing. Note: I don't know if sled can transparently change from zstd to non-zstd databases; it's not documented. But if we drop the zstd feature, it definitely won't be able to.
Using compression could save a lot of disk space without much impact on CPU usage, as long as we set a low compression level. This PR overrides the secp256k1 version with a commit on their master branch, to resolve a "cc" crate version conflict with zstd (used by sled). Closes ZcashFoundation#776.
I opened PR #790, using zstd compression level 1. As of 29 July, I had a fully synced testnet data directory which took up 12 GB, at around block 1021029. I'll do some testing and let you know what the compression ratio is.
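For context, a minimal sketch of how that configuration might look with sled's Config builder, assuming sled 0.34's use_compression and compression_factor methods and the compression/zstd cargo feature enabled (the function name is hypothetical, not Zebra's actual state code):

```rust
use sled::Config;

/// Hypothetical helper: open a sled database with zstd compression enabled.
fn open_compressed_db(path: &str) -> sled::Result<sled::Db> {
    // `use_compression` turns on zstd, and `compression_factor` sets the
    // zstd level (1 = fastest, lowest compression ratio).
    Config::new()
        .path(path)
        .use_compression(true)
        .compression_factor(1)
        .open()
}
```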
Early test results: you can't change the compression of an existing sled database.
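Since the setting apparently can't be changed in place, one possible (untested) workaround is sled's export/import API, sketched below; migrate_to_compressed is a hypothetical helper, and it copies every collection into a freshly configured database rather than converting the existing files:

```rust
use sled::Config;

/// Hypothetical migration helper: copy all data from an existing
/// (uncompressed) database into a new database opened with compression.
/// Assumes sled 0.34's `Db::export` / `Db::import` API.
fn migrate_to_compressed(old_path: &str, new_path: &str) -> sled::Result<()> {
    let old_db = Config::new().path(old_path).open()?;
    let new_db = Config::new()
        .path(new_path)
        .use_compression(true)
        .compression_factor(1)
        .open()?;

    // `export` yields every collection's contents; `import` replays them
    // into the new database, which re-encodes the data as it is written.
    new_db.import(old_db.export());
    new_db.flush()?;
    Ok(())
}
```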
With compression enabled, the data directory is 16 GB at around block 1021473. I'm not sure what sled compression is doing, but it's definitely not making our state smaller. (Or our state sizes are extremely variable for each run.) I'm going to re-test with and without compression to make sure.
Did we already settle on the data layout we want to use? Since the compression is downstream of that, it might make sense to defer this decision until later.
I think the data layout will depend on what we decide to store on disk, which depends (in part) on the parallel verification RFC #763.
(I've confirmed that compressed data sizes are bigger right now; I'll defer this ticket until after the first alpha.)
We should close this one if we decide to go for RocksDB in the end.
We don't think compression is ever going to work well for sled, because we store lots of small keys. RocksDB does compression at a block (database page) level, so it might work much better. |
I opened #1326 for RocksDB's compaction and compression. |
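For comparison, a rough sketch of what block-level compression could look like with the rocksdb Rust crate (API names are taken from that crate; this is not Zebra's actual state configuration, and the block size and compression type are illustrative assumptions):

```rust
use rocksdb::{BlockBasedOptions, DBCompressionType, Options, DB};

/// Hypothetical example: open a RocksDB database that compresses data
/// per block (database page), so many small keys and values are
/// compressed together instead of individually.
fn open_compressed_rocksdb(path: &str) -> Result<DB, rocksdb::Error> {
    let mut block_opts = BlockBasedOptions::default();
    // Illustrative block size; larger blocks give compression more context.
    block_opts.set_block_size(16 * 1024);

    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.set_block_based_table_factory(&block_opts);
    // zstd at RocksDB's default level; a fast setting keeps CPU usage low.
    opts.set_compression_type(DBCompressionType::Zstd);

    DB::open(&opts, path)
}
```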
Sled has a zstd compression feature; should we use it?

The compression level is configurable. Since Zebra is CPU-bound, let's use a low compression level. A bunch of Zcash data is pseudo-random hashes or nonces, so we'll get the most benefit from compressing blocks of zeroes.

Pros:

Cons:
- the zstd feature in sled, which may have less testing

TODO: