
Re-try zstd compression in sled, after the data format has stabilised #776

Closed
5 tasks
teor2345 opened this issue Jul 28, 2020 · 11 comments

@teor2345
Contributor

teor2345 commented Jul 28, 2020

Sled has a zstd compression feature, should we use it?

The compression level is configurable. Since Zebra is CPU-bound, let's use a low compression level. Much Zcash data consists of pseudo-random hashes or nonces, which won't compress, so we'll get the most benefit from compressing blocks of zeroes.
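That intuition is easy to check with a quick sketch. Python's standard library has no zstd binding, so the snippet below uses zlib at its lowest level as a stand-in; the behaviour it demonstrates carries over to zstd: pseudo-random data (hashes, nonces) is essentially incompressible, while runs of zeroes shrink to almost nothing.

```python
import os
import zlib

# 64 KB of pseudo-random bytes (like hashes/nonces) vs 64 KB of zeroes.
# zlib level 1 stands in for a low zstd compression level; zstd itself
# is not in the Python stdlib.
random_data = os.urandom(64 * 1024)
zero_data = bytes(64 * 1024)

random_ratio = len(zlib.compress(random_data, 1)) / len(random_data)
zero_ratio = len(zlib.compress(zero_data, 1)) / len(zero_data)

# Random data stays near (or slightly above) its original size;
# zeroes compress to a tiny fraction of theirs.
print(f"random: {random_ratio:.3f}, zeroes: {zero_ratio:.4f}")
```

So the overall ratio depends heavily on how much of the stored data is structured (zero padding, repeated prefixes) versus pseudo-random.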

Pros:

  • lower disk space usage
  • faster loading from disk, if the time saved reading less data outweighs the decompression CPU cost

Cons:

  • extra CPU usage
  • an extra dependency
  • using non-default features of sled, which may have less testing

TODO:

@teor2345 teor2345 added the Poll::Pending and C-design (Category: Software design work) labels Jul 28, 2020
@hdevalence
Contributor

I think this is a reasonable choice to make, and it's low-cost if we decide to change our mind later (we already don't commit to a stable on-disk format).

@teor2345
Contributor Author

I'll go make a PR and do some testing.

Note: I don't know whether sled can transparently open a non-zstd database as zstd (or vice versa); it isn't documented. But if we drop the zstd feature, it definitely won't be able to.

teor2345 added a commit to teor2345/zebra that referenced this issue Jul 29, 2020
Using compression could save a lot of disk space, without much impact on
CPU usage, as long as we set a low compression level.

Overrides the secp256k1 version with a commit on their master branch, to
resolve a "cc" crate version conflict with zstd (used by sled).

Closes ZcashFoundation#776.
@teor2345
Contributor Author

teor2345 commented Jul 29, 2020

I opened PR #790, using zstd compression level 1.

As of 29 July, I had a fully synced testnet data directory which took up 12 GB, at around block 1021029. I'll do some testing and let you know what the compression ratio is.

@teor2345
Contributor Author

Early test results: you can't change the compression of an existing sled database.

@teor2345
Contributor Author

> As of 29 July, I had a fully synced testnet data directory which took up 12 GB, at around block 1021029. I'll do some testing and let you know what the compression ratio is.

16 GB at around block 1021473.

I'm not sure what sled's compression is doing, but it's definitely not making our state smaller: the compressed run is about a third larger than the 12 GB uncompressed run. (Or our state sizes are extremely variable between runs.)

I'm going to re-test with and without compression to make sure.

@hdevalence
Contributor

Did we already settle on the data layout we want to use? Since the compression is downstream of that, it might make sense to defer this decision until later.

@teor2345
Contributor Author

I think the data layout will depend on what we decide to store on disk, which depends (in part) on the parallel verification RFC #763.

@teor2345
Contributor Author

(I've confirmed that compressed data sizes are bigger right now, I'll defer this ticket until after the first alpha.)

@teor2345 teor2345 changed the title from "Design Decision: Use zstd compression in sled?" to "Re-try zstd compression in sled, after the data format has stabilised" Jul 30, 2020
@teor2345 teor2345 removed the C-design (Category: Software design work) label Jul 30, 2020
@hdevalence hdevalence added the S-blocked (Status: Blocked on other tasks) label and removed the Poll::Pending label Aug 17, 2020
@mpguerra
Contributor

We should close this one if we decide to go with RocksDB in the end.

@teor2345
Contributor Author

We don't think compression is ever going to work well for sled, because we store lots of small keys.

RocksDB does compression at a block (database page) level, so it might work much better.
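The difference between per-value and block-level compression can be sketched as below (again using Python's stdlib zlib as a stand-in for the compressor, with a made-up record shape for illustration): compressing many small records together lets them share one compression context, instead of each tiny value paying its own header and dictionary overhead.

```python
import os
import zlib

# 1000 small records: a 32-byte pseudo-random "hash" plus 32 bytes of
# compressible structure each. The record shape is hypothetical.
records = [os.urandom(32) + bytes(32) for _ in range(1000)]

# Per-value compression: each tiny record is compressed on its own,
# paying per-stream overhead and giving the compressor little context.
per_value = sum(len(zlib.compress(r, 1)) for r in records)

# Block-level compression (the RocksDB approach): many records are
# compressed together as one block.
per_block = len(zlib.compress(b"".join(records), 1))

print(f"per-value: {per_value} bytes, per-block: {per_block} bytes")
```

In this sketch the block-level total comes out noticeably smaller than the per-value total, which is the effect described above.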

@teor2345
Contributor Author

I opened #1326 for RocksDB's compaction and compression.

@mpguerra mpguerra removed the S-blocked (Status: Blocked on other tasks) label Jan 24, 2022
3 participants