
Re-try zstd compression in sled, after the data format has stabilised #776

Closed
5 tasks
teor2345 opened this issue Jul 28, 2020 · 11 comments

@teor2345
Contributor

teor2345 commented Jul 28, 2020

Sled has a zstd compression feature, should we use it?

The compression level is configurable. Since Zebra is CPU-bound, let's use a low compression level. Much Zcash data consists of pseudo-random hashes or nonces, which won't compress, so we'll get the most benefit from compressing blocks of zeroes.
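That intuition is easy to check with a quick sketch. Python's standard library has no zstd binding, so the snippet below uses zlib at its lowest level as a stand-in; the behaviour it demonstrates carries over to zstd: pseudo-random data (hashes, nonces) is essentially incompressible, while runs of zeroes shrink to almost nothing.

```python
import os
import zlib

# 64 KB of pseudo-random bytes (like hashes/nonces) vs 64 KB of zeroes.
# zlib level 1 stands in for a low zstd compression level; zstd itself
# is not in the Python stdlib.
random_data = os.urandom(64 * 1024)
zero_data = bytes(64 * 1024)

random_ratio = len(zlib.compress(random_data, 1)) / len(random_data)
zero_ratio = len(zlib.compress(zero_data, 1)) / len(zero_data)

# Random data stays near (or slightly above) its original size;
# zeroes compress to a tiny fraction of theirs.
print(f"random: {random_ratio:.3f}, zeroes: {zero_ratio:.4f}")
```

So the overall ratio depends heavily on how much of the stored data is structured (zero padding, repeated prefixes) versus pseudo-random.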

Pros:

  • lower disk space usage
  • faster loading from disk, if the time saved reading less data outweighs the decompression CPU cost

Cons:

  • extra CPU usage
  • an extra dependency
  • using non-default features of sled, which may have less testing

TODO:

@teor2345 teor2345 added the Poll::Pending and C-design (Category: Software design work) labels Jul 28, 2020
@hdevalence
Contributor

I think this is a reasonable choice to make, and it's low-cost if we decide to change our mind later (we already don't commit to a stable on-disk format).

@teor2345
Contributor Author

I'll go make a PR and do some testing.

Note: I don't know whether sled can transparently open a non-zstd database as zstd (or vice versa); it isn't documented. But if we drop the zstd feature, it definitely won't be able to.

teor2345 added a commit to teor2345/zebra that referenced this issue Jul 29, 2020
Using compression could save a lot of disk space, without much impact on
CPU usage, as long as we set a low compression level.

Overrides the secp256k1 version with a commit on their master branch, to
resolve a "cc" crate version conflict with zstd (used by sled).

Closes ZcashFoundation#776.
@teor2345
Contributor Author

teor2345 commented Jul 29, 2020

I opened PR #790, using zstd compression level 1.

As of 29 July, I had a fully synced testnet data directory which took up 12 GB, at around block 1021029. I'll do some testing and let you know what the compression ratio is.

@teor2345
Contributor Author

Early test results: you can't change the compression of an existing sled database.

@teor2345
Contributor Author

> As of 29 July, I had a fully synced testnet data directory which took up 12 GB, at around block 1021029. I'll do some testing and let you know what the compression ratio is.

16 GB at around block 1021473.

I'm not sure what sled's compression is doing, but it's definitely not making our state smaller: the compressed run is about a third larger than the 12 GB uncompressed run. (Or our state sizes are extremely variable between runs.)

I'm going to re-test with and without compression to make sure.

@hdevalence
Contributor

Did we already settle on the data layout we want to use? Since the compression is downstream of that, it might make sense to defer this decision until later.

@teor2345
Contributor Author

I think the data layout will depend on what we decide to store on disk, which depends (in part) on the parallel verification RFC #763.

@teor2345
Contributor Author

(I've confirmed that compressed data sizes are bigger right now, I'll defer this ticket until after the first alpha.)

@teor2345 teor2345 changed the title from "Design Decision: Use zstd compression in sled?" to "Re-try zstd compression in sled, after the data format has stabilised" Jul 30, 2020
@teor2345 teor2345 removed the C-design (Category: Software design work) label Jul 30, 2020
@hdevalence hdevalence added the S-blocked (Status: Blocked on other tasks) label and removed the Poll::Pending label Aug 17, 2020
@mpguerra
Contributor

We should close this one if we decide to go with RocksDB in the end.

@teor2345
Contributor Author

We don't think compression is ever going to work well for sled, because we store lots of small keys.

RocksDB does compression at a block (database page) level, so it might work much better.
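The difference between per-value and block-level compression can be sketched as below (again using Python's stdlib zlib as a stand-in for the compressor, with a made-up record shape for illustration): compressing many small records together lets them share one compression context, instead of each tiny value paying its own header and dictionary overhead.

```python
import os
import zlib

# 1000 small records: a 32-byte pseudo-random "hash" plus 32 bytes of
# compressible structure each. The record shape is hypothetical.
records = [os.urandom(32) + bytes(32) for _ in range(1000)]

# Per-value compression: each tiny record is compressed on its own,
# paying per-stream overhead and giving the compressor little context.
per_value = sum(len(zlib.compress(r, 1)) for r in records)

# Block-level compression (the RocksDB approach): many records are
# compressed together as one block.
per_block = len(zlib.compress(b"".join(records), 1))

print(f"per-value: {per_value} bytes, per-block: {per_block} bytes")
```

In this sketch the block-level total comes out noticeably smaller than the per-value total, which is the effect described above.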

@teor2345
Contributor Author

I opened #1326 for RocksDB's compaction and compression.

@mpguerra mpguerra removed the S-blocked (Status: Blocked on other tasks) label Jan 24, 2022
3 participants