-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use lz4 compression #739
Comments
Unfortunately there is no async support yet: PSeitz/lz4_flex#15 :/ There is https://github.com/Nemo157/async-compression/ which has support for zstd at least |
There is no rust native zstd. Adding async support to lz4 is probably much easier than writing a zstd compressor. |
How important is async? I think we might want to avoid running this on the same executor as the one that handles the network connections anyway. So maybe the |
I feel pretty strongly we should be using zstd, unless it comes at the cost of the platforms we need to support. Most modern data science is moving to zstd, among other very important use cases. How much would it hurt our portability story to rely on https://docs.rs/zstd/latest/zstd/ ? |
It is very likely this will produce issues for Deltachat when cross compiling and doing mobile builds. Zstd is also going to be slower on the wire, and looking at the latest benchmarks, not necessarily the right choice for us. I understand a lot of folks are using it, but on the fly compression is a different story, compared to anything that gets actually stored to disk. So I would strongly advise using LZ4 for now, both from a performance and portability perspective. |
I tested zstd and brotli on a backup produced by deltachat/deltachat-core-rust#4165 serialization code. Uncompressed file is 338K. Brotli reduced it to 40K even with the default settings, while I guess that the difference is due to the usage of pre-trained dictionary in brotli, which is a part of the RFC 7932. While zstd also has support for pre-trained dictionaries and tools to generate them, so far there is no standardized dictionary: facebook/zstd#3100 In Delta Chat we are going to use brotli anyway, because we want to compress relatively short streams (HTML messages) and pre-trained dictionary that comes with brotli should be good for this purpose: deltachat/deltachat-core-rust#4129 Brotli crate developed by Dropbox (https://github.com/dropbox/rust-brotli) is pure Rust, C is used only for C bindings and can be turned off ( Somewhat outdated (2016) comparison from zstd repository shows comparable performance of zstd and brotli: https://github.com/facebook/zstd/blob/dev/doc/images/DCspeed5.png For Delta Chat purposes (streaming a backup) we do not need too high throughput because Wi-Fi connection in our tests of deltachat/deltachat-core-rust#4007 resulted in around 145 Mbps, so we cannot benefit from having 300 Mbps compression speed if resulting stream is >150 Mbps and network speed is still the bottleneck. For CDN-like and data science usage compression speed matters even less, because you can precompress the file and distribute the same compressed stream multiple times. So I suggest using https://lib.rs/crates/brotli crate, it is pure rust and easy to compile, comes with pre-trained dictionary suitable for small files, delta chat is going to depend on it anyway, and brotli algorithm compares well to zstd and other algorithms. lz4 seems way worse as an algorithm compared to zstd/brotli: https://gregoryszorc.com/images/compression-bundle-modern.png |
lz4 is made to be extremely fast, so you can always use it without any performance concerns. It is not meant to get you anything near state of the art compression, because it is only "half a compression algorithm". I think zstd should be superior to brotli for larger backups. But the c dependency is slightly annoying, that is true. It can be made to work - we were using zstd in actyx and were able to cross compile to many different platforms. But pure rust would be better... |
For reference, dropboxes post about how they use brotli: https://dropbox.tech/infrastructure/-broccoli--syncing-faster-by-syncing-less |
So lz4 is indeed much faster than |
https://github.com/pseitz/lz4_flex is even faster in theory |
I think we should not do this. It depends a lot on the data whether it is worth it, and for a lot of large data out there (movie files, archives etc.) it definitely is not worth it. It would also complicate the protocol for small blobs without providing much benefit. I think we should leave this up to the user. @dignifiedquire WDYT? Close as wontfix? |
Yes, lets close it for now, we can always revisit if this becomes an issue. |
When sending files that are not already somehow encoded, there would be a large gain
There is a fully rust compression library called lz4_flex. https://crates.io/crates/lz4_flex
We could use this to reduce the amount of data that needs to be sent through en/decryption and through quic. Even for uncompressible data lz4 is very fast, so you probably would not even notice it WRT perf compared to the quic overhead.
The downside is the additional complexity. We should try this out only after having implemented/used larger bao block size. See #741
Edit: zstd provides much better compression rates but is much more expensive. Also, there is no pure rust impl of zstd, unfortunately...
The text was updated successfully, but these errors were encountered: