compress/flate: decompression performance #23154
Comments
@klauspost FYI
/cc @dsnet
I see some complaints about it elsewhere.
There's a lot of room for improvement. I re-implemented the decompressor in github.com/dsnet/compress/flate. I have yet to merge my changes into the standard library, but it's about 1.6x to 2.3x faster. You can use that package if you want.
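For anyone who wants to try that package, a minimal sketch follows. The constructor signature (a `NewReader` taking an optional `*ReaderConfig`) is my reading of the dsnet/compress API style, not something stated in this thread; verify against the package documentation before relying on it.

```go
// Sketch of trying github.com/dsnet/compress/flate as an alternative
// to the standard library's compress/flate decompressor.
//
// ASSUMPTION: NewReader(r, conf) with a nil default config reflects
// the dsnet/compress API style; check the package docs.
package main

import (
	"io"
	"os"

	"github.com/dsnet/compress/flate"
)

func main() {
	f, err := os.Open("data.flate") // hypothetical raw-DEFLATE input
	if err != nil {
		panic(err)
	}
	defer f.Close()

	zr, err := flate.NewReader(f, nil) // nil: use default reader config
	if err != nil {
		panic(err)
	}
	defer zr.Close()

	if _, err := io.Copy(os.Stdout, zr); err != nil {
		panic(err)
	}
}
```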
The biggest bang for the buck is to get around the […]. The approach I took was to special-case the fact that most […]. Unfortunately, […]. Some of the relevant performance changes: […]
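For context on the per-byte reads at issue here: flate.NewReader's documentation notes that if the input does not also implement io.ByteReader, the decompressor may read more data than necessary (internally it adds its own buffering layer). A minimal sketch of the usual mitigation, wrapping the source in a bufio.Reader yourself so there is only one buffering layer:

```go
// Sketch: ensure the input to flate.NewReader implements io.ByteReader
// by wrapping it in a bufio.Reader, avoiding the extra internal wrap
// that compress/flate applies to plain io.Readers.
package main

import (
	"bufio"
	"compress/flate"
	"io"
	"os"
)

func main() {
	f, err := os.Open("data.flate") // hypothetical raw-DEFLATE input
	if err != nil {
		panic(err)
	}
	defer f.Close()

	br := bufio.NewReaderSize(f, 1<<16) // *bufio.Reader implements io.ByteReader
	zr := flate.NewReader(br)
	defer zr.Close()

	if _, err := io.Copy(io.Discard, zr); err != nil {
		panic(err)
	}
}
```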
Thanks @dsnet, do you have an ETA for when you will start to integrate your optimizations into the standard library? I know that more optimized packages exist, but projects might be reluctant to add an external dependency for this task.
The biggest blocker is time, both mine and that of a reviewer who's also familiar with RFC 1951. It's possible to get a new implementation in for Go 1.11, but who knows. Compression needs to be reviewed carefully. /cc @mdempsky @nigeltao @klauspost. Any takers as reviewers?
I have added a few minor comments to the commits listed above. You could consider supporting the […]. I assume you have already fuzzed it :)
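On the fuzzing point, here is a minimal sketch of a decompressor fuzz harness using Go's native fuzzing support (added in Go 1.18, long after this thread; the work discussed here would have used an external fuzzer such as dvyukov/go-fuzz). It simply feeds arbitrary bytes to the decompressor and checks that decoding always terminates cleanly:

```go
// flate_fuzz_test.go — a minimal fuzz harness for the decompressor.
// Run with: go test -fuzz=FuzzFlateDecode
package flatefuzz

import (
	"bytes"
	"compress/flate"
	"io"
	"testing"
)

func FuzzFlateDecode(f *testing.F) {
	f.Add([]byte{0x03, 0x00}) // seed: the minimal empty DEFLATE stream
	f.Fuzz(func(t *testing.T, data []byte) {
		zr := flate.NewReader(bytes.NewReader(data))
		// Most inputs are invalid; we only care that decoding
		// returns (with or without an error) and never panics.
		_, _ = io.Copy(io.Discard, zr)
		zr.Close()
	})
}
```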
It's not production-ready yet, or any time soon, but I have been working on https://github.com/google/wuffs, a new, memory-safe programming language that currently generates C code and could generate Go code (using package unsafe) in the future. The kicker is that despite being memory-safe (e.g. all array accesses are bounds-checked), its DEFLATE implementation benchmarks more or less as fast as the C zlib library (see https://github.com/google/wuffs/blob/master/doc/benchmarks.md). A rough estimate (different measurement frameworks, different test data) is that it is therefore about 4x faster at decoding than Go's compress/flate. On my desktop: Wuffs is 300 MB/s compared to "go test -test.bench=. compress/flate"'s 75 MB/s. There are a couple of further optimization techniques (madler/zlib#292, which I proposed for C zlib but had to abandon because of API backwards-compatibility constraints) that I'd expect to give an extra 1.5x improvement to Wuffs' throughput. It's still a work in progress, and won't help Go programs any time soon, but that's where I'm spending my limited time these days.
As for APIs (io.ByteReader, io.WriterTo, etc.), Wuffs' API is similar to Go's golang.org/x/text/transform.Transformer. You wouldn't be able to drop that interface (and its specific errors) into the stdlib as-is, since the stdlib can't depend on golang.org/x, but there may be some ideas in there worth considering.
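For reference, the transform.Transformer interface being referred to has roughly this shape (reproduced from the golang.org/x/text/transform docs; see that package for the authoritative definition). Operating on caller-supplied byte slices rather than wrapping an io.Reader sidesteps the per-byte reader overhead discussed earlier in this thread:

```go
// The golang.org/x/text/transform.Transformer interface that the
// comment above refers to, reproduced here for reference.
package transform

type Transformer interface {
	// Transform writes to dst the transformed bytes read from src,
	// returning the number of bytes written and consumed. atEOF
	// reports whether src ends the input.
	Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)

	// Reset resets internal state so the Transformer can be reused.
	Reset()
}
```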
@nigeltao This project looks amazing. Even if it probably won't happen soon, having a new implementation under […].
FWIW, I'm seeing a nice performance improvement with Go 1.11 beta2: down to 6.8s, compared to the original 8s with Go 1.9.2. Not sure if this is an optimization in the deflate code or general compiler optimizations.
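For anyone wanting to compare Go releases the same way, a rough wall-clock sketch (the input file name is a placeholder, and this is a coarse timing, not a rigorous benchmark):

```go
// Rough wall-clock timing of gzip decompression, for comparing
// one Go release against another.
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"time"
)

func main() {
	f, err := os.Open("layer.tar.gz") // placeholder input file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	start := time.Now()
	zr, err := gzip.NewReader(bufio.NewReader(f))
	if err != nil {
		panic(err)
	}
	n, err := io.Copy(io.Discard, zr)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decompressed %d bytes in %v\n", n, time.Since(start))
}
```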
One of the largest resource hogs in umoci is compression-related, and it turns out[1] that Go's ordinary gzip implementation is nowhere near as fast as other modern gzip implementations. In order to help our users get more efficient builds, switch to a "pure-Go" implementation which has x64-specific optimisations. We cannot use zlib wrappers like [2] because of issues with "os/user" and static compilation (so we wouldn't be able to release binaries).

This very simplified benchmark shows the positive difference when switching libraries (using "tensorflow/tensorflow:latest" as the image base):

% time umoci unpack --image tensorflow:latest bundle # before
39.03user 7.58system 0:45.16elapsed
% time umoci unpack --image tensorflow:latest bundle # after
40.89user 7.99system 0:34.89elapsed

But the real benefit is when it comes to compression and repacking (in this benchmark the changes were installing Firefox and doing a repo refresh):

% time umoci repack --image tensorflow:latest bundle # before
78.54user 13.71system 1:26.66elapsed
% time umoci repack --image tensorflow:latest bundle # after
51.14user 3.25system 0:30.30elapsed

That's almost 3x faster, just by having a more optimised compression library!

[1]: golang/go#23154
[2]: https://github.com/vitessio/vitess/tree/v2.2/go/cgzip

Signed-off-by: Aleksa Sarai <[email protected]>
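The commit text above doesn't name the replacement library in this extract. Purely as an illustration (not necessarily what umoci adopted), github.com/klauspost/compress/gzip is a widely used pure-Go, API-compatible alternative, so the swap is typically a one-line import change:

```go
// Illustrative only: swapping the standard gzip for a faster
// API-compatible implementation by changing a single import.
// Whether this matches the library umoci adopted is not stated
// in the commit text above.
package main

import (
	"io"
	"os"

	gzip "github.com/klauspost/compress/gzip" // drop-in for compress/gzip
)

func main() {
	zr, err := gzip.NewReader(os.Stdin)
	if err != nil {
		panic(err)
	}
	defer zr.Close()
	if _, err := io.Copy(os.Stdout, zr); err != nil {
		panic(err)
	}
}
```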
What version of Go are you using (go version)?
[…]

What operating system and processor architecture are you using (go env)?
[…]

What did you do?
I noticed that the performance of compress/gzip is close to 2x slower than the gunzip(1) command-line tool. I've tested multiple methods of performing the decompression in this simple repository: https://github.com/flx42/go-gunzip-bench

In summary, on a 333 MB gzipped file, for the read-decompress-write process:
- gunzip(1) takes 4.4s.
- […]

Is there any room for improvement here for the standard implementation? The use case is decompression of downloaded Docker layers, in Go.
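The linked repository isn't reproduced here, but the benchmark's shape is simple. A minimal sketch of the kind of read-decompress-write pipeline being compared against gunzip(1), with placeholder file names:

```go
// A minimal read-decompress-write pipeline of the kind the linked
// benchmark measures. File names are placeholders.
package main

import (
	"bufio"
	"compress/gzip"
	"io"
	"os"
)

func main() {
	in, err := os.Open("input.gz")
	if err != nil {
		panic(err)
	}
	defer in.Close()

	out, err := os.Create("output")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	zr, err := gzip.NewReader(bufio.NewReader(in))
	if err != nil {
		panic(err)
	}
	defer zr.Close()

	w := bufio.NewWriter(out)
	if _, err := io.Copy(w, zr); err != nil {
		panic(err)
	}
	if err := w.Flush(); err != nil {
		panic(err)
	}
}
```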