-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zstd compressor (#3) #1604
base: main
Are you sure you want to change the base?
Zstd compressor (#3) #1604
Conversation
Co-authored-by: Philippe Leprince <[email protected]> Signed-off-by: Vlad Lazar <[email protected]> * better multy-type compression * Version the Stream --------- Signed-off-by: Vlad Lazar <[email protected]> Co-authored-by: Philippe Leprince <[email protected]>
|
A belated thanks for this, it mostly looks pretty straightforward, with some caveats. Obviously, the blosc configuration needs to be sorted out. If you haven't already, you might look at OpenVDB for template for building against blosc: https://github.com/AcademySoftwareFoundation/openvdb/blob/master/CMakeLists.txt#L99 and https://github.com/AcademySoftwareFoundation/openvdb/blob/master/cmake/FindBlosc.cmake We'll obviously need to carefully consider the consequences and timing of extensions to the file format, as well as library dependencies, and possibly want to bundle this with other changes, but we have already been anticipating adding new compression algorithms, so having a working example will be helpful. A few pedantic requests:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our preference is to minimize whitespace changes wherever possible, and format code via clang-format.
I will add the cmake changes shortly. Thanks |
@cary-ilm Is there a way to share the implementation of my compressor between OpenEXR and OpenEXRCore ? The 2 are the the same minus some memory management bits. |
src/lib/OpenEXR/CMakeLists.txt
Outdated
@@ -220,3 +222,7 @@ openexr_define_library(OpenEXR | |||
OpenEXR::IlmThread | |||
OpenEXR::OpenEXRCore | |||
) | |||
|
|||
target_include_directories(OpenEXR PUBLIC "/home/vladal/bin/include/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not forget to change this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, I noticed. Thanks
Clang-format deepExamples.cpp and fix a comment typo. Clang-format deepExamples.cpp and fix a comment typo.
|
I see you updated Perhaps it's worth making library functions that turn a string like 'piz' into PIZ_COMPRESSION, and vice versa, to simplify updating these utliities (and end-user code) as new compression types are added. |
src/lib/OpenEXR/CMakeLists.txt
Outdated
|
||
target_include_directories(OpenEXR PUBLIC "/home/vladal/bin/include/") | ||
target_link_directories(OpenEXR PUBLIC "/home/vladal/bin/lib") | ||
target_link_libraries(OpenEXR PUBLIC "dl" "blosc2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library you use here is this one here -> https://github.com/Blosc/c-blosc2 right? Does the newest release of the library (2.12.0) work with this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should work.
Hi @cary-ilm & @peterhillman I have made a number of cmake modifications based on the
my problem here is The other bit I don't understand is:
What's exactly the point here ? Another bit of magic, I'm sure ! :-) Thanks in advance |
I got this to build OK, so it looks like the Blosc build is working. As it happens, it seems the library doesn't properly support writing multi-scanline compression formats to deep images. |
The 32 number came from some experiements with some real world data I had. You are probably right that 32 might be large, when the number of samples is large. Will test with 1 and see how much does the compression degrade.
Thanks very much for looking at this. We had some discussions about this topic. I was experience spurious deadlocks in I think this all points to the idea to make ZSTD work only on 1 scanline for now. (might make a 32 scanline version later) |
I have worked on that today and will make a separate PR. |
I had issues doing I would suggest that we have a ZSTD single scanline codec, but if decide to add a multiscanline version too, then we will need to make these changes. I'm a little bit timid about pushing my changes unless we need them. If we have multiple versions of ZSTD, then I think they should be in the same release |
I looked up blosc2 a bit and it seems it turns on byte swizzle by default. This is another empirical optimization chasing after the idea that "exponent might be smooth while mantissa might fluctuate rapidly"; in other words, the higher bytes and lower bytes might exhibit different behaviours inside a float and we should collect them into separate chunks. But the blosc2 implementation probably only works for 4 byte IEEE float. I have a hard time imaging how this would optimize half: https://www.slideshare.net/PyData/blosc-py-data-2014#17 openEXR's internal zip chased a similar idea, however the existing code swizzled by 2 bytes for an entire block of numbers rather than doing a local 16 byte swizzle on a 4 byte pattern: https://github.com/AcademySoftwareFoundation/openexr/blob/86d0fb09859951d1e51a889e4ff2b7b3baecf021/src/lib/OpenEXR/ImfZip.cpp#L50C2-L72 This might be one reason why it works well for float seems not be so performant on half. |
FYI: I just created a PR on the Bazel Central Registry to add c-blosc2 2.12.0 (bazelbuild/bazel-central-registry#1349). Once this is merged Bazel builds can be fixed this way: Add to
Add to
|
* Whitespaces and licensing * WIP OpenEXRCore implementation * Brand new spanking blosc build. * Switch to single Scanline zstd compression and Single implementation * Fixed the tests * Undo whitespace changes * Last touches * Revert extra build changes
@pleprince I just added the updated version of c-blosc2 to my personal Bazel registry.
This way also my personal registry will be considered that has already |
@pleprince Just tested on Ubunut 22.04:
|
Thank you so much @Vertexwahn ! |
@cary-ilm @peterhillman
|
AcademySoftwareFoundation/openexr#1604 revealed that the c-blosc2 module still has some linker issues. This PR fixes those issues. This PR is an improvement of the c-blosc2 model at version 2.12.0. In detail, missing c files were added to the c-blosc2 target. Also, textual_hdrs were reworked. Besides this cc_tests targets were introduced to make sure the whole thing works across different systems. The defines `HAVE_PLUGINS` and `HAVE_ZSTD` were added. Without the define `HAVE_PLUGINS` zfp plugin is not initialized (which is required by OpenEXR). @phaedon Maybe this fixes also some issues on your side ;). You can create a PR and add yourself as a maintainer to this module if you like. This way you will get notified if "improvements" are made or at least it gives you a chance to veto changes ;). Furthermore, the PR policy was changed in a way that a positive review of a maintainer can lead under some circumstances to a successful merge without the need to get a review of one of the "admins/repo owners"
@pleprince bazelbuild/bazel-central-registry#1773 is merged now. Use |
@pleprince I'm a little out of my depth here. I would have thought it was OK to change the fuzz test compile flags, since the purpose of fuzz testing is to verify the code correctness, not that it builds on all architectures. Does this issue mean that adding zstd adds an additional constraint on compiling debug builds in general, or is it just a quirk of the way that the oss-fuzz builds are configured? |
That error seems to pop up because for some reason blosc is compiled with DWAF5 format symbols, but the linker used for the fuzzer does not understand this ( clang/llvm forums seem to mention this error ). I have not encountered this issue locally, but then again I have a fairly recent toolchain. What I don't understand is what makes blosc so special: I presume the fuzzer uses the same linker as the regular shared object EXR build chain ? |
Signed-off-by: Philippe Leprince <[email protected]>
Signed-off-by: Philippe Leprince <[email protected]>
Signed-off-by: Philippe Leprince <[email protected]>
Signed-off-by: Philippe Leprince <[email protected]>
AcademySoftwareFoundation/openexr#1604 revealed that the c-blosc2 module still has some linker issues. This PR fixes those issues. This PR is an improvement of the c-blosc2 model at version 2.12.0. In detail, missing c files were added to the c-blosc2 target. Also, textual_hdrs were reworked. Besides this cc_tests targets were introduced to make sure the whole thing works across different systems. The defines `HAVE_PLUGINS` and `HAVE_ZSTD` were added. Without the define `HAVE_PLUGINS` zfp plugin is not initialized (which is required by OpenEXR). @phaedon Maybe this fixes also some issues on your side ;). You can create a PR and add yourself as a maintainer to this module if you like. This way you will get notified if "improvements" are made or at least it gives you a chance to veto changes ;). Furthermore, the PR policy was changed in a way that a positive review of a maintainer can lead under some circumstances to a successful merge without the need to get a review of one of the "admins/repo owners"
…penexr into main_clusty Signed-off-by: Philippe Leprince <[email protected]>
Hello @kdt3rd and @cary-ilm |
Perhaps it would be simplest to ignore the oss-fuzz test failure for now. Once this is all merged then we can do a PR to oss-fuzz to update the build.sh |
// clevel 9 is about a 20% increase in compression compared to 5. | ||
// Decompression speed is unchanged. | ||
int zstd_level; | ||
exr_get_default_zstd_compression_level (&zstd_level); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be getting the current compression level, not the default one? Changing the compression level doesn't seem to make a difference to the output size
} | ||
|
||
blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS; | ||
int typeSize = inSize % 4 == 0 ? 4 : 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I follow correctly: if the total data size is a multiple of four, then this will pair together two halfs into a 4 byte 'type', but otherwise it could split floats into two 2 byte types. Is that a performance overhead?
A more pedantic approach might be to do multiple compression blocks, each containing a single data type, so blosc knows the exact size of each type. Perhaps each consecutive group of channels of the same type are compressed into a single group. (For deep, you also need to compute the total number of samples in the data, by looping through the channel list to compute the total size of the samples and dividing the data size by that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while since I looked at that code.
There is a hard requirement inside Blosc, if I remember right, that the number of data points needs to be a multiple of number of typesize. I think this code prevented a crash when you had only 1 half channel.
So if you have an even number of half channels, they are treated in 4 byte chunks, where as if you have an even number of halfs everything is a treated as half.
At some point I implemented that channel sorting to have the correct type, but for some reason, the compression rate dropped: I had 1 blosc call per channel. I suspect that 1 scanline was not enough data if it is sliced for every channel (I did not do 1 blosc call per channel size type though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my tests, If there are only half float channels, then setting typeSize to 2 gives files 52.7% of uncompressed size, setting it to 4 gives 56.8%. For comparison, ZIPS gives 53.3%, so ZIPS performs better than ZSTD compression unless inSize is not a multiple of four. If there are full float channels in the mix and typeSize is 4, ZSTD performs better than ZIPS all round. All that suggests that it might be worth compressing channels with the correct typeSize. Fortunately, with the common set of channel names (A,B,G,R,Z,ZBack,id) it just happens that the alphanumeric sorting of the channel names groups together the 16 bit channels and the 32 bit channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peterhillman your suggestion is to sort the channel data by size and potentially do 2 blosc calls ? ( when you have mixed float / half channels ).
the complication is that the pixel and the sample count tables are not available to the compressor for deep files. I have a branch that does a refactor to that effect but it was fairly large and probably full of bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking you do mulitple blosc calls, at worst one per channel. If A and Z are stored as 32 bit float, and B,G,R as 16, you would do A by itself, then B,G,R together, and finallty Z, so you would have three blosc data chunks. That way you can compress channels together and don't need the overhead of rearranging the data.
It would be a minor refactor, yes, but just of the zstd code. You shouldn't need the sample count table, but you do need the ChannelList, which you can get from the header in the same way that other codecs do. You sum up the total size of all the channels and divide inSize by that to give you the number of samples. Then the first (numberofSamples*firstChannelSize) bytes are all the data for the first channel alphanumerically, then the data for the second channel, and so on through the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense…. I’d also need to detect when I am getting called with less data than expected ( if I remember right, first call to the compressor for the scan line is for the sample counts )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point. You would have to look at the dataWindow and compute the expected size of the sample count table to detect whether the data is sample count or pixel data, and it would be possible the pixel data coincidentally happens to be exactly that size. If that happened, then the data would be compressed as single 4 byte channel, even if it wasn't. That wouldn't do much harm, though, because the data size would have to be relatively small.
I suppose a minor extension to the API would be to pass an extra parameter to the compressors to indicate whether they are getting normal pixel data, sample counts, or deep data.
Presumably zstd compressed data would have some kind of header in it to indicate the number of blocks and the uncompressed size of each, so the decompresser would do the right thing regardless of what the compressor did (and would mean the decompressor doesn't need to look at the header at all - it just goes off the header in the compressed data block). That would mean the API can be changed later without breaking the zstd codec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I got it right:
If the number of bytes I get is datawindow.x * 4 ( or whatever is the type for counts ), it means its a sample count compress call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the sample count table will always be (datawindow.max.x + 1 - datawindow.min.x)*sizeof(int) bytes for scanline images, and it will be (tile_area)*sizeof(int) bytes for tiled deep images. The actual pixel data will always be divisible by the sum of the channel sizes.
This code will need to be removed before merging. Signed-off-by: Philippe Leprince <[email protected]>
While we all agree that I'm wondering if anyone here have experimented simply introducing |
I had an initial prototype where I integrated zstd directly, so definitely doable. I wanted to mention that a lot of the compression gain can be attributed to the proper data preprocessing: currently using only a byte shuffle provided by blosc. We are in the process of trying to improve on this with some actual knowledge of the data being compressed which hopefully will increase the performance for half pixels and maybe even more. The main performance of blosc comes for specialized SIMD implementations, so you are right that exotic architectures will not benefit out of the box, but can always be implemented inside blosc if need be. Once we are done with iterating with the algo, we’ll deal with the dependency issues raised by this PR raises. |
I agree that a proper pre-filtering of byte patterns would benefit compression, as we're also investigate this aspect on our side. However, currently my work focus on only RGB-half data and does not look at deep data at all. I believe this PR is much further down the road in terms of deep data compression, and its backward compatibility with half. For deep data that is mostly float32 streams, I would like to mention the zfp project https://github.com/LLNL/zfp and the scientific publication that outlines its principles https://ieeexplore.ieee.org/document/6876024, but maybe you & co are already familiar with this line of work. |
Blosc already has a zfp implementation. I was not able to test it properly before since I did not implemented pixel unpacking for tiles ( this is the current bit that we are debugging ). Once that is done, this experiment is very easy to perform in a few lines of code. |
FWIW, Zfp was my first idea. I experimented with it a bit but the compression rate was inferior to zstd on my deep datasets. It performed better in lossy mode but this not a fair comparison to zstd anymore and I didn't assess which level of loss is actually acceptable. |
In my tests a few years ago, ZFP in lossless mode also was not great on EXR data. Both compression ratio and performance were not impressive. https://aras-p.info/blog/2021/08/27/EXR-Filtering-and-ZFP/ |
all the codec implementations have moved from |
Hi @TodicaIonut. I merged the latest release and a new batch of changes in a private branch. We will update this PR when it's ready. |
commit a6ee9f0 Zstd Compressor |
This PR adds a new compression algorithm optimized for Deep images.
During the development, deep images were shrunk by 20-30% compared to ZIPS with similar compression performance and 35-45% smaller when the compression level is turned to the max (about a 3-4x slowdown compared to ZIPS). The decompression speed slightly faster than ZIPS.
I did not spend a lot of time trying to optimize for non-Deep images, but
certain small non-deep example images with HALF pixels yielded a slightly worse compression rate compared to ZIPS.
The core algorithm is implemented in Blosc, with ZSTD codec with an appropriate data prefiltering.
The main advantage is that it offers a unified API access to multiple codecs and is under active development. Also the binary serialization stays compatible if ever we want to change the codecs/config parameters in the future, without introducing breaking changes.
Open questions:
I still have not figured out how to integrate Blosc properly into the OpenEXR build pipeline.