Bitinformation of masked data #29
This issue will be addressed in #30 |
I've added more tests in #30 and merged the PR as all of them pass. I'd still love to see the impact if you could share your results @aaronspring at some point. Many thanks |
Originally posted by @aaronspring in #30 (comment) @milankl I find differences whether using dim: 4 dim: 1 dim analysis: |
Thanks Aaron, that looks awesome. Good to see that all these information patches in the exponent bits disappear! For
julia> A = rand(Float32,3,3)
3×3 Matrix{Float32}:
0.408687 0.922863 0.0622634
0.838222 0.947521 0.141393
0.0944574 0.991588 0.576514
julia> signed_exponent(A)
3×3 Matrix{Float32}:
13.078 7.3829 127.515
6.70577 7.58017 18.0983
48.3622 7.93271 4.61211
It looks wrong, because Julia will interpret the exponent bits still as being biased (the information that the exponent bits are now to be interpreted differently is not stored), but you can always check that nothing went wrong by applying the inverse
julia> signed_exponent([-9e-33])
1-element Vector{Float64}:
-4.739053125085073e32
Originally posted by @milankl in #30 (comment) |
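To make the "it looks wrong" point concrete, here is a minimal sketch of a signed-exponent conversion and its inverse for a single Float32, using only Base. The names to_signed_exponent and to_biased_exponent are hypothetical helpers for illustration and are not the BitInformation.jl implementation; the sketch also assumes normal, finite inputs (no zeros, subnormals, Inf or NaN).

```julia
# Hypothetical helpers, only for normal finite Float32 values.
# Float32 layout: 1 sign bit, 8 exponent bits (bias 127), 23 mantissa bits.

function to_signed_exponent(x::Float32)
    ui = reinterpret(UInt32, x)
    sign_and_mantissa = ui & 0x807fffff             # keep sign bit and mantissa bits
    e = Int((ui & 0x7f800000) >> 23) - 127          # remove the bias
    # sign-magnitude encoding: top exponent bit is the sign, lower 7 bits |e|
    esigned = UInt32(abs(e)) | (e < 0 ? 0x00000080 : 0x00000000)
    return reinterpret(Float32, sign_and_mantissa | (esigned << 23))
end

function to_biased_exponent(x::Float32)
    ui = reinterpret(UInt32, x)
    sign_and_mantissa = ui & 0x807fffff
    esigned = (ui & 0x7f800000) >> 23
    magnitude = Int(esigned & 0x0000007f)
    e = (esigned & 0x00000080) == 0 ? magnitude : -magnitude
    return reinterpret(Float32, sign_and_mantissa | (UInt32(e + 127) << 23))
end

x = 0.5f0                       # exponent -1
y = to_signed_exponent(x)       # Julia still reads the exponent bits as biased
println(x, " -> ", y, " -> ", to_biased_exponent(y))
```

The printed intermediate value is not meaningful as a number, since Julia has no way of knowing that the exponent bits are now sign-magnitude. The round trip through the inverse is the check that nothing got lost.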
Perfect, that looks really good to me. Now I'd keep these mantissa bits and then redo any analysis that you'd otherwise do with the full precision data set. Feel free to post any of that here. We haven't addressed the interpolation problem due to the ocean-atmosphere coupling, I'd be curious whether this has any consequences. |
could you point me to an example how to save a |
Use NetCDF.jl, but switch lossless compression on, like here. If you then pass on arrays with rounded mantissas via
where |
Branch #main now contains
julia> using BitInformation
julia> A = rand(Float32,3,3);
julia> round!(A,1)
3×3 Matrix{Float32}:
0.5 0.1875 0.25
1.0 0.046875 0.0625
0.25 0.5 0.75
julia> bitinformation(A,masked_value=0.5f0)
note that masked_value has to be of the same |
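As a rough illustration of what masking by value could mean for the bit-pair statistics, here is a Base-only sketch for a 1D Float32 vector. It is an assumption about the general idea, not the code behind bitinformation: the function names masked_bitpair_count and bit_mutual_information are hypothetical, and the masked value is matched with ==, so a NaN mask would need isnan instead.

```julia
# Hypothetical sketch: count bit pairs of adjacent elements, skipping every
# pair in which either element equals masked_value.
function masked_bitpair_count(A::AbstractVector{Float32}, masked_value::Float32)
    C = zeros(Int, 32, 2, 2)    # C[bit position, bit of a + 1, bit of b + 1]
    for i in 1:length(A)-1
        (A[i] == masked_value || A[i+1] == masked_value) && continue
        a = reinterpret(UInt32, A[i])
        b = reinterpret(UInt32, A[i+1])
        for pos in 1:32         # pos 1 is the sign bit, pos 32 the last mantissa bit
            shift = 32 - pos
            C[pos, ((a >> shift) & 0x1) + 1, ((b >> shift) & 0x1) + 1] += 1
        end
    end
    return C
end

# Mutual information (in bits) at one bit position from the 2x2 pair counts.
function bit_mutual_information(C::Array{Int,3}, pos::Int)
    p = C[pos, :, :] ./ sum(C[pos, :, :])    # joint probabilities
    px = sum(p, dims=2)                      # marginal of the first element
    py = sum(p, dims=1)                      # marginal of the second element
    M = 0.0
    for j in 1:2, i in 1:2
        p[i, j] > 0 && (M += p[i, j] * log2(p[i, j] / (px[i] * py[j])))
    end
    return M
end

A = Float32[1, 2, 0.5, 4, 5]
C = masked_bitpair_count(A, 0.5f0)
println("pairs counted per bit position: ", sum(C[1, :, :]))
```

Of the four adjacent pairs in A, the two that touch the masked element are dropped, so only the unmasked neighbours contribute to the counts.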
v0.5 got merged into the Julia registry now. Will leave this issue open to eventually pick some of the posts here for the documentation. |
now with the masking enabled, I am quite happy with the bitrounding results: |
Many thanks for sharing this. Makes me very happy to see that (once we've sorted out the masked array issue, which I wanted to do for a long time anyway!) we basically have another example where the bitwise real information content-approach just works. Sure you are left with the choice of 99% ... to 100% of information, but I believe that this will always be the case and in general dependent on your storage requirements. E.g. is 6x compression enough or will 10x be a game changer (because you can store more ensemble members, a higher temporal resolution, etc)? Is it data that should be kept for archiving purposes but won't be touched much? If you have any other remarks, suggestions, concerns etc. feel free to reach out here! |
I just need to archive some data and probably won't touch it again. But I think the potential for bitrounding is enormous. I will soon try it on high spatial resolution data. I expect much higher compression gains there. I am going to pitch my results in our ocean group meeting in early May. |
I agree 😄 Especially once netcdf supports more lossless codecs, I believe zstd is soon-ish going to be supported (discussion here Unidata/netcdf-c#2227) which likely yields even higher compression factors in the "archiving won't touch much" case, or higher performance in the "we'll read this data every day" case. |
As requested by @aaronspring in #25, we need to extend the bitinformation function with a method for masked arrays. This is an issue to discuss a potential implementation of this and to discuss progress. The methods we would need are
Then I further suggest separating out the permutation for information along dimension dim, which is basically this bit
BitInformation.jl/src/mutual_information.jl
Lines 66 to 73 in fa2dd50
and moving the zeroing of insignificant information into its own function, like so
which should basically include these lines
BitInformation.jl/src/mutual_information.jl
Lines 48 to 54 in fa2dd50
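For the permutation part, a separated-out helper could look roughly like the following sketch. The name permute_dim_first is hypothetical and the body is not the code referenced in mutual_information.jl; it only illustrates the idea of moving dimension dim to the front so that the pair counting can always run along the first dimension.

```julia
# Hypothetical helper: return a copy of A with dimension dim moved to the
# front, keeping the remaining dimensions in their original order.
function permute_dim_first(A::AbstractArray, dim::Int)
    N = ndims(A)
    1 <= dim <= N || throw(ArgumentError("dim must be between 1 and $N"))
    perm = (dim, setdiff(1:N, dim)...)
    return permutedims(A, perm)
end

A = reshape(collect(1:24), 2, 3, 4)
B = permute_dim_first(A, 3)
println(size(A), " -> ", size(B))
```

permutedims allocates a copy; a lazy PermutedDimsArray with the same perm would avoid that, at the cost of non-contiguous access along the first dimension.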