-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bzip2, lz4, and zstd filters from HDF5Plugins.jl #875
Add bzip2, lz4, and zstd filters from HDF5Plugins.jl #875
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some quick comments, haven't reviewed the PR fully yet.
I'm going to convert this to a draft, since I'm seeing several additional items I might want to change beyond the comments above.
|
It would also be good for @simonbyrne to take a look and review this since he overhauled much of the Filters module recently. |
I think those make sense |
I'm almost ready to take this out of draft status. I've been battle testing this against some exotic real world applications, and it appears to be holding up. Compatibility with h5py and hdf5plugins from conda-forge seems to work well. I'm starting to think about a second iteration since we may be able to multithread the ZSTD compressor easily after I merge the parameters API over at CodecZstd.jl: Using the C API directly this would be cctx = LibZstd.ZSTD_createCCtx()
LibZstd.ZSTD_CCtx_setParameter(cctx, LibZstd.ZSTD_c_nbWorkers, Threads.nthreads())
LibZstd.ZSTD_compress2(cctx, ...) |
src/filters/filters.jl
Outdated
Register the filter with the HDF5 library via API.h5z_register. | ||
Also add F to the FILTERS dictionary. | ||
""" | ||
function register_filter(filter::F) where F <: Filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason this needs to be defined on an instance rather than just a type?
It looks like the nightly tests are hanging and taking 6 hours. Is it waiting for input? |
Nightly tests are currently broken on master. You can safely ignore them. |
So main 'breaking' change is that users will now have to load in the Alternatively, we could decide to automatically load in In any case, I think it is superior to hot-load the compression filters on a as need basis. In other words, why load in all the filters if we don't need them. So to me the latest changes are more elegant even if it requires some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes to abbreviate license comments
Co-authored-by: Mustafa M <[email protected]>
…neliaSciComp/HDF5.jl into janelia/bzip2_lz4_std_filters
I did some refactoring and got the generic I suppose we could drop the instance methods, but I do anticipate that there may be a filter where one may need to define methods based on the instance rather than just the type. |
As of 351c7e1 on Julia 1.7: julia> @time using HDF5
0.441126 seconds (157.25 k allocations: 10.384 MiB, 44.38% compilation time)
julia> @time HDF5.Filters.register_filters()
1.231270 seconds (2.75 M allocations: 152.838 MiB, 2.90% gc time, 82.70% compilation time)
julia> versioninfo()
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake) |
I've been having a lot of trouble with HDF5 on M1 Macs and this fixed my issues! |
Yeah registering filters is probably a lot slower than we'd like. I think at this stage, I'll review the PR again and get it merged this weekend. We can then look into improving compilation times of the submodules. |
Specifically @kylebeggs was having an issue with Blosc. Since this branch does not automatically load Blosc, he was able to use this branch to read HDF5 files which did not require Blosc decompression. |
@simonbyrne and @musm , please let me know if there any further unresolved issues here. |
@mkitti sounds good, thanks for the patience with this one. I plan on giving it another review sometime this week merging. @simonbyrne if you have any additional comments, that would be great to add beforehand, otherwise I think this PR should be good to go. The major follow up work is to package out the submodules so we can take advantage of precompilation. |
src/filters/filters.jl
Outdated
|
||
See `API.h5z_register` | ||
""" | ||
filter_func(::F) where {F<:Filter} = filter_func(F) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless the functions on instances are actually used, I would remove them and just keep the type ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific? Do you want to remove filtername(::F) where F <: Filter
as well?
julia> HDF5.Filters.filtername(BloscFilter())
ERROR: MethodError: no method matching filtername(::BloscFilter)
Closest candidates are:
filtername(::HDF5.Filters.UnknownFilter) at C:\Users\kittisopikulm\.julia\dev\HDF5\src\filters\filters.jl:157
filtername(::Type{HDF5.Filters.UnknownFilter}) at C:\Users\kittisopikulm\.julia\dev\HDF5\src\filters\filters.jl:158
filtername(::Type{HDF5.Filters.H5Zblosc.BloscFilter}) at C:\Users\kittisopikulm\.julia\dev\HDF5\src\filters\H5Zblosc.jl:139
...
Stacktrace:
[1] top-level scope
@ REPL[66]:1
julia> HDF5.Filters.filtername(BloscFilter)
"blosc"
julia> HDF5.Filters.filtername(typeof(BloscFilter()))
"blosc"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess unless you need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm conflicted here since the general public API should be invoking this on the instance. It allows for more general implementations of filters. Using filtername
or filter_func
on an instance always works at the moment whether it is an UnknownFilter
or a BloscFilter
instance. Being able to garner the information statically from a type is the special case here with an important technical detail (a la @cfunction
).
What I'm thinking about is if a package like JLD2 wanted to query information about the filters to do HDF5-compatible filtering or compression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about this, I concur. I've removed the methods on the Filter
instances in c450eba
I'm already prototyping this. The question I'm coming up against is how do we want to structure the dependencies in the future and how the
|
filter_cfunc(::Type{BloscFilter}) = @cfunction(blosc_filter, Csize_t, | ||
(Cuint, Csize_t, Ptr{Cuint}, Csize_t, | ||
Ptr{Csize_t}, Ptr{Ptr{Cvoid}})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed? Won't the generic one work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is useful since all the arguments here are constants and @cfunction
will return a pointer rather than a closure CFunction
. The closure depends on LLVM trampolines, which are not available on all platforms.
https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/#Closure-cfunctions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right. maybe add a comment to that effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment to that effect in c450eba (still needs to be pushed to this branch)
Ideally we could design the dependency to work as follows:
then if a user needs a filter they would load
In other words, I'd like the interface to remain exactly as in this PR. Would something like the following work?
|
In my prototype, function __init__()
@require Blosc="a74b3585-a348-5f62-a45c-50e91977d574" @eval begin
try
@eval using H5Zblosc
catch err
@warn "Using embedded copy of H5Zblosc. Use Pkg.add(\"H5Zblosc\") for precompilation."
include("H5ZBlosc/src/H5Zblosc.jl")
import .H5Zblosc: BloscFilter
end
register_filter(BloscFilter)
end
@require CodecBzip2="523fee87-0ab8-5b00-afb7-3ecf72e48cfd" @eval begin
try
@eval using H5Zbzip2
catch err
@warn "Using embedded copy of H5Zbzip2. Use Pkg.add(\"H5Zbzip2\") for precompilation."
include("H5Zbzip2/src/H5Zbzip2.jl")
import .H5Zbzip2: Bzip2Filter
end
register_filter(Bzip2Filter)
end
@require CodecLz4="5ba52731-8f18-5e0d-9241-30f10d1ec561" @eval begin
try
@eval using H5Zlz4
catch err
@warn "Using embedded copy of H5Zlz4. Use Pkg.add(\"H5Zlz4\") for precompilation."
include("H5Zlz4/src/H5Zlz4.jl")
import .H5Zlz4: Lz4Filter
end
register_filter(Lz4Filter)
end
@require CodecZstd="6b39b394-51ab-5f42-8807-6242bab2b4c2" @eval begin
try
@eval using H5Zzstd
catch err
@warn "Using embedded copy of H5Zzstd Use Pkg.add(\"H5Zzstd\") for precompilation."
include("H5Zzstd/src/H5Zzstd.jl")
import .H5Zzstd: ZstdFilter
end
register_filter(ZstdFilter)
end
end The |
Requires is a bit of a hack, personally I think it would be easiest to create subpackages (HDF5Blosc etc) that live in the same repo, and register them when need be. In that way, user have to explicitly add the dependency if they want the functionality (otherwise another package using Blosc could trigger the requires hook inadvertently). |
That's a good point and I didn't think about this scenario. How realistic/big a problem is that in practice? So I suppose the alternative suggestion is instead of relying on the original compression packages e.g.
Thus users could use
|
In the second scenario, where the user does |
If we are going with the second scenario that does not use Requires.jl, then perhaps I should commit my prototype now to this branch. The main effect is creating some new folders with Project.tomls and then moving the compression modules into those folders. The HDF5.jl Project.toml also changes as well. Per the comment above, we may need to do some modification of the In any scenario, perhaps we should drop
|
I've implemented the independent H5Zblosc, H5Zbzip2, H5Zlz4, and H5Zzstd subdirectory packages in #880. If we are already committed to moving in that direction, I can just merge that branch into this one. |
This is looking good. So after merging the appropriate PRs here. I'll need to register each individual subpackage? Now since the individual subpackages are 'independent' can you now just move the appropriate licenses over to each subpackages' license file (right now each subpackage doesn't have a license, which presumably we want to add since they can be installed independently of HDF5?). |
Yes. Well, technically anyone in JuliaIO could register them, so the burden of doing so does not need be completely on you. I've had to look into how to structure and register subdirectory packages for #880 , so perhaps I could do the initial registration which would be a template for you to do future updates? I could also just tell you how to do it. Frankly, the process is not as well documented as it should be.
Yes. We would need to apply a license to the individual sub-packages in order to register them. Presumably we would copy the same LICENSE file on master. To be clear, the individual sub-packages should be under MIT as well. We could also move the appropriate other copyright notices into the individual package subdirectories, although leaving them where they are at the root of the repository would also still work. It may be better to keep them where they are now just so it is clear that some of the code in the repository does have some additional notice requirements. |
superseded by #875 |
This adds bzip2, lz4, and zstd filters from HDF5Plugins.jl
To Do: Add testsCloses #874