Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bzip2, lz4, and zstd independent compression filter packages #880

Merged
merged 30 commits into from
Dec 17, 2021

Conversation

mkitti
Copy link
Member

@mkitti mkitti commented Dec 8, 2021

This branch splits the compression plugin code into independent subdirectory packages. Consider merging this into #875.

The main purpose of this pull request is to test CI.

licenses/H5Zbzip2_LICENSE.txt Outdated Show resolved Hide resolved
@mkitti
Copy link
Member Author

mkitti commented Dec 11, 2021

All tests are now passing, including nightly.

@@ -0,0 +1,12 @@
name = "H5Zbzip2"
uuid = "094576f2-1e46-4c84-8e32-c46c042eaaa2"
version = "0.16.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea will be to sync the version with the main HDF5.jl package version?

I don't have a problem with this, but as you probably already know, these 'independent' packages could also just maintain their own version number that get bumped when we modify the subpackages.

I think the main benefit of the current approach is simplicity in bumping version numbers across all packages. OTOH, if we keep an independent version number I think then that reduces precompilation when we make new releases.
From judging changes in the original blosc file, changes to the filter packages are likely to be infrequent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We can only tag the whole repository, so we might as well keep it in sync with the tags.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can tag a sub-package associated with a subdirectory of a repository via @JuliaRegistrator register subdir=.... Makie e.g., uses this approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is a tag and release from the git perspective.

Copy link
Member

@musm musm Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep all the subpackages at version 0.1.0 and tag them individually. I think the benefit is we avoid precompilation of the subpackages after new HDF5.jl releases, and most of these filter packages won't be changing frequently. I believe that makes more sense and follows semver, instead of bumping the submodules in instances were HDF5.jl was changed but the subpackages were untouched.

For releases I've been using the registrator bot. It looks like it should be straightforward to accommodate subpackages with the registrator from @thchr comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we can tell the registrator bot whatever version we want, but the question is how to do you mark these versions in Git? Do you plan on making an individual tag for each sub-package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point now, since these are not git submodules...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this again, does the git tag version even matter? No.

So, what'll happen is that the git tag version will be synced with the HDF5.jl version as is currently the case.
The git tag version will not agree with the tagged version of the filter subpackages, which will have their own independent versions as specified in their own Project.toml. This is the case since we aren't using git submodules, which is more trouble than it's worth.

To me this makes sense and is desired behavior since we won't have to tag all the filter packages superfluously every time a change is made in the main HDF5.jl module that doesn't modify the filter submodules.

# We need to preserve the output of cbuffer_sizes under Julia 1.8 (as of 2021/12/11)
# Otherwise their Refs get freed and the buffer becomes corrupted
# If this fails, consider copying `in` or wrapping that array.
GC.@preserve outbuf_size cbytes blocksize begin
Copy link
Member Author

@mkitti mkitti Dec 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this line, H5Zblosc appears to error with Julia 1.8 in development.

cbuffer_sizes calls blosc_cbuffer_sizes which returns a size values via pointers / Refs.

https://github.com/JuliaIO/Blosc.jl/blob/8ab4e3cf60dd45ee598ad879caf99948bd2a2718/src/Blosc.jl#L129-L137

function cbuffer_sizes(buf)
    s1 = Ref{Csize_t}()
    s2 = Ref{Csize_t}()
    s3 = Ref{Csize_t}()
    ccall((:blosc_cbuffer_sizes,libblosc), Cvoid,
          (Ptr{UInt8}, Ref{Csize_t}, Ref{Csize_t}, Ref{Csize_t}),
          buf, s1, s2, s3)
    return (s1[], s2[], s3[])
end

Somehow it appears that the Refs are getting garbage collected and the value of outbuf_size is corrupted. This does not make sense to me, but this seems to have worked fine until Julia 1.8, so perhaps there is a bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been resolved in JuliaLang/julia#43408 (pending merge)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay, so this is unrelated to JuliaIO/Blosc.jl#86 or does the fix also require making the changes according to your comment in JuliaIO/Blosc.jl#86 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the underlying fix has been merged. I propose we remove mitigating patch here as it won't be an issue henceforth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can tag a commit as H5Zblosc_v0.1.0 when you make a release. The main convenience of keeping the versions somewhat aligned is making it easy to know what versions are compatible with what other versions.

Copy link
Member

@musm musm Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true. OTOH, I think it might not be too difficult to determine if a non-backwards compatible change is made to the filter subpackages. At worst, we could always bump the compatible HDF5 version to the latest one in the filter subpackage every time a change is made to the subpackage or HDF5.jl methods that relate to the filter.

test/external.jl Outdated
@@ -50,5 +54,6 @@ close(source_file)

rm(fn1)
# rm(fn2)
@debug "external tests did not delete" fn2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot delete fn2 or run those above tests because some HDF5 handle is still open. Because of the external link, HDF5 is set to use "weak" closing, so closing the file is not sufficient. Ultimately h5close at the end of runtests.jl does close the files and allows for the tempfile to be deleted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I have an idea on how to fix this.

Could we split this off from this PR?

Ideally I'd like to squash-merge all the new filter functionality in a single commit/PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the debug message

test/runtests.jl Outdated Show resolved Hide resolved
test/swmr.jl Outdated Show resolved Hide resolved
@mkitti mkitti marked this pull request as ready for review December 13, 2021 19:56
@musm
Copy link
Member

musm commented Dec 14, 2021

Any chance you could rebase with master.

@musm
Copy link
Member

musm commented Dec 14, 2021

I had this comment here: https://github.com/JuliaIO/HDF5.jl/pull/880/files#r767927849
and also
https://github.com/JuliaIO/HDF5.jl/pull/880/files#r767937926
to remove the mitigating patch as it's now been fixed in Julia. master

@mkitti mkitti force-pushed the janelia/h5z_subdir_packages branch from f796447 to e344882 Compare December 15, 2021 00:08
@mkitti
Copy link
Member Author

mkitti commented Dec 15, 2021

These two pull requests should fix the remaining downstream errors.

JuliaIO/MAT.jl#170 (fixes SparseMatrixCSC errors for Julia 1.7)
JuliaIO/JLD.jl#305 (adds H5Zblosc, so set_blosc! exists)

Anything else? You are welcome to make commits directly.

@musm
Copy link
Member

musm commented Dec 15, 2021

Anything else? You are welcome to make commits directly.

You'll have to enable "allow commits" in the PR since I don't have access.

@mkitti
Copy link
Member Author

mkitti commented Dec 15, 2021

Anything else? You are welcome to make commits directly.

You'll have to enable "allow commits" in the PR since I don't have access.

I invited you to the repository. I guess it's trickier since it's coming from an organizational account.

@mkitti
Copy link
Member Author

mkitti commented Dec 15, 2021

Project.toml Outdated Show resolved Hide resolved
Comment on lines 311 to 321
macro dev_embedded_filters()
quote
@eval begin
using Pkg
Pkg.develop(PackageSpec(; path=joinpath(dirname(pathof(HDF5)), "filters", "H5Zblosc")))
Pkg.develop(PackageSpec(; path=joinpath(dirname(pathof(HDF5)), "filters", "H5Zbzip2")))
Pkg.develop(PackageSpec(; path=joinpath(dirname(pathof(HDF5)), "filters", "H5Zlz4")))
Pkg.develop(PackageSpec(; path=joinpath(dirname(pathof(HDF5)), "filters", "H5Zzstd")))
end
end
end
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the macro which devs the filter packages into the test environment.

@musm
Copy link
Member

musm commented Dec 15, 2021

Looks like there may be an aws issue downloading the non ubuntu binaries?

@musm musm changed the title Independent compression filter packages Add bzip2, lz4, and zstd independent compression filter packages Dec 16, 2021
@musm
Copy link
Member

musm commented Dec 16, 2021

@mkitti any chance you could review my changes.

@musm
Copy link
Member

musm commented Dec 16, 2021

I think we need to in a follow up PR or this PR, either:

(1) deprecate setting :blosc for properties and warn users to load in the external H5Zblosc package to use the blosc filter.

(2) Alternatively, we could add hooks for :zstd, :lz4, :bzip2 like we do for the blosc package
i.e. add the following for the external filters.

    @require H5Zblosc="c8ec2601-a99c-407f-b158-e79c03c2f5f7" @eval begin
        set_blosc!(p::Properties, val::Bool) = val && push!(Filters.FilterPipeline(p), H5Zblosc.BloscFilter())
        set_blosc!(p::Properties, level::Integer) = push!(Filters.FilterPipeline(p), H5Zblosc.BloscFilter(level=level))
    end

@simonbyrne maybe you also have an opinion on which option is more desirable from a ux perspective.

@mkitti
Copy link
Member Author

mkitti commented Dec 16, 2021

I generally think that we should deprecate the older syntax blosc = 3 or otherwise come up with a way to implement properties in an extensible manner.

Rather than this

HDF5.jl/src/properties.jl

Lines 438 to 452 in 00dcb13

function class_setproperty!(::Type{DatasetCreateProperties}, p::Properties, name::Symbol, val)
name === :alloc_time ? set_alloc_time!(p, val) :
name === :chunk ? set_chunk!(p, val) :
name === :external ? API.h5p_set_external(p, val...) :
name === :filters ? set_filters!(p, val) :
name === :layout ? set_layout!(p, val) :
# set-only for convenience
name === :blosc ? set_blosc!(p, val) :
name === :deflate ? set_deflate!(p, val) :
name === :fletcher32 ? set_fletcher32!(p, val) :
name === :shuffle ? set_shuffle!(p, val) :
# deprecated
name === :compress ? (depwarn("`compress=$val` keyword option is deprecated, use `deflate=$val` instead",:class_setproperty!); set_deflate!(p, val)) :
class_setproperty!(superclass(DatasetCreateProperties), p, name, val)
end

an extensible system would work as follows

# Defined in HDF5.jl
function class_setproperty!(::Type{DatasetCreateProperties}, p::Properties, name::Symbol, val)
   class_setproperty!(::Type{DatasetCreateProperties}, p, Val(name), val)
end

# Defined in H5Zblosc.jl
import HDF5: class_setproperty!
class_setproperty!(::Type{DatasetCreateProperties}, p::Properties, Val{:blosc}, val) = set_blosc!(p, val)

That said, I think the filters interface is much better and is already extensible. Filters should have an order to them in a pipeline. Thus, I would lean towards deprecating the properties interface for filters.

Comment on lines +5 to +9
filter_path = joinpath(dirname(pathof(HDF5)), "filters")
Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zblosc")))
Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zbzip2")))
Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zlz4")))
Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zzstd")))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did like the idea of having this in some kind of callable closure. I anticipate downstream projects may want to do something similar via CI. Perhaps though it could it its own file under test/?

@mkitti
Copy link
Member Author

mkitti commented Dec 16, 2021

Overall the changes look fine to me. We just need to register the packages for some of the tests to work. 👍

@mkitti
Copy link
Member Author

mkitti commented Dec 16, 2021

It will probably be easier to iterate and test this once the subpackages are registered. I suggest registering the subpackages well before we register HDF5 0.16 itself. Then at least we can dev everything in the normal fashion.

@musm
Copy link
Member

musm commented Dec 16, 2021

Sure, is dev'ing not working in some cases for you ? I've just manually kept calling

Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zblosc")))

when need be...

In any case, we'll have to register the subpackages first. I think I'll merge this PR now and we then need to iterate on the new filters interface. I do agree that the second approach is much more flexible, so we'd have to appropriately deprecate existing calls.

@musm musm merged commit 4639336 into JuliaIO:master Dec 17, 2021
@musm
Copy link
Member

musm commented Dec 17, 2021

Thank you very much @mkitti for your effort and patience with these PRs! I'm really excited for these changes in the upcoming v0.16 release. Also sorry to drag out some of the licensing issues.

@musm musm deleted the janelia/h5z_subdir_packages branch December 19, 2021 17:54
@musm
Copy link
Member

musm commented Dec 19, 2021

I've submitted registration for all the subpackages, they will take a day or two since they are blocked until a manual merge is done, as they depend on v0.16 which is yet released.

@mkitti
Copy link
Member Author

mkitti commented Dec 19, 2021

We could ask for manual merge if needed. Perhaps it might be good to reach out so others know what we are doing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants