Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary export extension #678

Open
koehlerson opened this issue Apr 14, 2023 · 26 comments
Open

Binary export extension #678

koehlerson opened this issue Apr 14, 2023 · 26 comments
Labels

Comments

@koehlerson
Copy link
Member

I use JLD2 for saving my simulation results. The two most important pieces that I save are solution vectors and a DofHandler. The only annoying part is if structs change they can't be reloaded that easily. However, JLD2 offers a lot of API to cover that by e.g. custom serialization and type remapping

What do you think about providing a binary export extension for e.g. JLD2 where we specify a custom serialization for the DofHandler?

@KristofferC
Copy link
Collaborator

In my opinion, if you want to reliably serialize/deserialize arbitrary Julia object then the best is to

  • record Julia version
  • record package versions (with a manifest)
  • use the Serialization stdlib.

Trying to keep backwards compatibility by writing these cusom serialization functions sounds kind of annoying to me.

@koehlerson
Copy link
Member Author

koehlerson commented Apr 14, 2023

I'd imagine to write as a sane default the .toml files of the current env to the binary output in the same way JLSO does. I'm not sure if I agree fully on the custom serialization part. There are building blocks of the DofHandler which should be easily abstracted away by e.g. scalars that doesn't change over versions. One particular example I can think of are the interpolations.

It is very likely that I can describe a Lagrange interpolation by a triple of refshape, order and dim which are invariant to changes of Interpolation. I wouldn't aim for a full backwards compatibility, just deconstructing the structs to julia native/primitive types as much as possible and thereby maximizing the backwards compatibility. I thought that the type remapping part could potentially be used after big breaking changes

@KnutAM
Copy link
Member

KnutAM commented Apr 23, 2023

I think this would be a nice feature!
But I do share Kristoffer's concerns that it might add quite a lot to Ferrite's maintenance load...

From my point of view, (or perhaps as a starting point), the current issue is to save the constraint handler when used with anonymous functions. Are there any nice solutions for this? (xref my question/attempt JuliaIO/JLD2.jl#288 (comment)).
For now I'm creating custom structs/using Returns (example). But it would be nice to be able to reload without erroring (even if one cannot recover the function)

@koehlerson
Copy link
Member Author

I think this would be a nice feature! But I do share Kristoffer's concerns that it might add quite a lot to Ferrite's maintenance load...

I'm not entirely sure why that is, because you would provide a custom serialization that is valid for a single Ferrite version (and in best case across versions if they don't break the structs) together with a bundled Manifest and Project toml entry of the current environment

From my point of view, (or perhaps as a starting point), the current issue is to save the constraint handler when used with anonymous functions. Are there any nice solutions for this? (xref my question/attempt JuliaIO/JLD2.jl#288 (comment)). For now I'm creating custom structs/using Returns (example). But it would be nice to be able to reload without erroring (even if one cannot recover the function)

I think this issue is at the end of a possible implementation. One should start with a DofHandler serialization such that you can recover the field information properly. Therefore, you'd need to specify how you decompose Cells, Interpolations etc. Boundary information would be the last step imo, if the idea is accepted as being worth to implement, which doesn't seem really the case right now.

@termi-official
Copy link
Member

We might want to think about recommendations for users regarding IO / solution iteration and put them into the docs, possibly with more examples. Related issues in the ecosystem are are Ferrite-FEM/FerriteViz.jl#1 and #828 .

@koehlerson
Copy link
Member Author

What do you think about a standardized output for HDF5 for all types where it is possible?

julia> h5open("blah.h5", "w") do h5f
           h5f["grid/cells"] = grid.cells
       end

@termi-official
Copy link
Member

Not necessary standardized, but at least recommendations or a guideline.

@koehlerson
Copy link
Member Author

But why not standardize things that are possible

julia> h5open("blah.h5", "w") do h5f
           h5f["grid/cells"] = grid.cells
           h5f["grid/nodes"] = grid.nodes
           for (facesetname,faceset) in grid.facesets
               h5f["grid/facesets/$facesetname"] = collect(faceset)
            end
       end

Seems weird to me that every user has to write something like this in order to export a grid to HDF5

@termi-official
Copy link
Member

But why not standardize things that are possible

I think this is Pandora's box. What is the format that covers all possible use cases? The code snippet you shows already just works for our internal standard serial grid, but not for the p4est grid (information about the trees is lost) and also not for the distributed grid. If we have multiple solutions on a grid with the same dof handler, how is this handled (e.g. for time series) and so on. If I have multiple grids, this also already breaks down. This is very closely related to the struggle in Ferrite-FEM/FerriteViz.jl#1 which just tried to solve this abstraction layer for a single thing.

If you have a solution to this I am happy to help on the implementation anyway.

@koehlerson
Copy link
Member Author

koehlerson commented Nov 14, 2023

Just to be clear for others. What I would like to see is something like this

julia> h5open("blah.h5", "w") do h5f
           h5f["grid"] = grid
       end
ERROR: Type Array does not have a definite size.
Stacktrace:

which is not possible since HDF5.jl needs isbitstype or Vectors of primitive or isbitstypes. So, the only way around is to overload setindex!

julia> function Base.setindex!(parent::HDF5.File, val::Grid, path::Union{AbstractString,Nothing}; pv...)
            parent["grid/cells"] = grid.cells
            parent["grid/nodes"] = grid.nodes
            for (facesetname,faceset) in grid.facesets
               parent["grid/facesets/$facesetname"] = collect(faceset)
            end
       end

julia> h5open("blah.h5", "w") do h5f
           h5f["grid"] = grid
       end
Grid{2, Quadrilateral, Float64} with 400 Quadrilateral cells and 441 nodes

In order to read the things back in you get now a Dict

julia> h5open("blah.h5", "r") do h5f
           read(h5f["grid"])
       end
Dict{String, Any} with 3 entries:
  "nodes"    => NamedTuple{(:x,), Tuple{NamedTuple{(:data,), Tuple{NamedTuple{(Symbol("1"), Symbol("2")), Tuple{Float64, Float64}}}}}}[(x = (data = (var"1" = -1.
  "cells"    => NamedTuple{(:nodes,), Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4")), NTuple{4, Int64}}}}[(nodes = (var"1" = 1, var"2" = 2
  "facesets" => Dict{String, Any}("left"=>NamedTuple{(:idx,), Tuple{NamedTuple{(Symbol("1"), Symbol("2")), Tuple{Int64, Int64}}}}[(idx = (var"1" = 1, var"2" = 4)

which is again somewhat inconvenient and one would ideally do the same as for a isbitstype in HDF5

julia> h5open("blah.h5", "r") do h5f
           read(h5f["grid"],Grid)
       end

Then people can save in whatever order/form they want, just a small convenience layer (which isn't even visible/noticeable while using it) in order to make life easier and share code that would be duplicated anyways

@KnutAM
Copy link
Member

KnutAM commented Nov 14, 2023

In general, I think having some binary exchange format for Ferrite data would be nice!

Just to mention an alternative approach, which can be done from outside Ferrite.jl without doing type piracy (becomes more useful/practical after #692) would be to define

struct FerriteHDF{G<:AbstractGrid,F<:HDF5.File}
    grid::G
    h5f::F
end
function FerriteHDF(name::String, grid::AbstractGrid)
    h5f = open(name, "w")
    h5f ["grid/cells"] = grid.cells
    h5f ["grid/nodes"] = grid.nodes
    for (facesetname,faceset) in grid.facesets
        h5f["grid/facesets/$facesetname"] = collect(faceset)
    end
    return FerriteHDF(grid, h5f)
end
Base.close(io::FerriteHDF) = close(io.h5f)
# Could also define the do-block, append modes and other conveniences as required....
# And define e.g. `Ferrite.write_celldata(::FerriteHDF, data)` 
# Would be even more general if the same was done for the dofhandler instead of the grid...

IMO, doing something like this in an external package could be nice - and if it becomes well-adopted we can consider bringing it in as a weak dep via HDF5.jl (slightly tricky with the type, but I guess that type could be defined outside the extension) Similarly (as previously discussed) could be done for JLD2.jl

Currently, there isn't any interface for reading this back in, but one could define

function FerriteHDF5(path::String, G::Type{<:AbstractGrid})
    h5f = h5open(path, "r")
    grid = dict_to_grid(G, read(h5f["grid"]))
    return FerriteHDF5(grid, h5f)
end
# Similar to above, a do-construct can be defined as well. 
# The following methods could then be defined in Ferrite if we have an importer functionality,
# but initially could be defined by the package defining FerriteHDF5...
read_celldata(::FerriteHDF5, key)

(Of course, I'm biased here since I have #692 in my head)

@koehlerson
Copy link
Member Author

I'm not sure, but why should this be type piracy if we own the type Grid?

@KnutAM
Copy link
Member

KnutAM commented Nov 14, 2023

From inside Ferrite it is perfectly fine! (I was referring to doing this in an external package)

@KnutAM
Copy link
Member

KnutAM commented Nov 14, 2023

I think the choice for having this supported by Ferrite is more that one has to commit to one backend as maintaining multiple might be too much to promise. HDF5 might certainly be a nice choice!

Out of interest, why choosing this over JLD2 as was discussed above? Or would this somehow work naturally with JLD2 since that is hdf5-based?

@koehlerson
Copy link
Member Author

I received some reviewer comments in a funding proposal that something more established like HDF5 should be used instead of JLD2 (due to its "known problems"). The "known problems" weren't specified, but I think there is some consensus that HDF5 is somewhat reliable and established. Aside that, jld2 leaks quite severely memory (not sure if HDF5 has the same problem)

@KnutAM
Copy link
Member

KnutAM commented Nov 14, 2023

Thanks - that is interesting to get from a reviewer (even if vague)

At least the current problem with the circular dependency for the DofHandler could be another argument. But it just feels like a lot of things that "just work" with JLD2 would have to be manually implemented for HDF5. On the other hand, perhaps that would be a more reliable solution...

In the future, I plan to overhaul the IO part of FerriteProblems (which currently uses JLD2), and after that, I might have better views/opinions. But it would be great if there already is a solution to tap into then:)

@koehlerson
Copy link
Member Author

koehlerson commented Nov 15, 2023

Do you happen to know if JLD2 allows for the same setindex! overload? I find it quite nice that you don't need to promise a mapping with HDF5 to a Julia user written composite type but instead, you can rely on primitives or array of primitives. So I like that a "custom serialization" can be done without having a bijective map from julia object to storage and vice verca, but instead you can just map to storage. This can make sense sometimes, e.g. material state structs which may change during development but most likely you will save from the get go the interesting data. Of course, you can have for JLD2 export another struct which holds the interesting data to store, but again, as soon as you change, you have the annoyance of type remapping

@KnutAM
Copy link
Member

KnutAM commented Nov 15, 2023

Don't know, but doubt it since JLD2.jl doesn't have HDF5.jl as dep. JLD.jl does, maybe it works for that one? But also not sure if this design was causing JLD2.jl to be created as the replacement.

@koehlerson
Copy link
Member Author

koehlerson commented Nov 15, 2023

I meant instead of

julia> function Base.setindex!(parent::HDF5.File, val::Grid, path::Union{AbstractString,Nothing}; pv...)
            parent["grid/cells"] = grid.cells
            parent["grid/nodes"] = grid.nodes
            for (facesetname,faceset) in grid.facesets
               parent["grid/facesets/$facesetname"] = collect(faceset)
            end
       end

julia> h5open("blah.h5", "w") do h5f
           h5f["grid"] = grid
       end

this

julia> function Base.setindex!(parent::JLD2.File, val::Grid, path::Union{AbstractString,Nothing}; pv...)
            parent["grid/cells"] = grid.cells
            parent["grid/nodes"] = grid.nodes
            for (facesetname,faceset) in grid.facesets
               parent["grid/facesets/$facesetname"] = collect(faceset)
            end
       end

julia> jldopen("blah.jld2", "w") do jld
           jld["grid"] = grid
       end

@koehlerson
Copy link
Member Author

see JuliaIO/JLD2.jl#504

@JonasIsensee
Copy link

JonasIsensee commented Nov 15, 2023

I meant instead of

julia> function Base.setindex!(parent::HDF5.File, val::Grid, path::Union{AbstractString,Nothing}; pv...)
            parent["grid/cells"] = grid.cells
            parent["grid/nodes"] = grid.nodes
            for (facesetname,faceset) in grid.facesets
               parent["grid/facesets/$facesetname"] = collect(faceset)
            end
       end

julia> h5open("blah.h5", "w") do h5f
           h5f["grid"] = grid
       end

this

julia> function Base.setindex!(parent::JLD2.File, val::Grid, path::Union{AbstractString,Nothing}; pv...)
            parent["grid/cells"] = grid.cells
            parent["grid/nodes"] = grid.nodes
            for (facesetname,faceset) in grid.facesets
               parent["grid/facesets/$facesetname"] = collect(faceset)
            end
       end

julia> jldopen("blah.jld2", "w") do jld
           jld["grid"] = grid
       end

The recommended way to do this would be to define a custom serialization:
https://juliaio.github.io/JLD2.jl/dev/customserialization/

https://juliaio.github.io/JLD2.jl/dev/advanced/
The advanced docs explain how one can deal with structures that have changed (need custom conversion).
However, if you want to be able to upgrade structures without the user coming into contact with it,
you would probably need to write your own write_to_file(filename, object) functions
(that then write and read the relevant info of your structures and manually ensure backwards compatibility by adding fallbacks every time the structures change).
If you choose to go down that route, it doesn't matter whether you use JLD2 or HDF5. The files will not differ much.

Aside that, jld2 leaks quite severely memory (not sure if HDF5 has the same problem)

I have not encountered that before. Please file an issue, if you can trigger this.

@koehlerson
Copy link
Member Author

koehlerson commented Nov 15, 2023

The recommended way to do this would be to define a custom serialization: https://juliaio.github.io/JLD2.jl/dev/customserialization/

https://juliaio.github.io/JLD2.jl/dev/advanced/ The advanced docs explain how one can deal with structures that have changed (need custom conversion). However, if you want to be able to upgrade structures without the user coming into contact with it, you would probably need to write your own write_to_file(filename, object) functions (that then write and read the relevant info of your structures and manually ensure backwards compatibility by adding fallbacks every time the structures change). If you choose to go down that route, it doesn't matter whether you use JLD2 or HDF5. The files will not differ much.

I think I was thinking more about the latter case, because the datastructure Grid won't change across version so much (or at all), that you cannot reconstruct it from the stuff that I have in the snippet above.

Aside that, jld2 leaks quite severely memory (not sure if HDF5 has the same problem)

I have not encountered that before. Please file an issue, if you can trigger this.

It's quite hard to come up with a reproducer, it happens for me only in simulations that compute for days. Also julia threads error for me in that case after closing the REPL. I hope I can come up with a reproducer at some point.

@koehlerson
Copy link
Member Author

koehlerson commented Nov 15, 2023

Or to put it differently: I can think of a lot of cases, where I don't want to have an explicit bijective mapping from storage type to (actual user) julia type. Sometimes I just want to dump data (and not being able to reconstruct the type from which I gathered the data) and read it out as such (NamedTuple for instance), especially in the beginning of a project, where you do a lot of changes to structs. There, I find the type remapping a little bit annoying because I cannot promise to have the mapping from storage to julia type. Is there some recommend way to save instead of the type only some fields which consists of Julia primitives or arrays of primitives, more or less in the way I did with the setindex! dispatch for Grid but in JLD2 @JonasIsensee ? Can I do the same overload without writing a seperate write_to_file function? Or do you think this is exactly where I should use JLD2 instead of HDF5?

@JonasIsensee
Copy link

JonasIsensee commented Nov 15, 2023

@koehlerson
I've thought about this kind of use case myself for quite a bit and must say that I don't have a great solution either.
The JLD2 custom serialization interface is great if you have

struct ComplexStruct
# lot's of custom stuff
end

struct SerializedVersion
# important data
# only contains basic types (array / int / float / string )
end

JLD2.writeas(::Type{ComplexStruct}) = SerializedVersion
JLD2.wconvert(::Type{SerializedVersion}, x::ComplexStruct) = # conversion
JLD2.rconvert(::Type{ComplexStruct}, x::SerializedVersion) = # convert back / upgrade
# as long as SerializedVersion is unchanged, ComplexStruct may be changed (updating the conversion accordingly)

# if the serialized version is changed, the old one should be kept and the new one needs a new name
# e.g. SerializedVersionV1
# to ensure full backward compatibility

Caveat: JLD2 stores the type signature of the ComplexStruct. If this has extensive type parameters (function types, nested parameterized types etc.) this can become a problem.

EDIT:
Given that you keep mentioning setindex!, you seem to care about the user interface.
It is possible but I'm not sure if overloading setindex! is really a good idea. You'd have to inject your code into the JLD2 mechanism properly to ensure that file["key"] = (, object) also does the right thing?
Semantically it seems rather brittle to have JLD2 only do the right conversion thing (your code)
when the Grid is a toplevel storing call. (not wrapped in some other structure)

@koehlerson
Copy link
Member Author

It is possible but I'm not sure if overloading setindex! is really a good idea. You'd have to inject your code into the JLD2 mechanism properly to ensure that file["key"] = (, object) also does the right thing?

I'd love to have some intermediate dispatch stuff_to_save, which is for type Any an identity mapping and can be dispatched by the user for custom types to the subset of things that HDF5 supports ootb like primitives, array of primitives and isbitstype compound types, such that it stores by default the returned things in a named tuple (or something similar). The setindex! is more or less just the quick and dirty way to have that in HDF5.jl. I'm not sure how feasible that is or which pitfalls exist along that route but I think it would be nice to have an "optional plain HDF5 data" behavior in JLD2. However, I'm not an expert or anything, it's just the ideal behavior I have in mind after using JLD2 for some years.

Semantically it seems rather brittle to have JLD2 only do the right conversion thing (your code)
when the Grid is a toplevel storing call. (not wrapped in some other structure)

I was hoping that setindex! is called recursively but you are right, if that's not the case this stuff_to_save call should happen for anything that should be saved, i.e. for each field of each struct.

@JonasIsensee
Copy link

JonasIsensee commented Nov 15, 2023

This sounds like a really nice idea.
I think the best way to experiment with this would be to create a new JLD2 variant package
that depends on JLD2 but has extra asserts on the types it handles and adds your intermediate conversion level.

One of the things that limits flexibility in JLD2 is that it tries to be type stable to be fast. Building efficient memory layouts for structures also requires quite some knowledge about the types..

Further discussion might be more appropriate in JLD2.
I'd be willing to assist in efforts in that direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants