Kerchunk doesn't translate HDF5 hard links #459
Comments
You are quite right, this is not handled, essentially because there is no obvious way to represent such links in zarr. With zarr V3, there has been talk about implementing links, but I'm not sure where that conversation is up to. In kerchunk, we can recognise links, and some special cases (single strings, IIRC) are handled. The question is what to do with the general case: should the metadata and references simply be duplicated?
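To make the "simply duplicate" option concrete, here is a minimal sketch that operates on a kerchunk-style reference mapping (zarr v2 keys such as `path/.zarray` and `path/0.0`). The helper name `duplicate_refs`, the example paths, and the abbreviated `.zarray` content are all hypothetical and not part of kerchunk; the sketch only assumes that references are a flat dict keyed by zarr store paths.

```python
def duplicate_refs(refs, target, alias):
    """Return a copy of `refs` where every key under `target` also exists under `alias`.

    Hypothetical helper, not a kerchunk API. `refs` is assumed to be a flat
    dict of zarr v2 store keys mapping to JSON strings (metadata) or
    [url, offset, length] lists (chunk references).
    """
    target = target.strip("/")
    alias = alias.strip("/")
    prefix = target + "/"
    out = dict(refs)

    # Copy metadata and chunk references from the target path to the alias path.
    for key, value in refs.items():
        if key.startswith(prefix):
            copied = list(value) if isinstance(value, list) else value
            out[alias + "/" + key[len(prefix):]] = copied

    # Make sure every parent group of the alias has group metadata.
    parts = alias.split("/")[:-1]
    for depth in range(1, len(parts) + 1):
        group_key = "/".join(parts[:depth]) + "/.zgroup"
        out.setdefault(group_key, '{"zarr_format": 2}')
    return out


# Example with made-up offsets and abbreviated array metadata.
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "particles/.zgroup": '{"zarr_format": 2}',
    "particles/position/.zarray": '{"shape": [2, 3], "chunks": [2, 3]}',
    "particles/position/0.0": ["example.h5", 2048, 48],
}
linked = duplicate_refs(refs, "particles/position", "observables/position")
```

Because only references are copied, both paths point at the same bytes in the HDF5 file and no data is duplicated. The cost is that the result is a copy rather than a link: zarr sees two independent arrays, and the information that they were one object in HDF5 is lost.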
Got it, sounds like this is already a known limitation. Would some kind of warning or note in the docs be appropriate? I'd be happy to open a quick PR.

This is a quick and dirty patch I'm using in my project to support hard links in H5MD, simply duplicates metadata:

Change in _translator:

    def _translator(
        self,
        name: str,
        h5obj: Union[
            h5py.Dataset, h5py.Group, h5py.SoftLink, h5py.HardLink, h5py.ExternalLink
        ],
    ):
        """Produce Zarr metadata for all groups and datasets in the HDF5 file."""
        try:  # method must not raise exception
            kwargs = {}
            if isinstance(h5obj, h5py.SoftLink) or isinstance(h5obj, h5py.HardLink):
                h5obj = self._h5f[name]

Change in translate:

    def translate(self, preserve_links=False):
        """Translate content of one HDF5 file into Zarr storage format.

        This method is the main entry point to execute the workflow, and
        returns a "reference" structure to be used with zarr/kerchunk

        No data is copied out of the HDF5 file.

        Returns
        -------
        dict
            Dictionary containing reference structure.
        """
        lggr.debug("Translation begins")
        self._transfer_attrs(self._h5f, self._zroot)
        self._preserve_links = preserve_links
        if self._preserve_links:
            self._h5f.visititems_links(self._translator)
        else:
            self._h5f.visititems(self._translator)

This does require h5py>=3.11.0 since this is when
If it's optional, I think this would be a useful addition.
I'm using kerchunk to translate hdf5 files in the H5MD format to be readable by zarr.
Kerchunk doesn't translate hard links: a dataset appears in the resulting zarr hierarchy only at the path where it was last assigned in the HDF5 file, and all other hard links to it are not visible.
Here's an example: First, I create an hdf5 file on the local filesystem
This gives the output:
However, when I do:
The tree looks like this:
This may be intentional behavior since zarr does not support linking datasets like hdf5 does. Is it possible to recreate the links in the JSON metadata created by kerchunk.hdf.SingleHdf5ToZarr to give the expected behavior? Please let me know if I'm missing anything!
For reference, I am using Kerchunk 0.2.5 and Python 3.11.9.