
Bitshuffle-compressed datasets cannot be read when accessed through a virtual dataset #43

Closed
loichuder opened this issue May 29, 2024 · 6 comments


loichuder commented May 29, 2024

Describe the bug

OK, this one is a stretch. Thanks to silx-kit/h5web#1524, it is now possible to read datasets compressed with bitshuffle. But when creating a virtual dataset pointing to such a dataset, I get the following error:

Required filter 'bitshuffle; see https://github.com/kiyo-masui/bitshuffle' is not registered
Full traceback
HDF5-DIAG: Error detected in HDF5 (1.14.2) thread 0:
  #000: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 1061 in H5Dread(): can't synchronously read data
    major: Dataset
    minor: Read failed
  #001: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5D.c line 1008 in H5D__read_api_common(): can't read data
    major: Dataset
    minor: Read failed
  #002: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 2092 in H5VL_dataset_read_direct(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLcallback.c line 2048 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #004: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5VLnative_dataset.c line 363 in H5VL__native_dataset_read(): can't read data
    major: Dataset
    minor: Read failed
  #005: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dio.c line 383 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #006: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dvirtual.c line 2768 in H5D__virtual_read(): unable to read source dataset
    major: Dataset
    minor: Read failed
  #007: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dvirtual.c line 2689 in H5D__virtual_read_one(): can't read source dataset
    major: Dataset
    minor: Read failed
  #008: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dio.c line 383 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #009: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dchunk.c line 2856 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #010: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Dchunk.c line 4468 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #011: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5Z.c line 1356 in H5Z_pipeline(): required filter 'bitshuffle; see https://github.com/kiyo-masui/bitshuffle' is not registered
    major: Data filters
    minor: Read failed
  #012: /__w/libhdf5-wasm/libhdf5-wasm/build/1.14.2/_deps/hdf5-src/src/H5PLint.c line 267 in H5PL_load(): can't find plugin. Check either HDF5_VOL_CONNECTOR, HDF5_PLUGIN_PATH, default location, or path set by H5PLxxx functions
    major: Plugin for dynamically loaded library
    minor: Object not found
   

To Reproduce

  1. Select 'vds_bug.h5' in VS Code explorer
  2. Click on data_compressed (a 1D dataset compressed with bitshuffle): it displays fine
  3. Click on data_via_vds (a VDS pointing to non-compressed dataset data): it displays fine
  4. Click on data_compressed_via_vds (you get it)
  5. See error

vds_bug.zip

Expected behaviour

The extension should be able to display compressed datasets, even through a VDS. Interestingly, the h5wasm demo seems to display it fine?

Context

  • OS: Linux x64 5.15.0-107-generic snap
  • VS Code version: 1.89.1
  • Extension version: v0.1.5
axelboc commented Jun 3, 2024

Ha, so it works if you first select data_compressed and then data_compressed_via_vds but not if you select data_compressed_via_vds right away.

It's because the virtual compressed dataset's filters metadata doesn't "mirror" the source dataset's filters metadata as it should. So vscode-h5web (or myHDF5) doesn't know that it needs to load the bitshuffle plugin.

I'll report on the h5wasm repo.
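The metadata gap described above can be illustrated with plain h5py. This is just a sketch (not the extension's actual code), and gzip stands in for bitshuffle so that no external filter plugin is needed to run it:

```python
import numpy as np
import h5py

# Sketch: filter metadata lives on the source dataset, not on the
# virtual dataset that points to it, so a reader inspecting only the
# VDS has no way to know which filter plugin it must load.
with h5py.File("vds_filters_demo.h5", "w") as f:
    src = f.create_dataset(
        "source", data=np.arange(100.0), chunks=(10,), compression="gzip"
    )

    layout = h5py.VirtualLayout(shape=(100,), dtype=src.dtype)
    layout[:] = h5py.VirtualSource(src)
    vds = f.create_virtual_dataset("data_via_vds", layout)

    print(src.compression)  # 'gzip'
    print(vds.compression)  # None: filters are not mirrored on the VDS
```

Reading the VDS still succeeds here only because gzip is built into HDF5; with bitshuffle, the reader has to load the plugin first, which is exactly what fails when the VDS is selected before its source.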


loichuder commented Jun 4, 2024

> Ha, so it works if you first select data_compressed and then data_compressed_via_vds but not if you select data_compressed_via_vds right away.

Ha! That's why I thought it worked in the h5wasm demo: it did work because I selected data_compressed first. Just retried: if I select data_compressed_via_vds first, I get the same error I report here.

t20100 commented Jun 4, 2024

> It's because the virtual compressed dataset's filters metadata doesn't "mirror" the source dataset's filters metadata as it should.

I'm not sure about that: you can make a virtual dataset which gives access to multiple datasets stored with different compression filters... (never seen this in practice though)

loichuder commented Jun 4, 2024

> I'm not sure, you can make a virtual dataset which gives access to multiple datasets stored with different compression filters

You can. The following snippet works without trouble:

```python
import numpy
import h5py
import hdf5plugin

with h5py.File("double_filter_vds.h5", "w") as h5file:
    data = numpy.linspace(0, 10, 100)

    # Two source datasets with different compression filters
    c_dset = h5file.create_dataset(
        "bitshuffle", data=data, **hdf5plugin.Bitshuffle(cname="lz4")
    )
    c_dset_2 = h5file.create_dataset("blosc", data=data, **hdf5plugin.Blosc2())

    # A single virtual dataset spanning both sources
    vlayout = h5py.VirtualLayout(shape=(200,), dtype=data.dtype)
    vsource = h5py.VirtualSource(c_dset)
    vlayout[:100] = vsource[:]
    vsource2 = h5py.VirtualSource(c_dset_2)
    vlayout[100:] = vsource2[:]

    h5file.create_virtual_dataset("data_via_vds", vlayout)
```

axelboc commented Jun 4, 2024

> I'm not sure, you can make a virtual dataset which gives access to multiple datasets stored with different compression filters... (Never seen this though)

Yep, Brian mentioned this as well: usnistgov/h5wasm#75 (comment). He has already released a new version of h5wasm that exposes virtual sources in the metadata.
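With virtual sources exposed, a reader can walk them to find which filters (and therefore which plugins) each source dataset needs before decoding. A minimal sketch of that idea in h5py — the `filters_needed` helper name and the gzip demo are my own, not h5wasm's API:

```python
import numpy as np
import h5py


def filters_needed(path, name):
    """Hypothetical helper: collect the compression filters required to
    read a dataset, following virtual sources when it is a VDS."""
    with h5py.File(path, "r") as f:
        dset = f[name]
        if not dset.is_virtual:
            return {dset.compression} - {None}
        needed = set()
        for vs in dset.virtual_sources():
            # '.' means the source lives in the same file as the VDS
            src_path = path if vs.file_name == "." else vs.file_name
            with h5py.File(src_path, "r") as src_file:
                needed |= {src_file[vs.dset_name].compression} - {None}
        return needed


# Demo with gzip (built into HDF5, so no plugin is needed to run this)
with h5py.File("walk_vds.h5", "w") as f:
    src = f.create_dataset("source", data=np.arange(50.0), compression="gzip")
    layout = h5py.VirtualLayout(shape=(50,), dtype=src.dtype)
    layout[:] = h5py.VirtualSource(src)
    f.create_virtual_dataset("vds", layout)

print(filters_needed("walk_vds.h5", "vds"))  # {'gzip'}
```

A viewer could run such a pass before the first read and load the matching plugins, instead of relying on the (absent) filter metadata of the VDS itself.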

axelboc commented Jun 25, 2024

Should now be fixed in v0.1.6 of the extension.

@axelboc axelboc closed this as completed Jun 25, 2024