
Virtual concatenation of arrays with different codecs or dtypes #5

Open
TomNicholas opened this issue Mar 8, 2024 · 4 comments
Labels
enhancement New feature or request zarr-python Relevant to zarr-python upstream zarr-specs Requires adoption of a new ZEP

Comments

@TomNicholas
Member

TomNicholas commented Mar 8, 2024

The motivation for this is in zarr-developers/zarr-specs#288. A VirtualZarrArray class can be prototyped in this library, but it can't be serialized in a form that any Zarr reader understands until that upstream work is done.

Eventually we probably want to change the default concatenation behaviour to concatenate using VirtualZarrArrays instead of ManifestArrays.
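To make the idea concrete, here is a toy sketch of what "virtual" concatenation could mean. It assumes nothing about the eventual VirtualZarrArray design: each component keeps its own encoded data and its own decoder (a stand-in for a per-array codec pipeline), and reads are dispatched to whichever component owns the requested index. The class name ToyVirtualConcat and the (encoded, decode) pairs are hypothetical, and only 1-D arrays concatenated along axis 0 are handled.

```python
import numpy as np
from typing import Callable, Sequence, Tuple

# Hypothetical component: (encoded data, decoder standing in for a codec pipeline).
Component = Tuple[np.ndarray, Callable[[np.ndarray], np.ndarray]]


class ToyVirtualConcat:
    """Lazily concatenate 1-D components along axis 0 without merging their metadata."""

    def __init__(self, components: Sequence[Component]):
        self.components = list(components)
        self.lengths = [len(encoded) for encoded, _ in self.components]

    def __len__(self) -> int:
        return sum(self.lengths)

    def __getitem__(self, index: int):
        # Integer indexing only; negative indices count from the end.
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        for (encoded, decode), n in zip(self.components, self.lengths):
            if index < n:
                # Decode only the element that was requested, with this component's decoder.
                return decode(encoded[index : index + 1])[0]
            index -= n
        raise AssertionError("unreachable")

    def to_numpy(self) -> np.ndarray:
        # Decode each component separately, then let NumPy promote the dtypes.
        return np.concatenate([decode(encoded) for encoded, decode in self.components])


# Component A: int16 values stored with a scale factor of 0.5 (decoded to float64).
a = (np.array([2, 4, 6], dtype=np.int16), lambda x: x.astype(np.float64) * 0.5)
# Component B: raw float32 values, no encoding.
b = (np.array([10.0, 20.0], dtype=np.float32), lambda x: x)

v = ToyVirtualConcat([a, b])
print(len(v))        # 5
print(v[1], v[3])    # 2.0 10.0
print(v.to_numpy())  # [ 1.  2.  3. 10. 20.]
```

Because no merged metadata is ever written, the components are free to differ in codec and dtype. The trade-off is that the result is no longer a single Zarr array, which is why persisting it needs the upstream spec work.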

@TomNicholas TomNicholas added zarr-python Relevant to zarr-python upstream zarr-specs Requires adoption of a new ZEP labels Mar 8, 2024
@TomNicholas TomNicholas added the enhancement New feature or request label Mar 26, 2024
@TomNicholas
Member Author

Currently, calling np.concat on two ManifestArray objects with different codecs raises an error here:

https://github.com/TomNicholas/VirtualiZarr/blob/f226093bcb8ad248b7a2c8cdcadd224747089792/virtualizarr/manifests/array_api.py#L54
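The underlying constraint is that a concatenated ManifestArray is still described by one set of Zarr array metadata, and Zarr array metadata records a single codec pipeline and a single data type. The following is a hypothetical sketch of that kind of compatibility check, not the code at the linked line: ToyManifestArray and check_concatenable are illustrative names, and the exception type is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ToyManifestArray:
    # Hypothetical stand-in: only the metadata relevant to this check.
    shape: Tuple[int, ...]
    dtype: str
    codecs: Tuple[str, ...]


def check_concatenable(arrays: Sequence[ToyManifestArray]) -> None:
    """Raise if the arrays cannot share one set of Zarr array metadata."""
    first = arrays[0]
    for arr in arrays[1:]:
        if arr.codecs != first.codecs:
            raise NotImplementedError(
                f"Cannot concatenate arrays with different codecs: {first.codecs} != {arr.codecs}"
            )
        if arr.dtype != first.dtype:
            raise NotImplementedError(
                f"Cannot concatenate arrays with different dtypes: {first.dtype} != {arr.dtype}"
            )


a = ToyManifestArray(shape=(10,), dtype="float32", codecs=("zlib",))
b = ToyManifestArray(shape=(10,), dtype="float32", codecs=("blosc",))

check_concatenable([a, a])  # same metadata: passes silently
try:
    check_concatenable([a, b])  # different codecs: rejected
except NotImplementedError as err:
    print(err)
```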

@TomNicholas
Member Author

Virtual concatenation would also allow for concatenation of arrays with different dtypes, see #215 for an example.

@TomNicholas TomNicholas changed the title Virtual concatenation of arrays with different codecs Virtual concatenation of arrays with different codecs or dtypes Aug 8, 2024
@TomNicholas
Member Author

Note the proposed approach in zarr-developers/zarr-python#2536

@TomNicholas
Member Author

There's an important potential use case for chunk-dependent codecs: storing LLM weights in Zarr. Model weights are often quantized (essentially bit packing, or scale-and-offset-style encoding), and sometimes the quantization is applied on a per-chunk basis.
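As a hypothetical illustration of why per-chunk quantization calls for chunk-dependent codecs, the sketch below quantizes each chunk of a weight vector to uint8 with its own scale and offset. It uses simple min-max affine quantization as one common scheme; real LLM weight formats vary. Because each chunk ends up with different parameters, a single array-wide scale/offset codec in the Zarr metadata could not describe them all. The function names are illustrative.

```python
import numpy as np


def quantize_chunk(x: np.ndarray):
    """Min-max quantize one chunk to uint8, returning (codes, scale, offset)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, scale, lo


def dequantize_chunk(codes: np.ndarray, scale: float, offset: float) -> np.ndarray:
    """Invert quantize_chunk using this chunk's own parameters."""
    return codes.astype(np.float64) * scale + offset


# Two chunks of "weights" with very different value ranges.
chunks = [np.linspace(-0.01, 0.01, 64), np.linspace(-1.0, 1.0, 64)]
encoded = [quantize_chunk(c) for c in chunks]

for original, (codes, scale, offset) in zip(chunks, encoded):
    restored = dequantize_chunk(codes, scale, offset)
    # Rounding error is at most half a quantization step for this chunk.
    print(f"scale={scale:.2e}, max error={np.abs(restored - original).max():.2e}")
```

Using one shared scale for both chunks would instead size the step for the wider range, wasting most of the 256 levels on the narrow chunk.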
