The zarr v2 / v3 transition has highlighted some room for improvement in numcodecs. I would like to propose the following changes to this library, some of them breaking:
type annotations
Type annotations are currently largely absent from this library. This makes it hard to integrate numcodecs into libraries that are heavily annotated, namely zarr-python 3.x. I don't think any of the codec methods are particularly hard to annotate, but the existence of the pickle codec means that we are committed to accepting object as an input / output type in the abc. We should consider whether to deprecate pickle for this and other reasons. There are two main classes of methods that would benefit from annotations: encode / decode, so consumers can know what kind of data goes in / comes out of the codec, and get_config / from_config, so users can know how a codec will serialize to / from JSON. relevant PRs: (chore) type hints for tests #698, (chore) Type hints for abc codec codec_id attribute #702, (chore) Type hints for GZip #701
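To make the two groups of methods concrete, here is a minimal sketch of what an annotated codec interface could look like. The names `TypedCodec` and `ZlibSketch` are hypothetical and are not the numcodecs abc. The sketch also assumes plain `bytes` for input and output, which is narrower than what real codecs accept (arrays and other buffer-like objects).

```python
from __future__ import annotations

import zlib
from abc import ABC, abstractmethod
from typing import Any, ClassVar


class TypedCodec(ABC):
    """Hypothetical annotated base class; NOT the actual numcodecs abc."""

    # Identifier used when the codec is serialized to JSON.
    codec_id: ClassVar[str]

    @abstractmethod
    def encode(self, buf: bytes) -> bytes:
        """Encode raw bytes. Assumption: bytes in, bytes out."""

    @abstractmethod
    def decode(self, buf: bytes) -> bytes:
        """Invert encode()."""

    @abstractmethod
    def get_config(self) -> dict[str, Any]:
        """Return a JSON-serializable description of this codec."""

    @classmethod
    def from_config(cls, config: dict[str, Any]) -> TypedCodec:
        """Rebuild a codec from the output of get_config()."""
        # Drop the identifier; the rest are constructor arguments.
        kwargs = {k: v for k, v in config.items() if k != "id"}
        return cls(**kwargs)


class ZlibSketch(TypedCodec):
    """Toy codec built on the standard library, for illustration only."""

    codec_id = "zlib-sketch"  # hypothetical identifier

    def __init__(self, level: int = 1) -> None:
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return zlib.decompress(buf)

    def get_config(self) -> dict[str, Any]:
        return {"id": self.codec_id, "level": self.level}
```

With signatures like these, a type checker can tell a consumer what `encode` returns and what shape `get_config` produces. A pickle-style codec would not fit the `bytes` input annotation, which is the tension described above.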
package layout
Everything is in the same module namespace, including the abc. Each codec is a single .py or .pyx file. Let's organize this a bit and put all the codecs in their own module, separate from the abc. Let's give each codec its own module (for example, put blosc.pyx inside a module called blosc). Let's also pull the tests out of the source directory and use a src/numcodecs layout. relevant issues: copy zarr-python dev environment #697
build
numcodecs does not use a declarative build process -- compiling various codecs depends on libraries that are not present in pyproject.toml, and requires creating a conda environment. We can do better than this. I know we can use pixi to declare our compiler dependencies in pyproject.toml, but there might be other tools that would work. relevant issues: declarative build #703
v2 and v3 codec serialization
zarr v2 defines the JSON serialization of a codec as {'id': <str>, **config}. zarr v3 defines the JSON serialization of a codec as {'name': <str>, 'configuration': {**config}}. As far as I know, this is the only material difference between v2 and v3 codecs. So the exact same codec class should work for zarr v2 and zarr v3. We just need a way for users in a v2 context to use the v2 serialization, and likewise for v3. I can think of a few solutions here. relevant issues: Integrate Zarr v3 compatibility module with registry. #699, formalize old and new styles of json serialization #686
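Since the two formats differ only in shape, converting between them is mechanical. The sketch below shows one possible pair of helpers. The function names `v2_to_v3` and `v3_to_v2` are hypothetical, and it assumes the codec identifier is carried over unchanged between `id` and `name`, which a real implementation might not do.

```python
from typing import Any


def v2_to_v3(config: dict[str, Any]) -> dict[str, Any]:
    """Convert {'id': ..., **config} to {'name': ..., 'configuration': {...}}."""
    rest = dict(config)    # copy so the caller's dict is not mutated
    name = rest.pop("id")  # assumption: the id is reused as the name
    return {"name": name, "configuration": rest}


def v3_to_v2(config: dict[str, Any]) -> dict[str, Any]:
    """Convert {'name': ..., 'configuration': {...}} to {'id': ..., **config}."""
    # Treat a missing 'configuration' key as an empty configuration.
    return {"id": config["name"], **config.get("configuration", {})}


# Illustrative config; the keys other than 'id' are arbitrary examples.
v2 = {"id": "gzip", "level": 5}
v3 = v2_to_v3(v2)
print(v3)
print(v3_to_v2(v3) == v2)
```

One design option that follows from this is to keep a single codec class and pick the serialization at the call site, rather than maintaining separate v2 and v3 classes.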
I think we should be open to breaking changes here.