(feat): full v2 compat via python fallback #84

ilan-gold · 2025-01-30T13:59:32Z

Just a draft, but it moves in the direction of full v2 support by falling back to zarr-python, and it also fixes some serialization issues. I think this mechanism, even if it is obviated by development on the rust side, is not bad to have, as it allows us to keep up a bit more with new features, bugs, etc.

I got rid of the full-chunk fetching (i.e., for some simple integer indexing cases) in rust because it was actually slower than just doing it in python. This also lets us be clearer about what we do and don't support, since every case is now either "optimized rust" or "pure python."

All data types (V, S, O etc) are now split off and i/o for them is done on an instantiated zarr-python pipeline (as opposed to our rust one).
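The split described above can be pictured as a small dispatch on the Zarr V2 dtype string. The following is a minimal sketch, not the code in this PR: the helper names `dtype_kind` and `needs_python_fallback` and the exact set of kinds are assumptions for illustration, based only on the V, S, and O kinds named above.

```python
# Hypothetical sketch: choose between the rust pipeline and the zarr-python
# fallback from a Zarr V2 dtype. Names and the kind set are assumptions.

# NumPy kind characters named in the description: void (V), bytes (S), object (O).
FALLBACK_KINDS = frozenset({"V", "S", "O"})


def dtype_kind(v2_dtype):
    """Return the NumPy kind character of a V2 dtype such as '<f8' or '|O'."""
    if not isinstance(v2_dtype, str):
        # Structured dtypes are stored as lists in V2 metadata; treat them as void.
        return "V"
    # Skip the optional byte-order prefix ('<', '>', '|' or '=').
    return v2_dtype[1] if v2_dtype[0] in "<>|=" else v2_dtype[0]


def needs_python_fallback(v2_dtype):
    """True when I/O should go through zarr-python instead of the rust pipeline."""
    return dtype_kind(v2_dtype) in FALLBACK_KINDS
```

With this shape, `needs_python_fallback("<f8")` is false and `needs_python_fallback("|O")` is true, which keeps the "optimized rust" versus "pure python" decision in one place.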

Furthermore, default fill values have been added for all data types in python for our pipeline (as None is valid at the top level in zarr-python), and fill value handling in rust has been improved, which should prepare us a bit better for variable-length decoding in pure rust.
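As a rough illustration of resolving a `None` fill value before it reaches the pipeline, here is a minimal sketch. The table `DEFAULT_FILL_BY_KIND` and the helper `resolve_fill_value` are hypothetical, and the chosen defaults are assumptions rather than the values used in this PR.

```python
# Hypothetical sketch: substitute a per-dtype default when the fill value is None.
# The defaults below are illustrative assumptions, keyed by NumPy kind character.
DEFAULT_FILL_BY_KIND = {
    "b": False,  # boolean
    "i": 0,      # signed integer
    "u": 0,      # unsigned integer
    "f": 0.0,    # floating point
    "c": 0j,     # complex
    "S": b"",    # fixed-length bytes
    "U": "",     # fixed-length unicode
}


def resolve_fill_value(fill_value, kind):
    """Return fill_value unchanged unless it is None, then use the kind's default."""
    if fill_value is not None:
        return fill_value
    try:
        return DEFAULT_FILL_BY_KIND[kind]
    except KeyError:
        raise ValueError(f"no default fill value for dtype kind {kind!r}") from None
```

Note that an explicit `0` is kept as-is, because only `None` triggers the default.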

ilan-gold · 2025-01-30T14:01:06Z

tests/test_v2.py

@@ -0,0 +1,346 @@
+import json


This is completely taken from the zarr codebase with minimal changes

ilan-gold · 2025-01-30T14:02:11Z

tests/test_v2.py

+@pytest.mark.parametrize(
+    "array_order", ["C", pytest.param("F", marks=[pytest.mark.xfail])]
+)
+@pytest.mark.parametrize("data_order", ["C", "F"])
+@pytest.mark.parametrize(
+    "memory_order", ["C", pytest.param("F", marks=[pytest.mark.xfail])]
+)


Should we fall back to python for F-arrays or just fail?

We could handle F array order easily if zarr-python gave us all of the array metadata when constructing the codec pipeline:

zarrs-python/src/metadata_v2.rs

Lines 36 to 39 in 26ee516

    // FIXME: The array order, dimensionality, data type, and endianness are needed to exhaustively support all Zarr V2 data that zarrs can handle.
    // However, CodecPipeline.from_codecs does not supply this information, and CodecPipeline.evolve_from_array_spec is seemingly never called.
    let metadata = zarrs::metadata::v2_to_v3::codec_metadata_v2_to_v3(
        ArrayMetadataV2Order::C,
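For readers less familiar with why the array order matters here, a small sketch of the two layouts follows. It uses plain nested lists rather than any zarr or numpy API, and the helper names `flatten_c` and `flatten_f` are made up for illustration.

```python
# Illustrative only: how the same 2x3 chunk is laid out in C and F order.


def flatten_c(rows):
    """Row-major (C) order: the last index varies fastest."""
    return [value for row in rows for value in row]


def flatten_f(rows):
    """Column-major (F) order: the first index varies fastest."""
    return [value for column in zip(*rows) for value in column]


grid = [[1, 2, 3], [4, 5, 6]]
print(flatten_c(grid))  # [1, 2, 3, 4, 5, 6]
print(flatten_f(grid))  # [1, 4, 2, 5, 3, 6]
```

A pipeline that always assumes C order would read the second sequence as if it were the first, which is why the order needs to be known when the pipeline is built.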

python/zarrs/pipeline.py

LDeakin · 2025-02-03T21:51:16Z

I've added a few FIXMEs for variable-length data and fixed the codec handling. All the V2 codec metadata logic could just be pulled from zarrs itself instead though. ~~Just need to isolate the codec part of array_metadata_v2_to_v3~~ Done in LDeakin/zarrs#141.

There is a lot of additional logic already taken care of by `zarrs`, like handling multiple versions of codec metadata.

LDeakin · 2025-02-05T23:29:36Z

src/chunk_item.rs

+            fill_value_bytes = fill_value.call_method0("tobytes")?.extract()?;
+        } else if let Ok(fill_value_downcast) = fill_value.downcast::<PyInt>() {
+            let fill_value_usize: usize = fill_value_downcast.extract()?;
+            if fill_value_usize == (0 as usize) && dtype == "object" {


Turns out this varies between zarr-python versions, but it is at least consistent with zarr-python 3; see zarr-developers/zarr-python#2792 (comment).

Sorry, I meant to reply to this earlier. This condition is still kosher, though, right? I mean, the behavior is correct: this is implicitly the v2 case that we have been discussing elsewhere, and this sort of thing is not allowed in v3?

Just want to make sure I'm clear before merging, since I fear I may be a bit lost in the sauce.

I think the "string" / 0 fill value workaround was unreachable before. I've adjusted it now to match zarr-python 2.x.x behaviour given the discussion in zarr-developers/zarr-python#2792

Oh, the workaround can't be called anyway, since we reject variable-sized data types from the pipeline. It'd be handy if zarr-python resolves zarr-developers/zarr-python#2792 so that we get a "0" instead of a 0 and no workaround is needed on our side.

It all makes sense now; see zarr-developers/zarr-python#2792 (comment). Reverted back to the 0 -> "" fill value workaround.
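The condition under discussion can be restated in Python as a minimal sketch. It mirrors the check in the Rust hunk above (an integer fill value of 0 combined with an `object` dtype) and the final 0 -> "" behaviour; the helper name `normalize_v2_fill_value` is hypothetical and this is not the code that was merged.

```python
# Hypothetical sketch of the 0 -> "" workaround for object dtypes.


def normalize_v2_fill_value(fill_value, dtype_name):
    """Map an integer 0 fill value to "" for object dtypes; pass everything else through."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int_zero = (
        isinstance(fill_value, int)
        and not isinstance(fill_value, bool)
        and fill_value == 0
    )
    if dtype_name == "object" and is_int_zero:
        return ""
    return fill_value
```

Numeric dtypes keep their 0, and an object array with any other fill value is left untouched.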

LDeakin · 2025-02-06T08:09:50Z

In 26ee516 I switched to using all the V2 to V3 codec metadata handling in zarrs. This is more complicated than it looks, because there are so many variants of codec metadata that have been produced over the life of zarr-python / numcodecs. For example, I have Zarr V2/V3 specific handling of zstd for multiple numcodecs versions:

I hope in the future that this responsibility of mapping interim/experimental codec metadata to standardised metadata can be taken on by zarr-python instead. Maybe it already can? I haven't really looked.
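To make the kind of mapping described in this comment concrete, here is a minimal sketch of turning a V2-style zstd config into a V3-style codec entry. It is not the zarrs implementation: the helper name `zstd_v2_to_v3` is made up, and both the assumption that some configs lack a `checksum` key and the defaults used here are illustrative only.

```python
# Hypothetical sketch: normalise a V2 (numcodecs-style) zstd config to a V3-style entry.
# Key names follow the common {"id": ...} and {"name": ..., "configuration": ...} shapes;
# the default values are assumptions.


def zstd_v2_to_v3(v2_config):
    """Convert {"id": "zstd", ...} into {"name": "zstd", "configuration": {...}}."""
    if v2_config.get("id") != "zstd":
        raise ValueError(f"not a zstd config: {v2_config!r}")
    return {
        "name": "zstd",
        "configuration": {
            "level": v2_config.get("level", 1),
            # Assumed: configs written by some versions carry no "checksum" key.
            "checksum": v2_config.get("checksum", False),
        },
    }
```

Each extra metadata variant adds another branch like the `checksum` default, which is where the complexity mentioned above comes from.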

ilan-gold · 2025-02-07T11:26:46Z

@LDeakin Thanks for this. I had looked into that function as well when writing this PR, but didn't want to ask you to make it public given the amount of APIs you already have.

> I hope in the future that this responsibility of mapping interim/experimental codec metadata to standardised metadata can be taken on by zarr-python instead. Maybe it already can? I haven't really looked.

Right, so I had opened an issue in numcodecs kind of about this: zarr-developers/numcodecs#676. The reason we need to call get_config is that these V2 codecs haven't been standardized or given v3-compatible json interchange formats. I wanted to look into that after this PR, once I understood more about what was going on.

d-v-b · 2025-02-07T11:29:01Z

see zarr-developers/numcodecs#686. we need to fix this.

ilan-gold · 2025-02-07T12:23:30Z

Ah, amazing @d-v-b, I will close my little issue then. Thanks!

LDeakin and others added 6 commits January 4, 2025 09:45

chore(deps): bump zarr to 3.0.0rc1

c7fb95a

fmt

0a877e0

(feat): python fallack

def2e70

Merge branch 'main' into ig/python_fallback

622287d

(fix): dtypes

b362759

(fix): object dtypes + v2 tests

fba8226

ilan-gold added 5 commits January 30, 2025 15:02

(fix): object dtypes + v2 tests

4aa21a3

(fix): object dtypes + v2 tests

a51e810

(fix): object dtypes in rust

19e90e3

(fix): blosc support

4a59ec1

(refactor): handle None fill-value more gracefully

45efee1

ilan-gold commented Feb 3, 2025

python/zarrs/pipeline.py (outdated, resolved)

LDeakin added 2 commits February 4, 2025 08:25

fix: V2 codec pipeline creation

59e60fc

fix: zfpy/pcodec metadata handling

1a6dc77

ilan-gold added 7 commits February 4, 2025 11:21

(fix): fall back for unsupported codecs

008fd6a

(fix): our decode codec pipeline does not support vlen

9a0daa9

(fix): string dtype test to match zarr-python

4637d24

(chore): add note

cf2e6b5

(fix): ruff

00e73ed

(fix): rustfmt

d8aa2cc

(fix): pyi

8ea80bc

ilan-gold marked this pull request as ready for review February 4, 2025 12:41

ilan-gold requested a review from LDeakin February 4, 2025 12:43

ilan-gold and others added 3 commits February 4, 2025 16:32

(fix): try removing zarr main branch dep

db255a9

fix: use upstream implicit fill values

cb4bedc

fix: use upstream metadata handling

26ee516

LDeakin approved these changes Feb 5, 2025

LDeakin added 5 commits February 8, 2025 09:09

fix: cleanup fill value handling for string dtype

6ff6c2b

Revert "fix: cleanup fill value handling for string dtype"

abe4dd5

This reverts commit 6ff6c2b.

fix: cleanup fill value handling for string dtype

a618605

fix: fmt and clippy warnings

4159751

fix: zarr-python 0 fill value handling

ae194a6

ilan-gold merged commit 1d4e3cb into main Feb 11, 2025
17 checks passed

ilan-gold deleted the ig/python_fallback branch February 11, 2025 12:37
