-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(feat): full v2 compat via python fallback #84
Conversation
@@ -0,0 +1,346 @@ | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is completely taken from the zarr codebase with minimal changes
@pytest.mark.parametrize( | ||
"array_order", ["C", pytest.param("F", marks=[pytest.mark.xfail])] | ||
) | ||
@pytest.mark.parametrize("data_order", ["C", "F"]) | ||
@pytest.mark.parametrize( | ||
"memory_order", ["C", pytest.param("F", marks=[pytest.mark.xfail])] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we fall back to python for F-arrays or just fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could handle F
array order easily if zarr_python
gave us all of the array metadata when constructing the codec pipeline
zarrs-python/src/metadata_v2.rs
Lines 36 to 39 in 26ee516
// FIXME: The array order, dimensionality, data type, and endianness are needed to exhaustively support all Zarr V2 data that zarrs can handle. | |
// However, CodecPipeline.from_codecs does not supply this information, and CodecPipeline.evolve_from_array_spec is seemingly never called. | |
let metadata = zarrs::metadata::v2_to_v3::codec_metadata_v2_to_v3( | |
ArrayMetadataV2Order::C, |
I've added a few FIXMEs for variable-length data and fixed the codec handling. All the V2 codec metadata logic could just be pulled from |
There is a lot of additional logic already taken care of by `zarrs`, like handling multiple versions of codec metadata.
src/chunk_item.rs
Outdated
fill_value_bytes = fill_value.call_method0("tobytes")?.extract()?; | ||
} else if let Ok(fill_value_downcast) = fill_value.downcast::<PyInt>() { | ||
let fill_value_usize: usize = fill_value_downcast.extract()?; | ||
if fill_value_usize == (0 as usize) && dtype == "object" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this varies between zarr-python versions, but it is at least consistent with zarr-python
3, See zarr-developers/zarr-python#2792 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry meant to reply to this earlier - this condition is still kosher, though, right? I mean, the behavior is correct, that this is implicitly the v2 case that we have been discussing elsewhere and this sort of thing is not allowed in v3?
Just want to make sure I'm clear before merging since I feat I may be a bit lost in the sauce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "string" / 0 fill value workaround was unreachable before. I've adjusted it now to match zarr-python
2.x.x behaviour given the discussion in zarr-developers/zarr-python#2792
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh the workaround can't be called anyway since we reject variable-sized data types from the pipeline. It'd be handy if zarr-python
resolves zarr-developers/zarr-python#2792 so that we get a "0" instead of a 0 and no workaround is needed on our side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It all makes sense now. zarr-developers/zarr-python#2792 (comment). Reverted back to the 0
-> ""
fill value workaround.
In 26ee516 I switched to using all the V2 to V3 codec metadata handling in
I hope in the future that this responsibility of mapping interim/experimental codec metadata to standardised metadata can be taken on by |
@LDeakin Thanks for this. I had looked into that function as well when writing this PR, but didn't want to ask you to make it public given the amount of APIs you already have.
Right, so I had opened an issue in |
see zarr-developers/numcodecs#686. we need to fix this. |
Ah amazing @d-v-b I will close my little issue then. Thanks! |
Just a draft but moving in the direction of full v2 support by falling back to zarr-python and also fixing some serialization issues. I think this mechanism, even if it is obviated by development on the rust side, is not bad to have as it allows us to keep up a bit more with new features/bugs etc.I got rid of the full-chunk fetching (i.e., for some simple integer indexing cases) in rust because it was actually slower than just doing it in python. This also lets us be a bit more clear about what we support and don't because it is simpler now as either "optimized rust" or "pure python."All data types (
V
,S
,O
etc) are now split off and i/o for them is done on an instantiated zarr-python pipeline (as opposed to our rust one).Furthermore, default fill values have been added for all data types in python for our pipeline (as
None
is valid at hte top-level inzarr-python
) as well as improved fill value handling inrust
which should prepare us a bit better for variable-length decoding in pure-rust.