-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Why are da.chunks
and ds.chunks
properties inconsistent?
#5843
Comments
There is another difference between |
The honest answer is that I didn't think too carefully about this when originally implementing Xarray's Dask wrapper back in 2015.
|
I guessed that might be the case!
Still leaves this question though ^ . I made a draft PR in #5846. |
For DataArrays there is an underlying It seems better to introduce a new property on both DataArrays and Datasets that always returns a dict (Like There is a similar problem for |
That's a good suggestion - then we can have backwards compatibility whilst also allowing intuitive code that treats dataarrays and datasets similarly, e.g: def is_core_dim_chunked(obj, core_dim):
return len(obj.chunksizes[core_dim]) > 1
I think |
Agree! Now we just need to decide between |
Basically the title, but what I'm referring to is this:
Why does
DataArray.chunks
return a tuple andDataset.chunks
return a frozen dictionary?This seems a bit silly, for a few reasons:
it means that some perfectly reasonable code might fail unnecessarily if passed a DataArray instead of a Dataset or vice versa, such as
which will work as intended for a dataset but raises a
TypeError
for a dataarray.it breaks the pattern we use for
.sizes
, whereif you want the chunks as a tuple they are always accessible via
da.data.chunks
, which is a more sensible place to look to find the chunks without dimension names.It's an undocumented difference, as the docstrings for
ds.chunks
andda.chunks
both only say"""Block dimensions for this dataset’s data or None if it’s not a dask array."""
which doesn't tell me anything about the return type, or warn me that the return types are different.
EDIT: In fact
DataArray.chunk
doesn't even appear to be listed on the API docs page at all.In our codebase this difference is mostly washed out by us using
._to_temp_dataset()
all the time, and also by the way that the.chunk()
method accepts both the tuple and dict form, so both of these invariants hold (but in different ways):I'm not sure whether making this consistent is worth the effort of a significant breaking change though 😕
(Sort of related to #2103)
The text was updated successfully, but these errors were encountered: