fixing behaviour for `group` parameter in `open_datatree` (#9666)
Conversation
@TomNicholas, I think this solution will work, but the …
I think the reason why you don't get the desired result is that you compute paths relative to the immediate parent of the group, not the global parent. I don't have a deeply nested tree ready for testing (so I can't be sure this actually works), but with the suggestion below I don't get the empty root node anymore.
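The path point keewis raises can be illustrated with a small stdlib sketch (the helper name `relative_to_root` is invented for this example, not xarray API): each group's key should be computed relative to the *requested root* group, not relative to its immediate parent.

```python
# Hypothetical helper (not xarray API) illustrating the fix keewis
# describes: compute each group's path relative to the requested root
# group rather than its immediate parent.
from pathlib import PurePosixPath

def relative_to_root(group_path: str, root: str) -> str:
    """Re-root `group_path` at `root`, returning an absolute path."""
    rel = str(PurePosixPath(group_path).relative_to(root))
    return "/" if rel == "." else f"/{rel}"

# The requested group becomes "/", so no empty parent nodes appear:
print(relative_to_root("/group2/subg1", "/group2/subg1"))          # /
print(relative_to_root("/group2/subg1/subsub1", "/group2/subg1"))  # /subsub1
```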
I think we are getting close. However, there are still some discrepancies when we compare the datatree opened via the `group` parameter with the same subtree selected directly by path. Example:

```python
print(dtree["/group2/subg1"])
```

```
Group: /group2/subg1                  <----- different root path here
│   Dimensions:  (x: 2, y: 3)
│   Inherited coordinates:
│     * x        (x) int64 16B -1 -2
│     * y        (y) int64 24B 0 1 2
│   Data variables:
│       blah     (x) int64 16B 2 3
├── Group: /group2/subg1/subsub1      <----- different paths here
│       Dimensions:  (y: 3)
│       Data variables:
│           var      (y) int64 24B 4 5 6
└── Group: /group2/subg1/subsub2      <----- different paths here
```

```python
dt2 = xr.open_datatree("test.zarr", group="/group2/subg1")
print(dt2)
```

```
Group: /                              <----- different root path here
│   Dimensions:  (x: 2)
│   Dimensions without coordinates: x
│   Data variables:
│       blah     (x) int64 16B ...
├── Group: /subsub1                   <----- different paths here
│       Dimensions:  (y: 3)
│       Dimensions without coordinates: y
│       Data variables:
│           var      (y) int64 24B ...
└── Group: /subsub2                   <----- different paths here
```

Any comments on this, @TomNicholas @keewis?
This is by design, I think? I'd interpret it … (if you know unix commands, this would be similar to …)

The reprs are different because the objects are different: …
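A rough stdlib analogy (plain dicts, not xarray code) of why the two reprs differ yet describe the same subtree: opening with `group=` re-roots every path, so the node contents match once the paths are rebased. The `rebase` helper and payload strings below are invented for illustration.

```python
# Stdlib analogy (not xarray code): model each tree as {path: payload}.
# Opening with group="/group2/subg1" re-roots every path, so the two
# mappings hold the same payloads under different keys.
from pathlib import PurePosixPath

def rebase(tree: dict, root: str) -> dict:
    """Rewrite every path in `tree` relative to `root`."""
    out = {}
    for path, payload in tree.items():
        rel = str(PurePosixPath(path).relative_to(root))
        out["/" if rel == "." else f"/{rel}"] = payload
    return out

full = {
    "/group2/subg1": "blah",
    "/group2/subg1/subsub1": "var",
    "/group2/subg1/subsub2": None,
}
opened = {"/": "blah", "/subsub1": "var", "/subsub2": None}
print(rebase(full, "/group2/subg1") == opened)  # True
```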
We just need a test, to remove the encoding bit, and then this is good to go!
The test should look very much like the ones in #9669: create a tiny nested tree, save it to zarr/netcdf, open it with the …
Thank you so much @aladinor !
I've got two comments: one on our strategy for `DataTree` whats-new entries, and one on the way we compare node datasets.
Co-authored-by: Justus Magin <[email protected]>
Sorry @aladinor - in fact could we just do this? #9666 (comment)

This looks good! @keewis's comments are addressed so I'm going to merge it.

(You can still add yourself to the list of contributors to the DataTree entry in the whats-new, @aladinor. Oh, and me too, apparently 😅) Edit: see Lines 24 to 32 in 5b2e6f1

Otherwise @TomNicholas can do that while preparing for the release.

Amazing, thank you! And thanks for pointing that out so everyone involved can get credit for these great contributions!

Thanks @TomNicholas and @keewis for your guidance!
* main:
  - Add `DataTree.persist` (pydata#9682)
  - Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  - Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  - Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  - new blank whatsnew (pydata#9679)
  - v2024.10.0 release summary (pydata#9678)
  - drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  - fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  - Use zarr v3 dimension_names (pydata#9669)
  - fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  - implement `dask` methods on `DataTree` (pydata#9670)
  - support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  - Compatibility for zarr-python 3.x (pydata#9552)
  - Update to_dataframe doc to match current behavior (pydata#9662)
  - Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)

* main: (85 commits)
  - Refactor out utility functions from to_zarr (pydata#9695)
  - Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  - Add `DataTree.persist` (pydata#9682)
  - Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  - Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  - Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  - new blank whatsnew (pydata#9679)
  - v2024.10.0 release summary (pydata#9678)
  - drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  - fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  - Use zarr v3 dimension_names (pydata#9669)
  - fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  - implement `dask` methods on `DataTree` (pydata#9670)
  - support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  - Compatibility for zarr-python 3.x (pydata#9552)
  - Update to_dataframe doc to match current behavior (pydata#9662)
  - Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  - Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  - Change URL for pydap test (pydata#9655)
  - Fix multiple grouping with missing groups (pydata#9650)
  - ...
Hi all. This might be more complex than pruning the path in the open_group_as_dict function, because when we use `_iter_zarr_groups`, it also yields the root group. I am still working on it.
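The root-group wrinkle can be sketched in plain Python (the generator below is an invented stand-in for `_iter_zarr_groups`, not the real implementation): whatever walks the store yields the requested root too, so path pruning must map it to "/" explicitly instead of producing an empty name.

```python
# Invented stand-in for the zarr group walk described above: because the
# walk yields the root group as well, path pruning must treat the root
# specially rather than emit an empty relative name for it.
def pruned_group_names(group_paths, root):
    base = root.rstrip("/")
    for path in group_paths:
        if path == root:
            yield "/"                  # the requested root itself
        elif path.startswith(base + "/"):
            yield path[len(base):]     # strip the prefix, keep leading "/"

names = list(pruned_group_names(
    ["/group2/subg1", "/group2/subg1/subsub1", "/group2/subg1/subsub2"],
    "/group2/subg1",
))
print(names)  # ['/', '/subsub1', '/subsub2']
```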