fixing behaviour for `group` parameter in `open_datatree` (#9666)
Conversation
@TomNicholas, I think this solution will work, but the …
I think the reason why you don't get the desired result is that you compute paths relative to the immediate parent of the group, not the global parent. I don't have a deeply nested tree ready for testing (so I can't be sure this actually works), but with the suggestion below I don't get the empty root node anymore.
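The path point keewis raises can be illustrated with a small stdlib sketch (the helper name `relative_to_root` is invented for this example, not xarray API): each group's key should be computed relative to the *requested root* group, not relative to its immediate parent.

```python
# Hypothetical helper (not xarray API) illustrating the fix keewis
# describes: compute each group's path relative to the requested root
# group rather than its immediate parent.
from pathlib import PurePosixPath

def relative_to_root(group_path: str, root: str) -> str:
    """Re-root `group_path` at `root`, returning an absolute path."""
    rel = str(PurePosixPath(group_path).relative_to(root))
    return "/" if rel == "." else f"/{rel}"

# The requested group becomes "/", so no empty parent nodes appear:
print(relative_to_root("/group2/subg1", "/group2/subg1"))          # /
print(relative_to_root("/group2/subg1/subsub1", "/group2/subg1"))  # /subsub1
```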
I think we are getting close. However, there are still some discrepancies when we compare the datatree opened via the `group` parameter with the same subtree selected directly by path. Example:

```python
print(dtree["/group2/subg1"])
```

```
Group: /group2/subg1                  <----- different root path here
│   Dimensions:  (x: 2, y: 3)
│   Inherited coordinates:
│     * x        (x) int64 16B -1 -2
│     * y        (y) int64 24B 0 1 2
│   Data variables:
│       blah     (x) int64 16B 2 3
├── Group: /group2/subg1/subsub1      <----- different paths here
│       Dimensions:  (y: 3)
│       Data variables:
│           var      (y) int64 24B 4 5 6
└── Group: /group2/subg1/subsub2      <----- different paths here
```

```python
dt2 = xr.open_datatree("test.zarr", group="/group2/subg1")
print(dt2)
```

```
Group: /                              <----- different root path here
│   Dimensions:  (x: 2)
│   Dimensions without coordinates: x
│   Data variables:
│       blah     (x) int64 16B ...
├── Group: /subsub1                   <----- different paths here
│       Dimensions:  (y: 3)
│       Dimensions without coordinates: y
│       Data variables:
│           var      (y) int64 24B ...
└── Group: /subsub2                   <----- different paths here
```

Any comments on this, @TomNicholas @keewis?
This is by design, I think? I'd interpret it … (if you know unix commands, this would be similar to …)

The reprs are different because the objects are different: …
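A rough stdlib analogy (plain dicts, not xarray code) of why the two reprs differ yet describe the same subtree: opening with `group=` re-roots every path, so the node contents match once the paths are rebased. The `rebase` helper and payload strings below are invented for illustration.

```python
# Stdlib analogy (not xarray code): model each tree as {path: payload}.
# Opening with group="/group2/subg1" re-roots every path, so the two
# mappings hold the same payloads under different keys.
from pathlib import PurePosixPath

def rebase(tree: dict, root: str) -> dict:
    """Rewrite every path in `tree` relative to `root`."""
    out = {}
    for path, payload in tree.items():
        rel = str(PurePosixPath(path).relative_to(root))
        out["/" if rel == "." else f"/{rel}"] = payload
    return out

full = {
    "/group2/subg1": "blah",
    "/group2/subg1/subsub1": "var",
    "/group2/subg1/subsub2": None,
}
opened = {"/": "blah", "/subsub1": "var", "/subsub2": None}
print(rebase(full, "/group2/subg1") == opened)  # True
```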
We just need a test, to remove the encoding bit, and then this is good to go!
The test should look very much like the ones in #9669: create a tiny nested tree, save it to zarr/netcdf, open it with the …
Thank you so much @aladinor !
I've got two comments: one on our strategy for `DataTree` whats-new entries, and one on the way we compare node datasets.
Co-authored-by: Justus Magin <[email protected]>
Sorry @aladinor - in fact could we just do this? #9666 (comment)

This looks good! @keewis's comments are addressed so I'm going to merge it.

(You can still add yourself to the list of contributors to the DataTree entry in the whats-new, @aladinor. Oh, and me too, apparently 😅) Edit: see Lines 24 to 32 in 5b2e6f1

Otherwise @TomNicholas can do that while preparing for the release.

Amazing, thank you! And thanks for pointing that out so everyone involved can get credit for these great contributions!

Thanks @TomNicholas and @keewis for your guidance!
* main:
  - Add `DataTree.persist` (pydata#9682)
  - Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  - Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  - Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  - new blank whatsnew (pydata#9679)
  - v2024.10.0 release summary (pydata#9678)
  - drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  - fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  - Use zarr v3 dimension_names (pydata#9669)
  - fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  - implement `dask` methods on `DataTree` (pydata#9670)
  - support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  - Compatibility for zarr-python 3.x (pydata#9552)
  - Update to_dataframe doc to match current behavior (pydata#9662)
  - Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)

* main: (85 commits)
  - Refactor out utility functions from to_zarr (pydata#9695)
  - Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  - Add `DataTree.persist` (pydata#9682)
  - Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  - Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  - Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  - new blank whatsnew (pydata#9679)
  - v2024.10.0 release summary (pydata#9678)
  - drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  - fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  - Use zarr v3 dimension_names (pydata#9669)
  - fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  - implement `dask` methods on `DataTree` (pydata#9670)
  - support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  - Compatibility for zarr-python 3.x (pydata#9552)
  - Update to_dataframe doc to match current behavior (pydata#9662)
  - Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  - Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  - Change URL for pydap test (pydata#9655)
  - Fix multiple grouping with missing groups (pydata#9650)
  - ...
Hi all. This might be more complex than pruning the path in the open_group_as_dict function, because when we use `_iter_zarr_groups`, it also yields the root group. I am still working on it.
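The root-group wrinkle can be sketched in plain Python (the generator below is an invented stand-in for `_iter_zarr_groups`, not the real implementation): whatever walks the store yields the requested root too, so path pruning must map it to "/" explicitly instead of producing an empty name.

```python
# Invented stand-in for the zarr group walk described above: because the
# walk yields the root group as well, path pruning must treat the root
# specially rather than emit an empty relative name for it.
def pruned_group_names(group_paths, root):
    base = root.rstrip("/")
    for path in group_paths:
        if path == root:
            yield "/"                  # the requested root itself
        elif path.startswith(base + "/"):
            yield path[len(base):]     # strip the prefix, keep leading "/"

names = list(pruned_group_names(
    ["/group2/subg1", "/group2/subg1/subsub1", "/group2/subg1/subsub2"],
    "/group2/subg1",
))
print(names)  # ['/', '/subsub1', '/subsub2']
```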