Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixing behaviour for group parameter in open_datatree #9666

Merged
merged 26 commits into from
Oct 24, 2024

Conversation

aladinor
Copy link
Contributor

@aladinor aladinor commented Oct 23, 2024

Hi all.

This might be more complex than pruning the path in the open_group_as_dict function. It is kind of complex because when we use _iter_zarr_groups, it yields the root group. I am still working o it

@aladinor
Copy link
Contributor Author

aladinor commented Oct 23, 2024

@TomNicholas, I think this solution will work, but the DataTree.from_dict(groups_dict) function will create the root as an empty node. I think we might need to check it out. Let me know your thoughts.

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Oct 23, 2024
Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason why you don't get the desired result is that you compute paths relative to the immediate parent of the group, not the global parent. I don't have a deeply nested tree ready for testing (so I can't be sure this actually works), but with the suggestion below I don't get the empty root node anymore.

xarray/backends/zarr.py Outdated Show resolved Hide resolved
xarray/backends/zarr.py Outdated Show resolved Hide resolved
@aladinor
Copy link
Contributor Author

I think we are getting close. However, we are still having some discrepancies when comparing both datatrees using the group parameter and when directly selecting it via path. Example:

print(dtree["/group2/subg1"])
Group: /group2/subg1 <----- different root paths hereDimensions:  (x: 2, y: 3)
│   Inherited coordinates:
│     * x        (x) int64 16B -1 -2* y        (y) int64 24B 0 1 2Data variables:
│       blah     (x) int64 16B 2 3
├── Group: /group2/subg1/subsub1 <----- different paths hereDimensions:  (y: 3)
│       Data variables:
│           var      (y) int64 24B 4 5 6
└── Group: /group2/subg1/subsub2 <----- different  paths here

dt2 = xr.open_datatree("test.zarr",
                          group="/group2/subg1"

print(dt2)
Group: /  <----- different root paths hereDimensions:  (x: 2)
│   Dimensions without coordinates: xData variables:
│       blah     (x) int64 16B ...
├── Group: /subsub1 <----- different  paths hereDimensions:  (y: 3)
│       Dimensions without coordinates: yData variables:
│           var      (y) int64 24B ...
└── Group: /subsub2 <----- different  paths here

Any comments on this @TomNicholas @keewis?

@keewis
Copy link
Collaborator

keewis commented Oct 24, 2024

this is by design, I think? I'd interpret group="/group2/subgroup1" as saying "give me that group as the new root group". Then tree.encoding["source_group"] can contain the full path of the new root.

(if you know unix commands, this would be similar to chroot)

xarray/backends/zarr.py Outdated Show resolved Hide resolved
@TomNicholas
Copy link
Member

discrepancies when comparing both datatrees using the group parameter and when directly selecting it via path

The reprs are different because the objects are different: dtree["/group2/subg1"] has a parent, whereas dt2 does not. So this is intended.

@aladinor aladinor marked this pull request as ready for review October 24, 2024 16:47
Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need a test, to remove the encoding bit, and then this is good to go!

xarray/backends/h5netcdf_.py Outdated Show resolved Hide resolved
xarray/backends/netCDF4_.py Outdated Show resolved Hide resolved
@TomNicholas
Copy link
Member

The test should look very much like the ones in #9669 - create a tiny nested tree, save it to zarr/netcdf, open it with the group parameter, and check the structure is as expected.

@aladinor aladinor changed the title adding draft for fixing behaviour for group parameter fixing behaviour for group parameter in open_datatree Oct 24, 2024
Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @aladinor !

@TomNicholas TomNicholas requested a review from keewis October 24, 2024 18:56
Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got two comments: one on our strategy on DataTree whats-new entries, and one on the way we compare node datasets.

doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/tests/test_backends_datatree.py Outdated Show resolved Hide resolved
xarray/tests/test_backends_datatree.py Outdated Show resolved Hide resolved
xarray/tests/test_backends_datatree.py Outdated Show resolved Hide resolved
@TomNicholas
Copy link
Member

Sorry @aladinor - in fact could we just do this? #9666 (comment)

@TomNicholas
Copy link
Member

This looks good! @keewis 's comments are addressed so I'm going to merge it.

@keewis
Copy link
Collaborator

keewis commented Oct 24, 2024

(you can still add yourself to the list of contributors to the DataTree entry in the whats-new, @aladinor Oh, and me too, apparently 😅)

Edit: see

xarray/doc/whats-new.rst

Lines 24 to 32 in 5b2e6f1

- ``DataTree`` related functionality is now exposed in the main ``xarray`` public
API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, ``xarray.open_groups``,
``xarray.map_over_datasets``, ``xarray.group_subtrees``,
``xarray.register_datatree_accessor`` and ``xarray.testing.assert_isomorphic``.
By `Owen Littlejohns <https://github.com/owenlittlejohns>`_,
`Eni Awowale <https://github.com/eni-awowale>`_,
`Matt Savoie <https://github.com/flamingbear>`_,
`Stephan Hoyer <https://github.com/shoyer>`_ and
`Tom Nicholas <https://github.com/TomNicholas>`_.

@keewis
Copy link
Collaborator

keewis commented Oct 24, 2024

Otherwise @TomNicholas can do that while preparing for the release.

@TomNicholas
Copy link
Member

Amazing thank you! And thanks for pointing that out so everyone involved can get credit for these great contributions!

@TomNicholas TomNicholas enabled auto-merge (squash) October 24, 2024 20:28
@aladinor
Copy link
Contributor Author

Thanks @TomNicholas and @keewis for your guidance!

@TomNicholas TomNicholas disabled auto-merge October 24, 2024 21:00
@TomNicholas TomNicholas merged commit f24cae3 into pydata:main Oct 24, 2024
28 of 29 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 29, 2024
* main:
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
dcherian added a commit to dcherian/xarray that referenced this pull request Nov 3, 2024
* main: (85 commits)
  Refactor out utility functions from to_zarr (pydata#9695)
  Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  Fix multiple grouping with missing groups (pydata#9650)
  ...
@aladinor aladinor deleted the fix-group-param branch November 20, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-backends topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging this pull request may close these issues.

open_datatree(group='some_subgroup') returning parent nodes
3 participants