forked from pydata/xarray

Merge branch 'main' into groupby-shuffle
* main: (85 commits)
  Refactor out utility functions from to_zarr (pydata#9695)
  Use the same function to floatize coords in polyfit and polyval (pydata#9691)
  Add `DataTree.persist` (pydata#9682)
  Typing annotations for arithmetic overrides (e.g., DataArray + Dataset) (pydata#9688)
  Raise `ValueError` for unmatching chunks length in `DataArray.chunk()` (pydata#9689)
  Fix inadvertent deep-copying of child data in DataTree (pydata#9684)
  new blank whatsnew (pydata#9679)
  v2024.10.0 release summary (pydata#9678)
  drop the length from `numpy`'s fixed-width string dtypes (pydata#9586)
  fixing behaviour for group parameter in `open_datatree` (pydata#9666)
  Use zarr v3 dimension_names (pydata#9669)
  fix(zarr): use inplace array.resize for zarr 2 and 3 (pydata#9673)
  implement `dask` methods on `DataTree` (pydata#9670)
  support `chunks` in `open_groups` and `open_datatree` (pydata#9660)
  Compatibility for zarr-python 3.x (pydata#9552)
  Update to_dataframe doc to match current behavior (pydata#9662)
  Reduce graph size through writing indexes directly into graph for ``map_blocks`` (pydata#9658)
  Add close() method to DataTree and use it to clean-up open files in tests (pydata#9651)
  Change URL for pydap test (pydata#9655)
  Fix multiple grouping with missing groups (pydata#9650)
  ...
dcherian committed Nov 1, 2024
2 parents 91e4bd8 + 7467b1e commit 0542944
Showing 130 changed files with 8,172 additions and 2,971 deletions.
2 changes: 1 addition & 1 deletion .github/FUNDING.yml
@@ -1,2 +1,2 @@
github: numfocus
custom: http://numfocus.org/donate-to-xarray
custom: https://numfocus.org/donate-to-xarray
2 changes: 1 addition & 1 deletion .github/workflows/benchmarks-last-release.yml
@@ -22,7 +22,7 @@ jobs:
fetch-depth: 0

- name: Set up conda environment
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ${{env.CONDA_ENV_FILE}}
environment-name: xarray-tests
2 changes: 1 addition & 1 deletion .github/workflows/benchmarks.yml
@@ -25,7 +25,7 @@ jobs:
fetch-depth: 0

- name: Set up conda environment
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ${{env.CONDA_ENV_FILE}}
environment-name: xarray-tests
26 changes: 13 additions & 13 deletions .github/workflows/ci-additional.yaml
@@ -55,7 +55,7 @@ jobs:
echo "TODAY=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Setup micromamba
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ${{env.CONDA_ENV_FILE}}
environment-name: xarray-tests
@@ -92,7 +92,7 @@ jobs:
shell: bash -l {0}
env:
CONDA_ENV_FILE: ci/requirements/environment.yml
PYTHON_VERSION: "3.11"
PYTHON_VERSION: "3.12"

steps:
- uses: actions/checkout@v4
@@ -103,7 +103,7 @@ jobs:
run: |
echo "TODAY=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Setup micromamba
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ${{env.CONDA_ENV_FILE}}
environment-name: xarray-tests
@@ -122,14 +122,14 @@ jobs:
python xarray/util/print_versions.py
- name: Install mypy
run: |
python -m pip install "mypy" --force-reinstall
python -m pip install "mypy==1.11.2" --force-reinstall
- name: Run mypy
run: |
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report
- name: Upload mypy coverage to Codecov
uses: codecov/codecov-action@v4.5.0
uses: codecov/codecov-action@v4.6.0
with:
file: mypy_report/cobertura.xml
flags: mypy
@@ -157,7 +157,7 @@ jobs:
run: |
echo "TODAY=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Setup micromamba
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ${{env.CONDA_ENV_FILE}}
environment-name: xarray-tests
@@ -176,14 +176,14 @@ jobs:
python xarray/util/print_versions.py
- name: Install mypy
run: |
python -m pip install "mypy" --force-reinstall
python -m pip install "mypy==1.11.2" --force-reinstall
- name: Run mypy
run: |
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report
- name: Upload mypy coverage to Codecov
uses: codecov/codecov-action@v4.5.0
uses: codecov/codecov-action@v4.6.0
with:
file: mypy_report/cobertura.xml
flags: mypy-min
@@ -216,7 +216,7 @@ jobs:
run: |
echo "TODAY=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Setup micromamba
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ${{env.CONDA_ENV_FILE}}
environment-name: xarray-tests
@@ -242,7 +242,7 @@ jobs:
python -m pyright xarray/
- name: Upload pyright coverage to Codecov
uses: codecov/codecov-action@v4.5.0
uses: codecov/codecov-action@v4.6.0
with:
file: pyright_report/cobertura.xml
flags: pyright
@@ -275,7 +275,7 @@ jobs:
run: |
echo "TODAY=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Setup micromamba
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ${{env.CONDA_ENV_FILE}}
environment-name: xarray-tests
@@ -301,7 +301,7 @@ jobs:
python -m pyright xarray/
- name: Upload pyright coverage to Codecov
uses: codecov/codecov-action@v4.5.0
uses: codecov/codecov-action@v4.6.0
with:
file: pyright_report/cobertura.xml
flags: pyright39
@@ -324,7 +324,7 @@ jobs:
fetch-depth: 0 # Fetch all history for all branches and tags.

- name: Setup micromamba
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-name: xarray-tests
create-args: >-
7 changes: 5 additions & 2 deletions .github/workflows/ci.yaml
@@ -58,6 +58,9 @@ jobs:
python-version: "3.10"
os: ubuntu-latest
# Latest python version:
- env: "all-but-numba"
python-version: "3.12"
os: ubuntu-latest
- env: "all-but-dask"
# Not 3.12 because of pint
python-version: "3.11"
@@ -105,7 +108,7 @@ jobs:
echo "PYTHON_VERSION=${{ matrix.python-version }}" >> $GITHUB_ENV
- name: Setup micromamba
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ${{ env.CONDA_ENV_FILE }}
environment-name: xarray-tests
@@ -159,7 +162,7 @@ jobs:
path: pytest.xml

- name: Upload code coverage to Codecov
uses: codecov/codecov-action@v4.5.0
uses: codecov/codecov-action@v4.6.0
with:
file: ./coverage.xml
flags: unittests
2 changes: 1 addition & 1 deletion .github/workflows/hypothesis.yaml
@@ -61,7 +61,7 @@ jobs:
echo "TODAY=$(date +'%Y-%m-%d')" >> $GITHUB_ENV
- name: Setup micromamba
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ci/requirements/environment.yml
environment-name: xarray-tests
4 changes: 2 additions & 2 deletions .github/workflows/nightly-wheels.yml
@@ -13,7 +13,7 @@ jobs:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.12"

- name: Install dependencies
run: |
@@ -38,7 +38,7 @@ jobs:
fi
- name: Upload wheel
uses: scientific-python/upload-nightly-action@b67d7fcc0396e1128a474d1ab2b48aa94680f9fc # 0.5.0
uses: scientific-python/upload-nightly-action@82396a2ed4269ba06c6b2988bb4fd568ef3c3d6b # 0.6.1
with:
anaconda_nightly_upload_token: ${{ secrets.ANACONDA_NIGHTLY }}
artifacts_path: dist
4 changes: 2 additions & 2 deletions .github/workflows/pypi-release.yaml
@@ -88,7 +88,7 @@ jobs:
path: dist
- name: Publish package to TestPyPI
if: github.event_name == 'push'
uses: pypa/[email protected].1
uses: pypa/[email protected].3
with:
repository_url: https://test.pypi.org/legacy/
verbose: true
@@ -111,6 +111,6 @@ jobs:
name: releases
path: dist
- name: Publish package to PyPI
uses: pypa/[email protected].1
uses: pypa/[email protected].3
with:
verbose: true
6 changes: 3 additions & 3 deletions .github/workflows/upstream-dev-ci.yaml
@@ -61,7 +61,7 @@ jobs:
with:
fetch-depth: 0 # Fetch all history for all branches and tags.
- name: Set up conda environment
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ci/requirements/environment.yml
environment-name: xarray-tests
@@ -120,7 +120,7 @@ jobs:
with:
fetch-depth: 0 # Fetch all history for all branches and tags.
- name: Set up conda environment
uses: mamba-org/setup-micromamba@v1
uses: mamba-org/setup-micromamba@v2
with:
environment-file: ci/requirements/environment.yml
environment-name: xarray-tests
@@ -146,7 +146,7 @@ jobs:
run: |
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report
- name: Upload mypy coverage to Codecov
uses: codecov/codecov-action@v4.5.0
uses: codecov/codecov-action@v4.6.0
with:
file: mypy_report/cobertura.xml
flags: mypy
11 changes: 3 additions & 8 deletions .pre-commit-config.yaml
@@ -4,7 +4,7 @@ ci:
autoupdate_commit_msg: 'Update pre-commit hooks'
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
rev: v5.0.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
@@ -13,22 +13,17 @@ repos:
- id: mixed-line-ending
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: 'v0.6.3'
rev: 'v0.6.9'
hooks:
- id: ruff-format
- id: ruff
args: ["--fix", "--show-fixes"]
# https://github.com/python/black#version-control-integration
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 24.8.0
hooks:
- id: black-jupyter
- repo: https://github.com/keewis/blackdoc
rev: v0.3.9
hooks:
- id: blackdoc
exclude: "generate_aggregations.py"
additional_dependencies: ["black==24.8.0"]
- id: blackdoc-autoupdate-black
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.11.2
hooks:
3 changes: 1 addition & 2 deletions CORE_TEAM_GUIDE.md
@@ -271,8 +271,7 @@ resources such as:
[NumPy documentation guide](https://numpy.org/devdocs/dev/howto-docs.html#documentation-style)
for docstring conventions.
- [`pre-commit`](https://pre-commit.com) hooks for autoformatting.
- [`black`](https://github.com/psf/black) autoformatting.
- [`flake8`](https://github.com/PyCQA/flake8) linting.
- [`ruff`](https://github.com/astral-sh/ruff) autoformatting and linting.
- [python-xarray](https://stackoverflow.com/questions/tagged/python-xarray) on Stack Overflow.
- [@xarray_dev](https://twitter.com/xarray_dev) on Twitter.
- [xarray-dev](https://discord.gg/bsSGdwBn) discord community (normally only used for remote synchronous chat during sprints).
63 changes: 63 additions & 0 deletions DATATREE_MIGRATION_GUIDE.md
@@ -0,0 +1,63 @@
# Migration guide for users of `xarray-contrib/datatree`

_15th October 2024_

This guide is for previous users of the prototype `datatree.DataTree` class in the `xarray-contrib/datatree` repository. That repository has now been archived and will not be maintained. This guide is intended to help smooth your transition to the new, updated `xarray.DataTree` class.

> [!IMPORTANT]
> There are breaking changes! You should not expect that code written with `xarray-contrib/datatree` will work without any modifications. At the absolute minimum you will need to change the top-level import statement, but there are other changes too.

We have made various changes compared to the prototype version. These can be split into three categories: data model changes, which affect the hierarchical structure itself; integration with xarray's IO backends; and minor API changes, which mostly consist of renaming methods to be more self-consistent.

### Data model changes

The most important changes made are to the data model of `DataTree`. Whilst previously data in different nodes was unrelated and therefore unconstrained, now trees have "internal alignment" - meaning that dimensions and indexes in child nodes must exactly align with those in their parents.

These alignment checks happen at tree construction time, meaning there are some netCDF4 files and Zarr stores that could previously be opened as `datatree.DataTree` objects using `datatree.open_datatree`, but now cannot be opened as `xr.DataTree` objects using `xr.open_datatree`. For these cases we added a new opener function `xr.open_groups`, which returns a `dict[str, Dataset]`. This is intended as a fallback for tricky cases: you can still open the entire contents of the file using `open_groups`, edit the `Dataset` objects, then construct a valid tree from the edited dictionary using `DataTree.from_dict`.

The alignment checks allowed us to add "Coordinate Inheritance", a much-requested feature where indexed coordinate variables are now "inherited" down to child nodes. This allows you to define common coordinates in a parent group that are then automatically available on every child node. The distinction between a locally-defined coordinate variable and an inherited coordinate that was defined on a parent node is reflected in the `DataTree.__repr__`. Generally, if you prefer not to have these variables be inherited, you can get behaviour more similar to the old `datatree` package by removing indexes from coordinates, as this prevents inheritance.

Tree structure checks between multiple trees (i.e., `DataTree.isomorphic`) and pairing of nodes in arithmetic have also changed. Nodes are now matched (with `xarray.group_subtrees`) based on their relative paths, without regard to the order in which child nodes are defined.

For further documentation see the page in the user guide on Hierarchical Data.

### Integrated backends

Previously `datatree.open_datatree` used a different codepath from `xarray.open_dataset`, and was hard-coded to only support opening netCDF files and Zarr stores.
Now xarray's backend entrypoint system has been generalized to include `open_datatree` and the new `open_groups`.
This means we can now extend other xarray backends to support `open_datatree`! If you are the maintainer of an xarray backend we encourage you to add support for `open_datatree` and `open_groups`!

Additionally:
- A `group` kwarg has been added to `open_datatree` for choosing which group in the file should become the root group of the created tree.
- Various performance improvements have been made, which should help when opening netCDF files and Zarr stores with large numbers of groups.
- We anticipate further performance improvements being possible for datatree IO.

### API changes

A number of other API changes have been made, which should only require minor modifications to your code:
- The top-level import has changed, from `from datatree import DataTree, open_datatree` to `from xarray import DataTree, open_datatree`. Alternatively you can now just use the `import xarray as xr` namespace convention for everything datatree-related.
- The `DataTree.ds` property has been changed to `DataTree.dataset`, though `DataTree.ds` remains as an alias for `DataTree.dataset`.
- Similarly the `ds` kwarg in the `DataTree.__init__` constructor has been replaced by `dataset`, i.e. use `DataTree(dataset=...)` instead of `DataTree(ds=...)`.
- The method `DataTree.to_dataset()` still exists but now has different options for controlling which variables are present on the resulting `Dataset`, e.g. `inherit=True/False`.
- `DataTree.copy()` also has a new `inherit` keyword argument for controlling whether or not coordinates defined on parents are copied (only relevant when copying a non-root node).
- The `DataTree.parent` property is now read-only. To assign ancestral relationships directly you must instead use the `.children` property on the parent node, which remains settable.
- Similarly the `parent` kwarg has been removed from the `DataTree.__init__` constructor.
- DataTree objects passed to the `children` kwarg in `DataTree.__init__` are now shallow-copied.
- `DataTree.as_array` has been replaced by `DataTree.to_dataarray`.
- A number of methods which were not well tested have been (temporarily) disabled. In general we have tried to only keep things that are known to work, with the plan to increase API surface incrementally after release.

## Thank you!

Thank you for trying out `xarray-contrib/datatree`!

We welcome contributions of any kind, including good ideas that never quite made it into the original datatree repository. Please also let us know if we have forgotten to mention a change that should have been listed in this guide.

Sincerely, the datatree team:

Tom Nicholas,
Owen Littlejohns,
Matt Savoie,
Eni Awowale,
Alfonso Ladino,
Justus Magin,
Stephan Hoyer
2 changes: 1 addition & 1 deletion README.md
@@ -88,7 +88,7 @@ Xarray is a fiscally sponsored project of
[NumFOCUS](https://numfocus.org), a nonprofit dedicated to supporting
the open source scientific computing community. If you like Xarray and
want to support our mission, please consider making a
[donation](https://numfocus.salsalabs.org/donate-to-xarray/) to support
[donation](https://numfocus.org/donate-to-xarray) to support
our efforts.

## History
8 changes: 4 additions & 4 deletions asv_bench/benchmarks/__init__.py
@@ -18,15 +18,15 @@ def decorator(func):
def requires_dask():
try:
import dask # noqa: F401
except ImportError:
raise NotImplementedError()
except ImportError as err:
raise NotImplementedError() from err


def requires_sparse():
try:
import sparse # noqa: F401
except ImportError:
raise NotImplementedError()
except ImportError as err:
raise NotImplementedError() from err


def randn(shape, frac_nan=None, chunks=None, seed=0):
6 changes: 3 additions & 3 deletions asv_bench/benchmarks/accessors.py
@@ -16,10 +16,10 @@ def setup(self, calendar):
self.da = xr.DataArray(data, dims="time", coords={"time": time})

def time_dayofyear(self, calendar):
self.da.time.dt.dayofyear
_ = self.da.time.dt.dayofyear

def time_year(self, calendar):
self.da.time.dt.year
_ = self.da.time.dt.year

def time_floor(self, calendar):
self.da.time.dt.floor("D")
_ = self.da.time.dt.floor("D")
4 changes: 2 additions & 2 deletions asv_bench/benchmarks/dataset_io.py
@@ -606,8 +606,8 @@ def setup(self):

try:
import distributed
except ImportError:
raise NotImplementedError()
except ImportError as err:
raise NotImplementedError() from err

self.client = distributed.Client()
self.write = create_delayed_write()