Errors when assigning using .from_pandas_multiindex #8455

Closed
5 tasks done
max-sixty opened this issue Nov 15, 2023 · 3 comments
Labels
bug, plan to close (May be closeable, needs more eyeballs), topic-indexing

Comments

@max-sixty
Collaborator

What happened?

Very possibly this is user-error, forgive me if so.

I'm trying to migrate some code from the old way of assigning MultiIndexes to the new API. Here's an MCVE:

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import pandas as pd
import xarray as xr

da = xr.tutorial.open_dataset("air_temperature")['air']

# old code, works, but with a warning

da.expand_dims('foo').assign_coords(foo=(pd.MultiIndex.from_tuples([(1,2)])))

<ipython-input-25-f09b7f52bb42>:1: FutureWarning: the `pandas.MultiIndex` object(s) passed as 'foo' coordinate(s) or data variable(s) will no longer be implicitly promoted and wrapped into multiple indexed coordinates in the future (i.e., one coordinate for each multi-index level + one dimension coordinate). If you want to keep this behavior, you need to first wrap it explicitly using `mindex_coords = xarray.Coordinates.from_pandas_multiindex(mindex_obj, 'dim')` and pass it as coordinates, e.g., `xarray.Dataset(coords=mindex_coords)`, `dataset.assign_coords(mindex_coords)` or `dataarray.assign_coords(mindex_coords)`.
  da.expand_dims('foo').assign_coords(foo=(pd.MultiIndex.from_tuples([(1,2)])))
Out[25]:
<xarray.DataArray 'air' (foo: 1, time: 2920, lat: 25, lon: 53)>
array([[[[241.2    , 242.5    , 243.5    , ..., 232.79999, 235.5    ,
          238.59999],
 ...
         [297.69   , 298.09   , 298.09   , ..., 296.49   , 296.19   ,
          295.69   ]]]], dtype=float32)
Coordinates:
  * lat          (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon          (lon) float32 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time         (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
  * foo          (foo) object MultiIndex
  * foo_level_0  (foo) int64 1
  * foo_level_1  (foo) int64 2

# new code: this seems to confuse the number of values in the index (1) with the number of levels (3, including the parent):

da.expand_dims('foo').assign_coords(foo=xr.Coordinates.from_pandas_multiindex(pd.MultiIndex.from_tuples([(1,2)]), dim='foo'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 1
----> 1 da.expand_dims('foo').assign_coords(foo=xr.Coordinates.from_pandas_multiindex(pd.MultiIndex.from_tuples([(1,2)]), dim='foo'))

File ~/workspace/xarray/xarray/core/common.py:621, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
    618 else:
    619     results = self._calc_assign_results(coords_combined)
--> 621 data.coords.update(results)
    622 return data

File ~/workspace/xarray/xarray/core/coordinates.py:566, in Coordinates.update(self, other)
    560 # special case for PandasMultiIndex: updating only its dimension coordinate
    561 # is still allowed but depreciated.
    562 # It is the only case where we need to actually drop coordinates here (multi-index levels)
    563 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
    564 self._drop_coords(self._names - coords_to_align._names)
--> 566 self._update_coords(coords, indexes)

File ~/workspace/xarray/xarray/core/coordinates.py:834, in DataArrayCoordinates._update_coords(self, coords, indexes)
    832 coords_plus_data = coords.copy()
    833 coords_plus_data[_THIS_ARRAY] = self._data.variable
--> 834 dims = calculate_dimensions(coords_plus_data)
    835 if not set(dims) <= set(self.dims):
    836     raise ValueError(
    837         "cannot add coordinates with new dimensions to a DataArray"
    838     )

File ~/workspace/xarray/xarray/core/variable.py:3014, in calculate_dimensions(variables)
   3012             last_used[dim] = k
   3013         elif dims[dim] != size:
-> 3014             raise ValueError(
   3015                 f"conflicting sizes for dimension {dim!r}: "
   3016                 f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
   3017             )
   3018 return dims

ValueError: conflicting sizes for dimension 'foo': length 1 on <this-array> and length 3 on {'lat': 'lat', 'lon': 'lon', 'time': 'time', 'foo': 'foo'}

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.18 (main, Nov 2 2023, 16:51:22)
[Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.10.2.dev10+gccc8f998
pandas: 2.1.1
numpy: 1.25.2
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.4.0
distributed: 2023.7.1
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: 0.2.3.dev30+gd26e29e
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: 0.9.19
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: 7.4.0
mypy: 1.6.0
IPython: 8.15.0
sphinx: 4.3.2

max-sixty added the bug, needs triage, and topic-indexing labels and removed the needs triage label on Nov 15, 2023
@benbovy
Member

benbovy commented Nov 30, 2023

I think you rather want to pass the xarray.Coordinates object as the only positional argument to .assign_coords (you also probably don't need expand_dims):

da.assign_coords(xr.Coordinates.from_pandas_multiindex(pd.MultiIndex.from_tuples([(1,2)]), dim='foo'))

xr.Coordinates.from_pandas_multiindex() converts a pandas multi-index into a dict-like container of xarray indexed coordinates, so we no longer need to treat a pandas.MultiIndex like a single dimension coordinate with hidden level coordinates.
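To make the "dict-like container" concrete, here is a minimal sketch (assuming a recent xarray that provides `Coordinates.from_pandas_multiindex`) that inspects what the call returns for the one-row index from the MCVE. It holds three coordinates, all along `foo`, which is why it is not a drop-in value for a single `foo=` keyword:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_tuples([(1, 2)])
# Unnamed levels get default names derived from the dimension name.
mindex_coords = xr.Coordinates.from_pandas_multiindex(midx, dim="foo")

# One dimension coordinate plus one coordinate per level.
print(sorted(mindex_coords))  # ['foo', 'foo_level_0', 'foo_level_1']

# Every entry lives on the same length-1 "foo" dimension.
for name in mindex_coords:
    print(name, mindex_coords[name].dims, mindex_coords[name].size)
```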

@max-sixty
Collaborator Author

I think you rather want to pass the xarray.Coordinates object as the only positional argument to .assign_coords

Ah great, that does work.

Should we update the signature of .assign_coords, then? Currently its coords parameter is documented as "dict-like or None, optional".
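For reference, the first parameter of `.assign_coords` is named `coords`, so an `xarray.Coordinates` object can be passed positionally or as `coords=...`; a keyword such as `foo=...` instead names one coordinate and supplies its value. A minimal sketch, using a small synthetic array as a stand-in for the tutorial data (the array itself is an illustrative assumption):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the tutorial "air" DataArray.
da = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"), name="air")
mindex_coords = xr.Coordinates.from_pandas_multiindex(
    pd.MultiIndex.from_tuples([(1, 2)]), dim="foo"
)

# Both calls hand the whole Coordinates mapping to the `coords` parameter.
positional = da.expand_dims("foo").assign_coords(mindex_coords)
by_keyword = da.expand_dims("foo").assign_coords(coords=mindex_coords)

print(positional.identical(by_keyword))  # True
print(type(positional.indexes["foo"]).__name__)  # MultiIndex
```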


(you also probably don't need expand_dims):

Hmmm, no luck without it...

da.assign_coords(xr.Coordinates.from_pandas_multiindex(pd.MultiIndex.from_tuples([(1,2)]), dim='foo'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 da.assign_coords(xr.Coordinates.from_pandas_multiindex(pd.MultiIndex.from_tuples([(1,2)]), dim='foo'))

File ~/workspace/xarray/xarray/core/common.py:621, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
    618 else:
    619     results = self._calc_assign_results(coords_combined)
--> 621 data.coords.update(results)
    622 return data

File ~/workspace/xarray/xarray/core/coordinates.py:566, in Coordinates.update(self, other)
    560 # special case for PandasMultiIndex: updating only its dimension coordinate
    561 # is still allowed but depreciated.
    562 # It is the only case where we need to actually drop coordinates here (multi-index levels)
    563 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
    564 self._drop_coords(self._names - coords_to_align._names)
--> 566 self._update_coords(coords, indexes)

File ~/workspace/xarray/xarray/core/coordinates.py:836, in DataArrayCoordinates._update_coords(self, coords, indexes)
    834 dims = calculate_dimensions(coords_plus_data)
    835 if not set(dims) <= set(self.dims):
--> 836     raise ValueError(
    837         "cannot add coordinates with new dimensions to a DataArray"
    838     )
    839 self._data._coords = coords
    841 # TODO(shoyer): once ._indexes is always populated by a dict, modify
    842 # it to update inplace instead.

ValueError: cannot add coordinates with new dimensions to a DataArray

max-sixty added a commit to max-sixty/xarray that referenced this issue Nov 30, 2023
@benbovy
Member

benbovy commented Dec 1, 2023

Hmmm, no luck without it...

Ah yes, right, you need expand_dims :)
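The reason can be checked on a small synthetic array (an illustrative assumption, not the tutorial data): `DataArray.assign_coords` refuses coordinates that would introduce a new dimension, so `foo` has to exist on the array first, and `expand_dims` provides it as a length-1 dimension. A minimal sketch:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"), name="air")
mindex_coords = xr.Coordinates.from_pandas_multiindex(
    pd.MultiIndex.from_tuples([(1, 2)]), dim="foo"
)

# Without a "foo" dimension the new coordinates would add a dimension,
# which a DataArray does not allow (ValueError in the traceback above).
try:
    da.assign_coords(mindex_coords)
    raised = False
except ValueError:
    raised = True

# expand_dims inserts a length-1 "foo" dimension at axis 0 for the
# multi-index coordinates to attach to.
result = da.expand_dims("foo").assign_coords(mindex_coords)
print(raised)  # True
print(dict(result.sizes))  # {'foo': 1, 'x': 2, 'y': 3}
```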

max-sixty added the plan to close label (May be closeable, needs more eyeballs) on Dec 1, 2023
max-sixty closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 4, 2023
max-sixty added a commit that referenced this issue Dec 4, 2023
* Fix type of `.assign_coords`

As discussed in #8455

* Update xarray/core/common.py

Co-authored-by: Benoit Bovy <[email protected]>

* Generally improve docstring

* .

---------

Co-authored-by: Benoit Bovy <[email protected]>