Categorical ExtensionArray cannot be compared to scalar #9090

Closed
5 tasks done
jc-5s opened this issue Jun 11, 2024 · 3 comments · Fixed by #9032
Labels
bug, needs triage

Comments


jc-5s commented Jun 11, 2024

What happened?

The minimal example below raises the ValueError shown in the "Relevant log output" section.

It looks like the scalar 'c' is internally wrapped in a length-1 Categorical and then compared to the source array. That comparison fails because the source array and the wrapped array have different lengths.
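The diagnosis above can be checked with pandas alone, without going through xarray. The snippet below is a minimal sketch: it compares a Categorical first to the plain scalar and then to a length-1 Categorical built from that scalar, which is what the wrapping step in the traceback appears to do. The variable names are illustrative only.

```python
import pandas as pd

cat = pd.Categorical(["a", "a", "c"])

# Comparing directly against the scalar is supported by pandas
# and yields an elementwise boolean array.
direct = cat == "c"
print(direct)

# Wrapping the scalar in a length-1 Categorical first makes pandas treat
# the right-hand side as list-like, so the length check rejects it.
try:
    cat == pd.Categorical(["c"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If the second comparison raises, the error originates in the pandas length check rather than in the scalar comparison itself.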

What did you expect to happen?

The expected output is:

array([False, False,  True])

Minimal Complete Verifiable Example

import xarray
import pandas
xarray.DataArray(pandas.Categorical(['a', 'a', 'c'])) == 'c'

MCVE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 3
      1 import xarray
      2 import pandas
----> 3 xarray.DataArray(pandas.Categorical(['a', 'a', 'c'])) == 'c'

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/xarray/core/_typed_ops.py:294, in DataArrayOpsMixin.__eq__(self, other)
    293 def __eq__(self, other: DaCompatible) -> Self:  # type:ignore[override]
--> 294     return self._binary_op(other, nputils.array_eq)

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/xarray/core/dataarray.py:4725, in DataArray._binary_op(self, other, f, reflexive)
   4721 other_variable_or_arraylike: DaCompatible = getattr(other, "variable", other)
   4722 other_coords = getattr(other, "coords", None)
   4724 variable = (
-> 4725     f(self.variable, other_variable_or_arraylike)
   4726     if not reflexive
   4727     else f(other_variable_or_arraylike, self.variable)
   4728 )
   4729 coords, indexes = self.coords._merge_raw(other_coords, reflexive)
   4730 name = self._result_name(other)

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/xarray/core/nputils.py:113, in array_eq(self, other)
    111 with warnings.catch_warnings():
    112     warnings.filterwarnings("ignore", r"elementwise comparison failed")
--> 113     return _ensure_bool_is_ndarray(self == other, self, other)

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/xarray/core/_typed_ops.py:608, in VariableOpsMixin.__eq__(self, other)
    607 def __eq__(self, other: VarCompatible) -> Self | T_DataArray:
--> 608     return self._binary_op(other, nputils.array_eq)

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/xarray/core/variable.py:2317, in Variable._binary_op(self, other, f, reflexive)
   2314 attrs = self._attrs if keep_attrs else None
   2315 with np.errstate(all="ignore"):
   2316     new_data = (
-> 2317         f(self_data, other_data) if not reflexive else f(other_data, self_data)
   2318     )
   2319 result = Variable(dims, new_data, attrs=attrs)
   2320 return result

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/xarray/core/nputils.py:113, in array_eq(self, other)
    111 with warnings.catch_warnings():
    112     warnings.filterwarnings("ignore", r"elementwise comparison failed")
--> 113     return _ensure_bool_is_ndarray(self == other, self, other)

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/xarray/core/extension_array.py:129, in PandasExtensionArray.__eq__(self, other)
    127     other = type(self)(type(self.array)([other]))
    128 if isinstance(other, PandasExtensionArray):
--> 129     return self.array == other.array
    130 return self.array == other

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/pandas/core/ops/common.py:81, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     77             return NotImplemented
     79 other = item_from_zerodim(other)
---> 81 return method(self, other)

File ~/.pyenv/versions/test-venv/lib/python3.12/site-packages/pandas/core/arrays/categorical.py:131, in _cat_compare_op.<locals>.func(self, other)
    128 hashable = is_hashable(other)
    129 if is_list_like(other) and len(other) != len(self) and not hashable:
    130     # in hashable case we may have a tuple that is itself a category
--> 131     raise ValueError("Lengths must match.")
    133 if not self.ordered:
    134     if opname in ["__lt__", "__gt__", "__le__", "__ge__"]:

ValueError: Lengths must match.
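For illustration only, here is a minimal sketch of a wrapper whose `__eq__` passes scalars straight through to the wrapped pandas array instead of wrapping them first. `ToyWrapper` is a hypothetical stand-in and not xarray's `PandasExtensionArray`; this is not a description of how the actual fix was implemented.

```python
import pandas as pd


class ToyWrapper:
    """Hypothetical stand-in for a thin wrapper around a pandas array."""

    def __init__(self, array):
        self.array = array

    def __eq__(self, other):
        # Unwrap another wrapper, but leave scalars untouched so that
        # pandas can apply its own scalar comparison rules.
        if isinstance(other, ToyWrapper):
            other = other.array
        return self.array == other


wrapped = ToyWrapper(pd.Categorical(["a", "a", "c"]))
result = wrapped == "c"
print(result)
```

Under this assumption the scalar never becomes a length-1 array, so the length check seen in the traceback is not triggered.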

Anything else we need to know?

Thanks very much for the recent addition of ExtensionArray support :)

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.3 (main, Apr 10 2024, 22:06:03) [GCC 10.5.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-105-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.5.0
pandas: 2.0.3
numpy: 1.26.4
scipy: 1.13.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.3.8
dask: 2024.5.0
distributed: None
matplotlib: 3.8.4
cartopy: None
seaborn: 0.13.2
numbagg: 0.8.1
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: 0.9.7
numpy_groupies: 0.11.1
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: 8.2.0
mypy: 1.10.0
IPython: 8.24.0
sphinx: 7.3.7

jc-5s added the bug and needs triage labels Jun 11, 2024

welcome bot commented Jun 11, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

keewis linked a pull request Jun 11, 2024 that will close this issue
Collaborator

keewis commented Jun 11, 2024

It looks like this was fixed by #9032. We'll get a release including that out very soon.

Closing, but feel free to reopen if it still doesn't work.

keewis closed this as completed Jun 11, 2024

hottwaj commented Jun 11, 2024

Thanks, good to hear, and apologies for not coming across that PR.
