Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interp and reindex should work for 1d -> nd indexing #3252

Closed
shoyer opened this issue Aug 23, 2019 · 12 comments · Fixed by #3758
Closed

interp and reindex should work for 1d -> nd indexing #3252

shoyer opened this issue Aug 23, 2019 · 12 comments · Fixed by #3758

Comments

@shoyer
Copy link
Member

shoyer commented Aug 23, 2019

This works with isel and sel (see below). There's no particular reason why it can't work with reindex and interp, too, though we would need to implement our own vectorized version of linear interpolation (should not be too bad, it's mostly a matter of indexing twice and calculating weights from the difference in coordinate values).

Apparently this is quite important for vertical regridding in weather/climate science (cc @rabernat, @nbren12 )

In [35]: import xarray as xr

In [36]: import numpy as np

In [37]: data = xr.DataArray(np.arange(12).reshape((3, 4)), [('x', np.arange(3)), ('y', np.arange(4))])

In [38]: ind = xr.DataArray([[0, 2], [1, 0], [1, 2]], dims=['x', 'z'], coords={'x': [0, 1, 2]})

In [39]: data
Out[39]:
<xarray.DataArray (x: 3, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2 3

In [40]: ind
Out[40]:
<xarray.DataArray (x: 3, z: 2)>
array([[0, 2],
       [1, 0],
       [1, 2]])
Coordinates:
  * x        (x) int64 0 1 2
Dimensions without coordinates: z

In [41]: data.isel(y=ind)
Out[41]:
<xarray.DataArray (x: 3, z: 2)>
array([[ 0,  2],
       [ 5,  4],
       [ 9, 10]])
Coordinates:
  * x        (x) int64 0 1 2
    y        (x, z) int64 0 2 1 0 1 2
Dimensions without coordinates: z

In [42]: data.sel(y=ind)
Out[42]:
<xarray.DataArray (x: 3, z: 2)>
array([[ 0,  2],
       [ 5,  4],
       [ 9, 10]])
Coordinates:
  * x        (x) int64 0 1 2
    y        (x, z) int64 0 2 1 0 1 2
Dimensions without coordinates: z

In [43]: data.interp(y=ind)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-e6eb7e39fd31> in <module>
----> 1 data.interp(y=ind)

~/dev/xarray/xarray/core/dataarray.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   1303             kwargs=kwargs,
   1304             assume_sorted=assume_sorted,
-> 1305             **coords_kwargs
   1306         )
   1307         return self._from_temp_dataset(ds)

~/dev/xarray/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   2455                     }
   2456                     variables[name] = missing.interp(
-> 2457                         var, var_indexers, method, **kwargs
   2458                     )
   2459                 elif all(d not in indexers for d in var.dims):

~/dev/xarray/xarray/core/missing.py in interp(var, indexes_coords, method, **kwargs)
    533         else:
    534             out_dims.add(d)
--> 535     return result.transpose(*tuple(out_dims))
    536
    537

~/dev/xarray/xarray/core/variable.py in transpose(self, *dims)
   1219             return self.copy(deep=False)
   1220
-> 1221         data = as_indexable(self._data).transpose(axes)
   1222         return type(self)(dims, data, self._attrs, self._encoding, fastpath=True)
   1223

~/dev/xarray/xarray/core/indexing.py in transpose(self, order)
   1218
   1219     def transpose(self, order):
-> 1220         return self.array.transpose(order)
   1221
   1222     def __getitem__(self, key):

ValueError: axes don't match array

In [44]: data.reindex(y=ind)
/Users/shoyer/dev/xarray/xarray/core/dataarray.py:1240: FutureWarning: Indexer has dimensions ('x', 'z') that are different from that to be indexed along y. This will behave differently in the future.
  fill_value=fill_value,
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-1277ead996ae> in <module>
----> 1 data.reindex(y=ind)

~/dev/xarray/xarray/core/dataarray.py in reindex(self, indexers, method, tolerance, copy, fill_value, **indexers_kwargs)
   1238             tolerance=tolerance,
   1239             copy=copy,
-> 1240             fill_value=fill_value,
   1241         )
   1242         return self._from_temp_dataset(ds)

~/dev/xarray/xarray/core/dataset.py in reindex(self, indexers, method, tolerance, copy, fill_value, **indexers_kwargs)
   2360             tolerance,
   2361             copy=copy,
-> 2362             fill_value=fill_value,
   2363         )
   2364         coord_names = set(self._coord_names)

~/dev/xarray/xarray/core/alignment.py in reindex_variables(variables, sizes, indexes, indexers, method, tolerance, copy, fill_value)
    398             )
    399
--> 400         target = new_indexes[dim] = utils.safe_cast_to_index(indexers[dim])
    401
    402         if dim in indexes:

~/dev/xarray/xarray/core/utils.py in safe_cast_to_index(array)
    104         index = array
    105     elif hasattr(array, "to_index"):
--> 106         index = array.to_index()
    107     else:
    108         kwargs = {}

~/dev/xarray/xarray/core/dataarray.py in to_index(self)
    545         arrays.
    546         """
--> 547         return self.variable.to_index()
    548
    549     @property

~/dev/xarray/xarray/core/variable.py in to_index(self)
    445     def to_index(self):
    446         """Convert this variable to a pandas.Index"""
--> 447         return self.to_index_variable().to_index()
    448
    449     def to_dict(self, data=True):

~/dev/xarray/xarray/core/variable.py in to_index_variable(self)
    438         """Return this variable as an xarray.IndexVariable"""
    439         return IndexVariable(
--> 440             self.dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
    441         )
    442

~/dev/xarray/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
   1941         super().__init__(dims, data, attrs, encoding, fastpath)
   1942         if self.ndim != 1:
-> 1943             raise ValueError("%s objects must be 1-dimensional" % type(self).__name__)
   1944
   1945         # Unlike in Variable, always eagerly load values into memory

ValueError: IndexVariable objects must be 1-dimensional
@nbren12
Copy link
Contributor

nbren12 commented Aug 23, 2019

I have some numba code which does this for linear interpolation. Does scipy support this pattern?

@shoyer
Copy link
Member Author

shoyer commented Aug 23, 2019

We could implement linear interpolation just as w * array.sel(..., method='ffill') + (1 - w) * array.sel(..., method='bfill'), where w is calculated from the indexes. This is probably about as efficient as scipy. Numba could be slightly faster (by avoiding a few memory allocations) but it will be hard to make it work as generally, especially on duck arrays.

@nbren12
Copy link
Contributor

nbren12 commented Aug 23, 2019

In my experience, computing w efficiently is the tricky part. The function is slightly different, but metpy uses a lot of tricks to make this work efficiently. A manual for-loop is much cleaner for this kind of stencil calculation IMO. What kind of duck arrays were you thinking of?

@shoyer
Copy link
Member Author

shoyer commented Aug 23, 2019 via email

@nbren12
Copy link
Contributor

nbren12 commented Aug 24, 2019

After reading the method='ffill' docs, I agree with you.

@nbren12
Copy link
Contributor

nbren12 commented Aug 24, 2019

So when would interp use this manual interpolation method? Would it try it only if scipy fails or check the dimensions?

@shoyer
Copy link
Member Author

shoyer commented Aug 24, 2019

We could probably use this our own version for all linear and nearest neighbor interpolation. Then we won’t need scipy installed for that.

@nbren12
Copy link
Contributor

nbren12 commented Aug 24, 2019

Ok. I started playing around with this, but I am getting errors when indexing arrays with ND variables. data.sel(x=nd_x) works, but any subsequent operations complain that IndexVariable objects must be 1-dimensional:

>>> import xarray as xr
>>> import numpy as np
>>>     npdata = np.tile(np.arange(10), (5, 1))
...     data = xr.DataArray(npdata, dims=['y', 'x'],
...                         coords={'x': np.r_[:10], 'y': np.r_[:5]})
...     idx = xr.DataArray(npdata, dims=['z', 'x'])
...
...     ans = data.sel(x=idx, method='bfill')
...     assert set(ans.dims) == {'z', 'y', 'x'}
...     print(ans)
Traceback (most recent call last):
  File "<ipython-input-4-1d0ac0cac680>", line 8, in <module>
    print(ans)
  File "/Users/noah/workspace/software/xarray/xarray/core/common.py", line 129, in __repr__
    return formatting.array_repr(self)
  File "/Users/noah/workspace/software/xarray/xarray/core/formatting.py", line 463, in array_repr
    summary.append(repr(arr.coords))
  File "/Users/noah/workspace/software/xarray/xarray/core/coordinates.py", line 78, in __repr__
    return formatting.coords_repr(self)
  File "/Users/noah/workspace/software/xarray/xarray/core/formatting.py", line 381, in coords_repr
    coords, title="Coordinates", summarizer=summarize_coord, col_width=col_width
  File "/Users/noah/workspace/software/xarray/xarray/core/formatting.py", line 361, in _mapping_repr
    summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
  File "/Users/noah/workspace/software/xarray/xarray/core/formatting.py", line 361, in <listcomp>
    summary += [summarizer(k, v, col_width) for k, v in mapping.items()]
  File "/Users/noah/workspace/software/xarray/xarray/core/formatting.py", line 307, in summarize_coord
    coord = var.variable.to_index_variable()
  File "/Users/noah/workspace/software/xarray/xarray/core/variable.py", line 440, in to_index_variable
    self.dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
  File "/Users/noah/workspace/software/xarray/xarray/core/variable.py", line 1943, in __init__
    raise ValueError("%s objects must be 1-dimensional" % type(self).__name__)
ValueError: IndexVariable objects must be 1-dimensional

nbren12 added a commit to nbren12/xarray that referenced this issue Aug 24, 2019
@nbren12
Copy link
Contributor

nbren12 commented Aug 24, 2019

Ok. I realized this problem occurs only because x was a dimension of the new index idx and the original dataset. Perhaps sel should warn the user or raise an error when this happens.

@huard
Copy link
Contributor

huard commented Feb 6, 2020

Just got bit by this as well. Computing monthly quantile correction factors, so I have an array with dimensions (month, quantile, lon, lat). I then want to apply these correction factors to a time series (time, lon, lat), so I compute the month and quantile of my time series, and want to interp into the quantile correction factors. This doesn't work because both the factors and the time series have (lat, lon) dimensions.

@shoyer
Copy link
Member Author

shoyer commented Feb 6, 2020 via email

@huard
Copy link
Contributor

huard commented Feb 6, 2020

@shoyer I'm having trouble wrapping my head around this. The example above is essentially a 1D interpolation over y, a coordinate selection over x, and a broadcast over z. I don't think it should interpolate over x, right ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants