-
I think the fundamental requirement is a variable of some sort. It seems like a scalar value can be used.
-
Why not have a variable wrap a lazy array (e.g., a dask array or any other duck array) for the range index? The array may never be fully materialized in memory as a discrete array, but it could be if needed. This is almost possible today (needs work in #8124). It requires that the variable exists before the index is created, but this could be handled, for example, by a custom method (in an accessor) that creates both the coordinate and the index.
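A rough, library-free sketch of the "lazy, but materializable if needed" idea, assuming an evenly spaced range. The class `LazyRange` and its methods are hypothetical names for illustration only, not part of Xarray or dask; a real implementation would wrap a dask array or another duck array as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyRange:
    """Hypothetical stand-in for a lazy coordinate array (not an Xarray or dask class)."""

    start: float
    step: float
    size: int

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, i: int) -> float:
        # Compute a single value on demand; nothing is stored.
        if not -self.size <= i < self.size:
            raise IndexError("index out of range")
        return self.start + (i % self.size) * self.step

    def materialize(self) -> list[float]:
        # Build the full discrete array only when explicitly requested.
        return [self.start + i * self.step for i in range(self.size)]


coord = LazyRange(start=0.0, step=0.25, size=5)
print(coord[2])             # value computed on the fly
print(coord.materialize())  # full array, built only on request
```

The object costs the same few bytes whether `size` is 5 or several billion; only `materialize()` scales with the length.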
-
Thanks for the help and feedback. I realize this is a bit redundant with examples that @benbovy and others have already shared. However, if we want this capability to be widely adopted, we need the API to be easy to use. So hopefully me sharing my struggles is productive! 🙃

Here's something I got working:

    from dataclasses import dataclass

    import numpy as np
    import xarray as xr
    from xarray.core.indexes import IndexSelResult
    from xarray.indexes import Index


    @dataclass(frozen=True)
    class BoundsIndexer:
        left: float
        right: float
        npoints: int

        def get_nearest_point(self, value):
            if value < self.left or value > self.right:
                raise ValueError("Value outside of bounds")
            # round (rather than truncate) so the closest grid point is returned
            return round((value - self.left) / (self.right - self.left) * (self.npoints - 1))


    @dataclass(frozen=True)
    class BoundsIndex(Index):
        # one BoundsIndexer per dimension name
        indexes: dict[str, BoundsIndexer]

        @classmethod
        def from_variables(cls, variables, *, options):
            # assume there is only one variable
            assert len(variables) == 1
            # and that variable name is "bounds"
            bounds = variables["bounds"]
            indexes = {
                k: BoundsIndexer(v["left"], v["right"], v["npoints"])
                for k, v in bounds.attrs.items()
            }
            return cls(indexes)

        def sel(self, labels, method=None, tolerance=None):
            if method != "nearest":
                raise ValueError("Only nearest method is supported")
            # feels super redundant to have to do this
            bounds = labels["bounds"]
            results = {}
            for dim, q in bounds.items():
                results[dim] = self.indexes[dim].get_nearest_point(q)
            return IndexSelResult(results)

Now I can create a dataset like this:

    npoints = 10  # arbitrary example value; npoints was not defined in the original post

    ds = xr.Dataset(
        data_vars={"foo": (("y", "x"), np.arange(npoints * npoints).reshape(npoints, npoints))},
        coords={
            "bounds": ((), (), {
                "x": {"left": 0.0, "right": 1.0, "npoints": npoints},
                "y": {"left": 0.0, "right": 2.0, "npoints": npoints},
            })
        },
    ).set_xindex("bounds", BoundsIndex)

which looks like this

And I can query it like this:

    ds.sel(bounds={'x': 0.5, 'y': 0.2}, method="nearest")

What I like about this
What feels weird
In both cases, it seems like things could be improved by somehow making indexes aware of / associated with dimensions, rather than only variables. I'm also coming around to the idea that there could be lazily materialized

The other thing we will need is some sort of plugin / entry-point mechanism for encoding / decoding so that we can make these indexes created automatically upon loading conforming datasets.
-
Another point of reference for storing coordinate information without a corresponding variable: OME-Zarr Coordinate Transformations. It would be amazing if we could open OME-Zarr in Xarray and expose useful indexes!
-
Here's a very concrete, practical task that could be used to advance this issue: Try to modify RioXarray's If we can make this work well today with no changes to Xarray, that tells us that the current API is up to the job. If not, it will suggest where we need to make changes. I believe that the crux will come down to the issue around whether we can associate custom indexes with dimensions, rather than coordinates. (See #8955 (reply in thread)).
-
Would like to chime in with a use case from ionospheric physics, in case that's useful. I've worked with some radio datasets that are sampled at, e.g., 20 MHz, meaning that what one is working with is a billions-length 1D dataset that's
-
Motivated by discussions in the GeoZarr spec and Pangeo Discourse, I've convinced myself that it's important for Xarray to be able to represent "analytic" coordinates, i.e. coordinates defined via a mathematical formula and never explicitly materialized into a discrete array.
This is impossible today, even with the new flexible indexes API, and I think this is something we need to fix.
To boil it down to the simplest possible example, imagine we have a 1D array representing evenly spaced values in the interval (0, 1).
I can easily create an accessor that allows me to query this data without ever explicitly materializing the full array of coordinates into memory. Here's an example
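As a rough illustration of the arithmetic such an accessor could rely on (not the author's code), here is a minimal pure-Python sketch. It assumes N points spaced evenly with both endpoints included, as `numpy.linspace` would produce; the class `AnalyticCoordinate` and its method names are hypothetical and not part of Xarray.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyticCoordinate:
    """Hypothetical helper: evenly spaced labels defined by a formula only."""

    left: float
    right: float
    npoints: int

    def label_at(self, i: int) -> float:
        # Coordinate value at position i, computed on demand.
        return self.left + i * (self.right - self.left) / (self.npoints - 1)

    def nearest_position(self, label: float) -> int:
        # Invert the formula to find the closest integer position.
        if not self.left <= label <= self.right:
            raise KeyError(f"{label} is outside [{self.left}, {self.right}]")
        frac = (label - self.left) / (self.right - self.left)
        return round(frac * (self.npoints - 1))


# About one billion points on [0, 1]; no coordinate array is ever allocated.
coord = AnalyticCoordinate(left=0.0, right=1.0, npoints=1_000_000_001)
pos = coord.nearest_position(0.5)
print(pos, coord.label_at(pos))
```

The integer returned by `nearest_position` is the kind of positional indexer that could then be passed on to `.isel`.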
But what I want is to override Xarray's native `.sel` method with this index!
That is not possible with the custom indexes API, because `xr.Index` requires a class method called `from_variables`, which generates an index from existing variables. In this case, there ARE no variables.
cc @benbovy @jhamman @dcherian