Should sel with slice objects care about underlying coordinate order? #1613

leeviannala · 2017-10-06T10:35:29Z

If I have a xarray DataArray, that has x-coordinates in descending order, and I try to crop the DataArray using
da.sel(x=slice(10,50)), it gives me empty DataArray. If I use
da.sel(x=slice(50,10)), it finds everything it is supposed to find.

I think it would be useful to be able to use this feature without knowing the exact order of the DataArray. Is this intentional?

The text was updated successfully, but these errors were encountered:

shoyer · 2017-10-09T20:48:46Z

This was actually intentional, though you are certainly not the first person to complain about this.

Xarray's behavior mirrors pandas. The justification for this logic is that sel/loc would work like normal indexing [] on a Python list, and if you supply slice limits in the opposite order on a Python list, you get an empty result.

Probably the cleanest solution possible would be a separate selection object for xarray that doesn't care about order, e.g., da.sel(x=xarray.Between(10, 50)).

rgommers · 2018-09-10T22:40:15Z

The justification for this logic is that sel/loc would work like normal indexing [] on a Python list, and if you supply slice limits in the opposite order on a Python list, you get an empty result.

Given that sel is label/value-based, I don't see how the analogy with positional list-based indexing applies. This behavior looks like a functional bug. If you have decided to keep it as is, I'd suggest at the very least to add a warning if an empty dataset is returned because values in an attribute are ordered high...low.

EDIT: also then best to close this issue as wontfix

shoyer · 2018-09-11T00:43:14Z

To clarify: xarray behavior doesn't just mirror pandas here -- it actually directly uses pandas to figure out the mapping from labeled slices to integer slices.

So I'm not opposed to changing this (I agree this trips people up often) but we should try to change it upstream if possible.

rgommers · 2018-09-11T14:35:19Z

Ah okay, that makes sense. I'm sure there's a related pandas issue (or many), will try to find that later.

rgommers · 2018-09-16T02:37:40Z

The only related issues I can find are:

They don't look identical though. Don't really have the time to dive into that further now.

rgommers · 2018-09-16T02:39:50Z

In case it helps anyone else, I ended up doing:

    # Note that xarray is fiddly with indexing - if x or y values are ordered
    # high to low, then the slice bounds need to be reversed.  So check that
    x_ordered_low2high = data.x.values[-1] - data.x.values[0] > 0
    y_ordered_low2high = data.y.values[-1] - data.y.values[0] > 0

    if x_ordered_low2high:
        x_index = slice(lower_lon, upper_lon)
    else:
        x_index = slice(upper_lon, lower_lon)

    if y_ordered_low2high:
        y_index = slice(lower_lat, upper_lat)
    else:
        y_index = slice(upper_lat, lower_lat)

    subset = data.sel(x=x_index, y=y_index)

shoyer · 2018-09-17T16:09:54Z

Consider using index.is_monotonic_ascending and index.is_monotonic_descending instead of subtracting the first few values -- those are exactly the checks that pandas uses.

I also did a little more searching and found that this has come up in the xarray issue tracker before: #1465

Thinking about this a little more, I'm remembering the reasoning for pandas working this way:

Integer slicing by position and by label with positive integers should work the same for an index with values given by range(N).
Positional indexing in Python x[a:b] returns all values between a and b in order. Indexing like x[high:low] with high > low returns an empty list.
Label based indexing in pandas has df.loc[df.index[i] : df.index[j]] equivalent to df.iloc[i : j+1] (arguably this is a mistake; e.g., Slicing inconsistent for coordinates and dimensions without coordinates #1492 and other discussion that I can't find right now :) ).
Indexing a monotonic decreasing index therefore should work for decreasing labels.
If slicing a monotonic increasing index only works for increasing but not decreasing labels, then it would be weird if indexing a monotonic decreasing index works for both increasing and decreasing labels.

rgommers · 2018-09-17T16:31:51Z

Consider using index.is_monotonic_ascending and index.is_monotonic_descending instead of subtracting the first few values -- those are exactly the checks that pandas uses.

Thanks, that's nicer, will do. And thanks for digging up the background/rationales.

(1)Integer slicing by position and by label with positive integers should work the same for an index with values given by range(N).

This I don't think I agree with. Slicing by position and by label are semantically very different operations.

(2) is correct, but irrelevant to label-based indexing.

(3) yes, agree that's a mistake

(4) indeed

(5) I'd say that it's in practice less important, because users normally won't do slice(high, low), but for consistency I agree that there should be symmetry in behaviour. Making slice(high, low) return the same as slice(low, high) on a monotonic increasing index seems reasonable.

Additionally: arguably monotonicity should not be required. When one uses labels, the semantics are clear without monotonicity. This doesn't have a position-based equivalent.

shoyer · 2018-09-17T16:39:39Z

Additionally: arguably monotonicity should not be required. When one uses labels, the semantics are clear without monotonicity.

Requiring monotonicity lets us guarantee that the result is a NumPy view rather than possibly being a copy.

rgommers · 2018-09-17T16:53:59Z

Requiring monotonicity lets us guarantee that the result is a NumPy view rather than possibly being a copy.

Sure, but if a users happens to have non-monotonic data it just requires her to then make that copy first anyway. Still a good thing overall for performance, but there'll be cases where it's just an extra thing to understand for the user without any performance gain.

Anyway, the non-monotonic case is less relevant, because it's harder to run into in practice. The decreasing case however is easy - there is standard geo software (looking at you ArcGIS) that can write geoTiff's with monotonic decreasing indices. That's how I ran into this. Rewriting multi-GB source data that I didn't produce is not an option, so I'm left with the manual monotonicity checks and juggling label-based slices.

fmaussion · 2018-09-17T17:17:18Z

there is standard geo software (looking at you ArcGIS) that can write geoTiff's with monotonic decreasing indices

This is very frequent for climate models also (lat arrays are frequently upside down). An xarray (or upstream) solution would be very useful indeed, but possibly there are other things to consider, like being "too magical".

xarray makes a very good job at handling decreasing coordinates (i.e. when plotting), but soon or late users might need to know how they data is ordered, and indexing might be the right time to warn them about it when they do something wrong: usually an empty array is unlikely to be left unnoticed, so that the current behavior might help to find bugs in user code.

shoyer · 2018-09-18T01:02:30Z

Making slice(high, low) return the same as slice(low, high) on a monotonic increasing index seems reasonable.

Agreed, I don't think many pandas users would object to this.

stale · 2020-08-22T18:34:59Z

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

shoyer · 2022-03-03T17:23:09Z

This is probably worth fixing if possible in a straightforward way. I don't think anyone is well served by matching the behavior of Python list indexing here -- it's a strange edge that case that indexing a list like x[5:0] returns an empty list.

dcherian · 2022-03-03T18:07:16Z

What would be a good solution?

Make sel implicitly ignore order for slice objects.
Add a option to sel for explicitly ignoring slice order.
1. add a new kwarg
2. Add a new xr.Between object

brianmapes · 2022-03-03T18:13:31Z

How about: add one code line at the top of slice that properly orders the exactly two arguments (in other words, reverses the order if necessary)? I haven't studied the code at all, this may be naive. This may the same as 1. above (make sel ignore order). Ideally the default would be silently improved, rather than creating something optional that users would have to make a new habit of, like an additional kwarg or new syntax.

shoyer · 2022-03-03T18:39:59Z

One complication with using sel() with slice objects is that you can do selection over non-monotonic indexes, merely based on matching bounds:

>>> data = xarray.DataArray([1, 2, 3, 4, 5], dims=['x'], coords=[[5, 1, 4, 3, 2]])
>>> data
<xarray.DataArray (x: 5)>
array([1, 2, 3, 4, 5])
Coordinates:
  * x        (x) int64 5 1 4 3 2
>>> data.sel(x=slice(1, 3)))
<xarray.DataArray (x: 3)>
array([2, 3, 4])
Coordinates:
  * x        (x) int64 1 4 3

If we change the semantics of slice in sel() to do filtering rather than be concerned about order (which does seem much less useful), we should probably deprecate the handling of non-monotonic ascending or descending indexes.

Alternatively, we could either do the dedicated indexing object like xarray.Between(lower, upper) or have a dedicated method for selecting between values, e.g., perhaps data.sel_between(x=(1, 3)) or data.sel_bounds(x=(1, 3)).

brianmapes · 2022-03-03T20:19:56Z

@shoyer, selection over non-monotonic indexes, merely based on matching bounds is a case I never considered... probably a rare one, which I for one would happily sacrifice.

New syntax like your first suggestion, .Between instead of .slice if I understand correctly, could be appealing too: The word 'slice' has never seemed terribly intuitive to this user, and 'between' is high in an alphabetical list of tips/suggestions a good IDE might bring up with the tab key or whatever.

brianmapes · 2022-03-08T19:49:33Z

How about an informative warning when returning an empty set, as a minimal change? What's the path to an action item here?

shoyer · 2022-03-08T21:09:05Z

Another challenge with changing the meaning of slice is handling partial slices, e.g., what does slice(500, None) mean? With a monotonic decreasing index, that would select values below 500, but ignoring underlying coordinate order it would presumably mean selecting values above 500.

I think the separate new API (e.g., xarray.Between or .sel_between()) is probably a better idea.

brianmapes · 2022-03-08T22:26:07Z

Agreed, new rather than redefine existing meanings.
Who has the chops and the cred to implement it? Not quite me...

benbovy · 2023-08-30T22:43:06Z

Alternatively, we could either do the dedicated indexing object like xarray.Between(lower, upper) or have a dedicated method for selecting between values, e.g., perhaps data.sel_between(x=(1, 3)) or data.sel_bounds(x=(1, 3)).

xref #7099 about generic API for indexing objects.

jhamman added the topic-indexing label Oct 6, 2017

dcherian mentioned this issue Apr 10, 2019

More Robust Implementation of slice #1465

Closed

dcherian mentioned this issue May 15, 2019

slicing non-monotonously increasing dimension returns empty slice silently #2966

Closed

rabernat mentioned this issue Jan 12, 2020

Decode CF bounds to coords #3689

Closed

stale bot added the stale label Aug 22, 2020

stale bot closed this as completed Sep 22, 2020

dcherian reopened this Sep 22, 2020

stale bot removed the stale label Sep 22, 2020

keewis mentioned this issue Mar 3, 2022

Why must lat=slice() arguments be decreasing? #6327

Closed

dcherian changed the title ~~Improvement suggestion for sel~~ Should sel with slice objects care about underlying coordinate order? Mar 3, 2022

dcherian added the NASA grant label Mar 3, 2022

mathause mentioned this issue Sep 2, 2022

dataset.sel inconsistent results when argument is a list or a slice. #6976

Closed

4 tasks

dcherian mentioned this issue Jul 6, 2023

DataArray.sel methods: "inner" and "outer" #7964

Open

dcherian removed the grant-nasa label Oct 23, 2023

dcherian mentioned this issue Oct 23, 2023

Lat/lon gridded data does not have monotonically increasing latitudes google-research/arco-era5#60

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should sel with slice objects care about underlying coordinate order? #1613

Should sel with slice objects care about underlying coordinate order? #1613

leeviannala commented Oct 6, 2017

shoyer commented Oct 9, 2017

rgommers commented Sep 10, 2018 •

edited

Loading

shoyer commented Sep 11, 2018

rgommers commented Sep 11, 2018

rgommers commented Sep 16, 2018

rgommers commented Sep 16, 2018

shoyer commented Sep 17, 2018

rgommers commented Sep 17, 2018

shoyer commented Sep 17, 2018

rgommers commented Sep 17, 2018

fmaussion commented Sep 17, 2018

shoyer commented Sep 18, 2018

stale bot commented Aug 22, 2020

shoyer commented Mar 3, 2022

dcherian commented Mar 3, 2022

brianmapes commented Mar 3, 2022 •

edited

Loading

shoyer commented Mar 3, 2022

brianmapes commented Mar 3, 2022

brianmapes commented Mar 8, 2022

shoyer commented Mar 8, 2022

brianmapes commented Mar 8, 2022

benbovy commented Aug 30, 2023

Should sel with slice objects care about underlying coordinate order? #1613

Should sel with slice objects care about underlying coordinate order? #1613

Comments

leeviannala commented Oct 6, 2017

shoyer commented Oct 9, 2017

rgommers commented Sep 10, 2018 • edited Loading

shoyer commented Sep 11, 2018

rgommers commented Sep 11, 2018

rgommers commented Sep 16, 2018

rgommers commented Sep 16, 2018

shoyer commented Sep 17, 2018

rgommers commented Sep 17, 2018

shoyer commented Sep 17, 2018

rgommers commented Sep 17, 2018

fmaussion commented Sep 17, 2018

shoyer commented Sep 18, 2018

stale bot commented Aug 22, 2020

shoyer commented Mar 3, 2022

dcherian commented Mar 3, 2022

brianmapes commented Mar 3, 2022 • edited Loading

shoyer commented Mar 3, 2022

brianmapes commented Mar 3, 2022

brianmapes commented Mar 8, 2022

shoyer commented Mar 8, 2022

brianmapes commented Mar 8, 2022

benbovy commented Aug 30, 2023

rgommers commented Sep 10, 2018 •

edited

Loading

brianmapes commented Mar 3, 2022 •

edited

Loading