Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2x~5x speed up for isel() in most cases #3533

Merged
merged 9 commits into from
Dec 5, 2019

Conversation

crusaderky
Copy link
Contributor

@crusaderky crusaderky commented Nov 14, 2019

Yet another major improvement for #2799.

Achieve a 2x to 5x boost in isel performance when slicing small arrays by int, slice, list of int, scalar ndarray, or 1-dimensional ndarray.

import xarray

da = xarray.DataArray([[1, 2], [3, 4]], dims=['x', 'y'])
v = da.variable
a = da.variable.values
ds = da.to_dataset(name="d")

ds_with_idx = xarray.Dataset({
    'x': [10, 20],
    'y': [100, 200],
    'd': (('x', 'y'), [[1, 2], [3, 4]])
})
da_with_idx = ds_with_idx.d

# before -> after
%timeit a[0] # 121 ns
%timeit v[0] # 7 µs
%timeit v.isel(x=0) # 10 µs
%timeit da[0]  # 65 µs -> 15 µs
%timeit da.isel(x=0)  # 63 µs -> 13 µs
%timeit ds.isel(x=0)  # 48 µs -> 24 µs
%timeit da_with_idx[0]  # 209 µs -> 82 µs
%timeit da_with_idx.isel(x=0, drop=False)  # 135 µs -> 34 µs
%timeit da_with_idx.isel(x=0, drop=True)  # 101 µs -> 34 µs
%timeit ds_with_idx.isel(x=0, drop=False)  # 90 µs -> 49 µs
%timeit ds_with_idx.isel(x=0, drop=True)  # 65 µs -> 49 µs

Marked as WIP because this commands running the asv suite to verify there are no regressions for large arrays.
(on a separate note, we really need to add the small size cases to asv - as discussed in #3382).

This profoundly alters one of the most important methods in xarray and I must confess it makes me nervous, particularly as I am unsure if the test coverage of DataArray.isel() is as through as that for Dataset.isel().

@crusaderky
Copy link
Contributor Author

crusaderky commented Nov 14, 2019

@shoyer I struggle to understand the meaning of Dataset.indexes. How does it differ from

{
    k: v.to_index()
    for k, v in self._variables.items()
    if isinstance(v, IndexVariable)
}

?

@crusaderky crusaderky changed the title WIP: Speed up isel in most cases WIP: 2x~5x speed up for isel() in most cases Nov 14, 2019
xarray/core/dataarray.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Great idea!

@shoyer
Copy link
Member

shoyer commented Nov 14, 2019

Dataset.indexes is currently equivalent to just calling to_index() on each appropriate variable. But eventually we want to separate the notion of an index from variables with names matching dimensions, and this is a prerequisite for that.

@crusaderky
Copy link
Contributor Author

I would very much appreciate if everybody could play around with this PR in their projects and confirm that nothing breaks!

@crusaderky crusaderky changed the title WIP: 2x~5x speed up for isel() in most cases [needs live testing] 2x~5x speed up for isel() in most cases Nov 15, 2019
@crusaderky
Copy link
Contributor Author

Merging soon if nobody speaks up!

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested this but the logic here looks sound.

xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
@crusaderky
Copy link
Contributor Author

Implemented @shoyer 's suggestions and aligned to master.
Merging into master tomorrow!

@crusaderky crusaderky changed the title [needs live testing] 2x~5x speed up for isel() in most cases 2x~5x speed up for isel() in most cases Dec 5, 2019
@crusaderky
Copy link
Contributor Author

In it goes *ducks for cover*

@crusaderky crusaderky merged commit 87a25b6 into pydata:master Dec 5, 2019
@crusaderky crusaderky deleted the fast_isel branch December 5, 2019 16:39
@dcherian
Copy link
Contributor

dcherian commented Dec 5, 2019

=) thanks @crusaderky

dcherian added a commit to dcherian/xarray that referenced this pull request Dec 7, 2019
* upstream/master:
  Fix map_blocks HLG layering (pydata#3598)
  Silence sphinx warnings: Round 2 (pydata#3592)
  2x~5x speed up for isel() in most cases (pydata#3533)
  remove xarray again (pydata#3591)
  fix plotting with transposed nondim coords. (pydata#3441)
  make coarsen reductions consistent with reductions on other classes (pydata#3500)
  Resolve the version issues on RTD (pydata#3589)
  Add bottleneck & rasterio git tip to upstream-dev CI (pydata#3585)
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 30, 2019
…equiv

* 'master' of github.com:pydata/xarray: (28 commits)
  Add nanmedian for dask arrays (pydata#3604)
  added pyinterp to related projects (pydata#3655)
  Allow incomplete hypercubes in combine_by_coords (pydata#3649)
  concat keeps attrs from first variable. (pydata#3637)
  Extend DatetimeAccessor properties and support `.dt` accessor for Timedelta (pydata#3612)
  update readthedocs.yml (pydata#3639)
  silence sphinx warnings round 3 (pydata#3602)
  Fix/quantile wrong errmsg (pydata#3635)
  Provide shape info in shape mismatch error. (pydata#3619)
  Minor doc fixes (pydata#3615)
  Respect user-specified coordinates attribute. (pydata#3487)
  Add Facetgrid.row_labels & Facetgrid.col_labels (pydata#3597)
  Fix pint integration tests (pydata#3600)
  Minor fix to combine_by_coords to allow for the combination of CFTimeIndexes separated by large time intervals (pydata#3543)
  Fix map_blocks HLG layering (pydata#3598)
  Silence sphinx warnings: Round 2 (pydata#3592)
  2x~5x speed up for isel() in most cases (pydata#3533)
  remove xarray again (pydata#3591)
  fix plotting with transposed nondim coords. (pydata#3441)
  make coarsen reductions consistent with reductions on other classes (pydata#3500)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants