Micro optimization -- use tuples throughout backend indexing #9009

Merged
merged 11 commits into from
May 12, 2024


hmaarrfk
Contributor

@hmaarrfk hmaarrfk commented May 6, 2024

xref: #9002 (comment)

Since Python 3.X (I can't remember which version), tuple concatenation like this
is fast: Python inspects the tuple usage, sees that it holds the
last reference, and reuses the underlying C object.
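
A minimal sketch of the concatenation pattern under discussion, assuming a hypothetical helper `pad_key` (not a function from xarray): a partial indexing key is padded to full rank by concatenating plain tuples. It only illustrates the pattern and makes no claim about how a given CPython version optimizes the concatenation.

```python
def pad_key(key, ndim):
    """Pad a partial indexing key with full slices up to `ndim` entries.

    Hypothetical helper for illustration; not part of xarray.
    """
    if not isinstance(key, tuple):
        key = (key,)
    if len(key) > ndim:
        raise IndexError(
            f"too many indices: got {len(key)}, expected at most {ndim}"
        )
    # Plain tuple concatenation: no intermediate list, no wrapper object.
    return key + (slice(None),) * (ndim - len(key))


full_key = pad_key((0, slice(1, 3)), ndim=4)
```

Here `full_key` has four entries: the two supplied ones followed by two `slice(None)` placeholders.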

The change to `LazilyIndexedArray` is motivated by the fact that `shape` is
repeatedly accessed throughout the codebase (`ndim` and `shape` are heavily
used), so we benefit from computing it once at creation time.
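
A rough sketch of the idea, not the actual `LazilyIndexedArray` code: the class `LazyView` below is hypothetical and, to stay dependency-free, takes the base shape rather than a real array. It assumes the key has one entry per dimension and contains only slices and integers.

```python
class LazyView:
    """Hypothetical stand-in for a lazily indexed array (not xarray's class)."""

    def __init__(self, base_shape, key):
        if len(key) != len(base_shape):
            raise ValueError("key must have one entry per dimension")
        self.key = key
        # Computed once here instead of on every `shape` / `ndim` access.
        self._shape = tuple(
            len(range(*k.indices(size)))
            for size, k in zip(base_shape, key)
            if isinstance(k, slice)  # integer entries drop their dimension
        )

    @property
    def shape(self):
        return self._shape

    @property
    def ndim(self):
        return len(self._shape)


view = LazyView((4, 5, 6), (slice(1, 3), 0, slice(None)))
```

The trade-off is that the cached value is only valid if the key and the base shape do not change after construction.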

  • keep MyPy happy.
  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

andersy005 and others added 6 commits April 9, 2024 17:43
* add .oindex and .vindex to BackendArray

* Add support for .oindex and .vindex in H5NetCDFArrayWrapper

* Add support for .oindex and .vindex in NetCDF4ArrayWrapper, PydapArrayWrapper, NioArrayWrapper, and ZarrArrayWrapper

* add deprecation warning

* Fix deprecation warning message formatting

* add tests

* Update xarray/core/indexing.py

Co-authored-by: Deepak Cherian <[email protected]>

* Update ZarrArrayWrapper class in xarray/backends/zarr.py

Co-authored-by: Deepak Cherian <[email protected]>

---------

Co-authored-by: Deepak Cherian <[email protected]>
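
For readers unfamiliar with the two accessors named in this commit, here is a pure-Python sketch of the difference between outer (orthogonal) and vectorized (pointwise) indexing. The `Grid` class and its helpers are hypothetical and only mimic the accessor style on a 2-D list of lists; they are not the `BackendArray` implementation.

```python
class _Accessor:
    """Tiny helper that forwards `obj[key]` to a getter function."""

    def __init__(self, getter):
        self._getter = getter

    def __getitem__(self, key):
        return self._getter(key)


class Grid:
    """Hypothetical 2-D wrapper exposing `.oindex` and `.vindex`."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def _oindex_get(self, key):
        row_idx, col_idx = key
        # Outer indexing: every selected row crossed with every selected column.
        return [[self._rows[r][c] for c in col_idx] for r in row_idx]

    def _vindex_get(self, key):
        row_idx, col_idx = key
        if len(row_idx) != len(col_idx):
            raise IndexError("vectorized indexers must have equal length")
        # Vectorized indexing: indexers are paired up element by element.
        return [self._rows[r][c] for r, c in zip(row_idx, col_idx)]

    @property
    def oindex(self):
        return _Accessor(self._oindex_get)

    @property
    def vindex(self):
        return _Accessor(self._vindex_get)


grid = Grid([[0, 1, 2], [10, 11, 12], [20, 21, 22]])
outer = grid.oindex[[0, 2], [1, 2]]   # 2x2 block
points = grid.vindex[[0, 2], [1, 2]]  # two individual elements
```

With the same pair of indexers, `outer` is a 2x2 block while `points` holds only the two paired elements.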
* origin/main:
  clean up the upstream-dev setup script (pydata#8986)
  Skip flaky `test_open_mfdataset_manyfiles` test (pydata#8989)
  Remove `.drop` warning allow (pydata#8988)
  Add notes on when to add ignores to warnings (pydata#8987)
  Docstring and documentation improvement for the Dataset class (pydata#8973)
…dexing adapters and explicitly indexed arrays (pydata#8870)

* pass key tuple to indexing adapters and explicitly indexed arrays

* update indexing in StackedBytesArray

* Update indexing in StackedBytesArray

* Add _IndexerKey type to _typing.py

* Update indexing in StackedBytesArray

* use tuple indexing in test_backend_array_deprecation_warning

* Add support for CompatIndexedTuple in explicit indexing adapter

This commit updates the `explicit_indexing_adapter` function to accept both
`ExplicitIndexer` and the new `CompatIndexedTuple`. The `CompatIndexedTuple` is
designed to facilitate the transition towards using raw tuples by carrying
additional metadata about the indexing type (basic, vectorized, or outer).

* remove unused code

* type hint fixes

* fix docstrings

* fix tests

* fix docstrings

* Apply suggestions from code review

Co-authored-by: Deepak Cherian <[email protected]>

* update docstrings and pass tuples directly

* Some test cleanup

* update docstring

* use `BasicIndexer` instead of `CompatIndexedTuple`

* support explicit indexing with tuples

* fix mypy errors

* remove unused IndexerMaker

* Update LazilyIndexedArray._updated_key to support explicit indexing with tuples

---------

Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
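
The commit message above describes `CompatIndexedTuple` as a tuple that carries metadata about the indexing type; a later commit in the same list replaces it with `BasicIndexer`. The sketch below shows one way such a tagged tuple could look. The names `TaggedKey` and `describe_key` are hypothetical, and this is not the code that was in xarray.

```python
VALID_TYPES = ("basic", "vectorized", "outer")


class TaggedKey(tuple):
    """Hypothetical tuple subclass carrying an indexing-type tag."""

    def __new__(cls, key, indexer_type):
        if indexer_type not in VALID_TYPES:
            raise ValueError(f"unknown indexer type: {indexer_type!r}")
        obj = super().__new__(cls, key)
        obj.indexer_type = indexer_type
        return obj


def describe_key(key):
    """Return (indexing type, plain tuple); untagged tuples count as basic."""
    kind = getattr(key, "indexer_type", "basic")
    return kind, tuple(key)


tagged = TaggedKey((slice(None), [0, 2]), "outer")
plain = (0, slice(1, 3))
```

One caveat with this design: concatenating a `TaggedKey` with another tuple returns a plain `tuple`, so the tag is lost unless it is re-applied.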
* origin/main:
  call `np.cross` with 3D vectors only (pydata#8993)
  Mark `test_use_cftime_false_standard_calendar_in_range` as an expected failure (pydata#8996)
  Migration of datatree/ops.py -> datatree_ops.py (pydata#8976)
  avoid a couple of warnings in `polyfit` (pydata#8939)
@hmaarrfk hmaarrfk force-pushed the tuples_backend_indexing branch from bdd3262 to 163ce2a Compare May 6, 2024 19:39
Since Python 3.X (I can't remember which version), tuple concatenation like this
is fast: Python inspects the tuple usage, sees that it holds the
last reference, and reuses the underlying C object.

The change to `LazilyIndexedArray` is motivated by the fact that `shape` is
repeatedly accessed throughout the codebase (`ndim` and `shape` are heavily
used), so we benefit from computing it once at creation time.
@hmaarrfk hmaarrfk force-pushed the tuples_backend_indexing branch from 1b7c3a1 to ebfb715 Compare May 6, 2024 22:43
andersy005 added 2 commits May 9, 2024 22:39
* backend-indexing:
  Trigger CI only if code files are modified. (pydata#9006)
  Enable explicit use of key tuples (instead of *Indexer objects) in indexing adapters and explicitly indexed arrays (pydata#8870)
  add `.oindex` and `.vindex` to `BackendArray` (pydata#8885)
  temporary enable CI triggers on feature branch
  Avoid auto creation of indexes in concat (pydata#8872)
  Fix benchmark CI (pydata#9013)
  Avoid extra read from disk when creating Pandas Index. (pydata#8893)
  Add a benchmark to monitor performance for large dataset indexing (pydata#9012)
  Zarr: Optimize `region="auto"` detection (pydata#8997)
  Trigger CI only if code files are modified. (pydata#9006)
  Fix for ruff 0.4.3 (pydata#9007)
  Port negative frequency fix for `pandas.date_range` to `cftime_range` (pydata#8999)
  Bump codecov/codecov-action from 4.3.0 to 4.3.1 in the actions group (pydata#9004)
  Speed up localize (pydata#8536)
  Simplify fast path (pydata#9001)
  Add argument check_dims to assert_allclose to allow transposed inputs (pydata#5733) (pydata#8991)
  Fix syntax error in test related to cupy (pydata#9000)
@hmaarrfk hmaarrfk marked this pull request as ready for review May 10, 2024 11:04
@andersy005
Member

thank you for these additions, @hmaarrfk!

@andersy005 andersy005 merged commit f2c4659 into pydata:backend-indexing May 12, 2024
21 of 22 checks passed
2 participants