Micro optimization -- use tuples throughout backend indexing #9009

hmaarrfk · 2024-05-06T19:39:02Z

Since Python 3.X (I can't remember which version), tuple concatenation like
this is fast: Python inspects how the tuple is used, sees that it holds the
last reference, and reuses the underlying C object.
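
For illustration only, here is a minimal sketch of the pattern rather than code from this PR. The function names `key_via_list` and `key_via_concat` and the sample key are hypothetical. It compares building an indexing key through an intermediate list with concatenating tuples directly. The `compare` helper uses `timeit` so the relative cost can be measured on whatever Python version is in use, since the speedup is not guaranteed to be the same everywhere.

```python
import timeit


def key_via_list(leading, trailing):
    """Build the key by growing a list, then converting it to a tuple."""
    parts = list(leading)
    parts.extend(trailing)
    return tuple(parts)


def key_via_concat(leading, trailing):
    """Build the key by concatenating the two tuples directly."""
    return leading + trailing


# Hypothetical indexing key: one entry per dimension.
leading = (slice(None), 0)
trailing = (slice(1, 5),)

key_a = key_via_list(leading, trailing)
key_b = key_via_concat(leading, trailing)


def compare(number=100_000):
    """Return (list_time, concat_time) in seconds for `number` calls each."""
    t_list = timeit.timeit(lambda: key_via_list(leading, trailing), number=number)
    t_concat = timeit.timeit(lambda: key_via_concat(leading, trailing), number=number)
    return t_list, t_concat
```

Both functions return the same tuple, so switching between them changes only the cost of building the key, not its value.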

The change to LazilyIndexedArray is motivated by the fact that `shape` is
accessed repeatedly throughout the codebase (`ndim` and `shape` are heavily
used), so we benefit from pre-computing it at creation time.
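
As a rough sketch of the idea, assuming a simplified wrapper rather than the real LazilyIndexedArray: the class `LazyView` and its `_compute_shape` helper below are hypothetical, and the key is assumed to hold one integer or slice per dimension. The shape is derived once in `__init__` and stored, so `shape` and `ndim` become plain attribute lookups.

```python
class LazyView:
    """Hypothetical stand-in for a lazily indexed array wrapper."""

    def __init__(self, base_shape, key):
        self._base_shape = tuple(base_shape)
        self._key = tuple(key)
        # Compute the shape once here instead of on every property access.
        self._shape = self._compute_shape()

    def _compute_shape(self):
        shape = ()
        for size, k in zip(self._base_shape, self._key):
            if isinstance(k, slice):
                # Number of elements the slice selects along this dimension.
                shape += (len(range(*k.indices(size))),)
            # Integer keys drop the dimension, so they contribute nothing.
        return shape

    @property
    def shape(self):
        return self._shape

    @property
    def ndim(self):
        return len(self._shape)


view = LazyView((10, 20, 30), (slice(2, 8), 3, slice(None, None, 10)))
```

The trade-off is a small fixed cost at construction in exchange for cheap repeated access, which pays off only when `shape` or `ndim` is read many times per object.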

keep MyPy happy.
Closes #xxxx
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

* add .oindex and .vindex to BackendArray
* Add support for .oindex and .vindex in H5NetCDFArrayWrapper
* Add support for .oindex and .vindex in NetCDF4ArrayWrapper, PydapArrayWrapper, NioArrayWrapper, and ZarrArrayWrapper
* add deprecation warning
* Fix deprecation warning message formatting
* add tests
* Update xarray/core/indexing.py

  Co-authored-by: Deepak Cherian <[email protected]>
* Update ZarrArrayWrapper class in xarray/backends/zarr.py

  Co-authored-by: Deepak Cherian <[email protected]>

---------

Co-authored-by: Deepak Cherian <[email protected]>

* origin/main:
  clean up the upstream-dev setup script (pydata#8986)
  Skip flaky `test_open_mfdataset_manyfiles` test (pydata#8989)
  Remove `.drop` warning allow (pydata#8988)
  Add notes on when to add ignores to warnings (pydata#8987)
  Docstring and documentation improvement for the Dataset class (pydata#8973)

…dexing adapters and explicitly indexed arrays (pydata#8870)

* pass key tuple to indexing adapters and explicitly indexed arrays
* update indexing in StackedBytesArray
* Update indexing in StackedBytesArray
* Add _IndexerKey type to _typing.py
* Update indexing in StackedBytesArray
* use tuple indexing in test_backend_array_deprecation_warning
* Add support for CompatIndexedTuple in explicit indexing adapter

  This commit updates the `explicit_indexing_adapter` function to accept both `ExplicitIndexer` and the new `CompatIndexedTuple`. The `CompatIndexedTuple` is designed to facilitate the transition towards using raw tuples by carrying additional metadata about the indexing type (basic, vectorized, or outer).
* remove unused code
* type hint fixes
* fix docstrings
* fix tests
* fix docstrings
* Apply suggestions from code review

  Co-authored-by: Deepak Cherian <[email protected]>
* update docstrings and pass tuples directly
* Some test cleanup
* update docstring
* use `BasicIndexer` instead of `CompatIndexedTuple`
* support explicit indexing with tuples
* fix mypy errors
* remove unused IndexerMaker
* Update LazilyIndexedArray._updated_key to support explicit indexing with tuples

---------

Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>

* origin/main:
  call `np.cross` with 3D vectors only (pydata#8993)
  Mark `test_use_cftime_false_standard_calendar_in_range` as an expected failure (pydata#8996)
  Migration of datatree/ops.py -> datatree_ops.py (pydata#8976)
  avoid a couple of warnings in `polyfit` (pydata#8939)

* backend-indexing:
  Trigger CI only if code files are modified. (pydata#9006)
  Enable explicit use of key tuples (instead of *Indexer objects) in indexing adapters and explicitly indexed arrays (pydata#8870)
  add `.oindex` and `.vindex` to `BackendArray` (pydata#8885)
  temporary enable CI triggers on feature branch
  Avoid auto creation of indexes in concat (pydata#8872)
  Fix benchmark CI (pydata#9013)
  Avoid extra read from disk when creating Pandas Index. (pydata#8893)
  Add a benchmark to monitor performance for large dataset indexing (pydata#9012)
  Zarr: Optimize `region="auto"` detection (pydata#8997)
  Trigger CI only if code files are modified. (pydata#9006)
  Fix for ruff 0.4.3 (pydata#9007)
  Port negative frequency fix for `pandas.date_range` to `cftime_range` (pydata#8999)
  Bump codecov/codecov-action from 4.3.0 to 4.3.1 in the actions group (pydata#9004)
  Speed up localize (pydata#8536)
  Simplify fast path (pydata#9001)
  Add argument check_dims to assert_allclose to allow transposed inputs (pydata#5733) (pydata#8991)
  Fix syntax error in test related to cupy (pydata#9000)

andersy005 · 2024-05-12T01:25:22Z

thank you for these additions, @hmaarrfk!

andersy005 and others added 6 commits April 9, 2024 17:43

temporary enable CI triggers on feature branch

b81b451

Merge branch 'main' into backend-indexing

e96e70e

hmaarrfk force-pushed the tuples_backend_indexing branch from bdd3262 to 163ce2a Compare May 6, 2024 19:39

hmaarrfk force-pushed the tuples_backend_indexing branch from 1b7c3a1 to ebfb715 Compare May 6, 2024 22:43

andersy005 force-pushed the backend-indexing branch from bfb334c to 18c5c70 Compare May 10, 2024 04:47

andersy005 added 2 commits May 9, 2024 22:39

formatting only

171a736

hmaarrfk marked this pull request as ready for review May 10, 2024 11:04

andersy005 added 2 commits May 11, 2024 18:21

Merge branch 'backend-indexing' into tuples_backend_indexing

303881a

Merge branch 'backend-indexing' into tuples_backend_indexing

99cd884

andersy005 approved these changes May 12, 2024

andersy005 merged commit f2c4659 into pydata:backend-indexing May 12, 2024
21 of 22 checks passed
