Using oindex and vindex instead of ExplicitIndexer objects. #9273

Draft
wants to merge 48 commits into main
Conversation

@dcherian (Contributor) commented Jul 24, 2024

Continues the refactoring of backends and lazy indexing classes

andersy005 and others added 12 commits May 9, 2024 21:45
* add .oindex and .vindex to BackendArray

* Add support for .oindex and .vindex in H5NetCDFArrayWrapper

* Add support for .oindex and .vindex in NetCDF4ArrayWrapper, PydapArrayWrapper, NioArrayWrapper, and ZarrArrayWrapper

* add deprecation warning

* Fix deprecation warning message formatting

* add tests

* Update xarray/core/indexing.py

Co-authored-by: Deepak Cherian <[email protected]>

* Update ZarrArrayWrapper class in xarray/backends/zarr.py

Co-authored-by: Deepak Cherian <[email protected]>

---------

Co-authored-by: Deepak Cherian <[email protected]>
…dexing adapters and explicitly indexed arrays (#8870)

* pass key tuple to indexing adapters and explicitly indexed arrays

* update indexing in StackedBytesArray

* Update indexing in StackedBytesArray

* Add _IndexerKey type to _typing.py

* Update indexing in StackedBytesArray

* use tuple indexing in test_backend_array_deprecation_warning

* Add support for CompatIndexedTuple in explicit indexing adapter

This commit updates the `explicit_indexing_adapter` function to accept both
`ExplicitIndexer` and the new `CompatIndexedTuple`. The `CompatIndexedTuple` is
designed to facilitate the transition towards using raw tuples by carrying
additional metadata about the indexing type (basic, vectorized, or outer).

* remove unused code

* type hint fixes

* fix docstrings

* fix tests

* fix docstrings

* Apply suggestions from code review

Co-authored-by: Deepak Cherian <[email protected]>

* update docstrings and pass tuples directly

* Some test cleanup

* update docstring

* use `BasicIndexer` instead of `CompatIndexedTuple`

* support explicit indexing with tuples

* fix mypy errors

* remove unused IndexerMaker

* Update LazilyIndexedArray._updated_key to support explicit indexing with tuples

---------

Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
* Trigger CI only if code files are modified.

Fixes #8705

* Apply suggestions from code review

Co-authored-by: Maximilian Roos <[email protected]>

---------

Co-authored-by: Maximilian Roos <[email protected]>
* main:
  [pre-commit.ci] pre-commit autoupdate (#9005)
* main:
  Add whatsnew entry for #8974 (#9022)
  Zarr: Optimize appending (#8998)
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Anderson Banihirwe <[email protected]>
Co-authored-by: Anderson Banihirwe <[email protected]>
* main:
  New whatsnew section
  Release summary for v2024.05.0 (#9021)
* main:
  Bump codecov/codecov-action from 4.3.1 to 4.4.0 in the actions group (#9036)
  Use ME in test_plot instead of M (#9035)
  Fix numbagg or bottlekneck skip (#9034)
  [skip-ci] min_deps_check: show age of required pkg on error (#9025)
  attempt to get colour output in CI (#9031)
* main:
  array api-related upstream-dev failures (#8854)
  (fix): equality check against singleton `PandasExtensionArray` (#9032)
* main: (48 commits)
  Add test for #9155 (#9161)
  Remove mypy exclusions for a couple more libraries (#9160)
  Include numbagg in type checks (#9159)
  Improve zarr chunks docs (#9140)
  groupby: remove some internal use of IndexVariable (#9123)
  Improve `to_zarr` docs (#9139)
  Split out distributed writes in zarr docs (#9132)
  Update zendoo badge link (#9133)
  Support duplicate dimensions in `.chunk` (#9099)
  Bump the actions group with 2 updates (#9130)
  adjust repr tests to account for different platforms (#9127) (#9128)
  Grouper refactor (#9122)
  Update docstring in api.py for open_mfdataset(), clarifying "chunks" argument (#9121)
  Add test for rechunking to a size string (#9117)
  Move Sphinx directives out of `See also` (#8466)
  new whats-new section (#9115)
  release v2024.06.0 (#9113)
  release notes for 2024.06.0 (#9092)
  [skip-ci] Try fixing hypothesis CI trigger (#9112)
  Undo custom padding-top. (#9107)
  ...
* main: (54 commits)
  Adding `open_datatree` backend-specific keyword arguments (#9199)
  [pre-commit.ci] pre-commit autoupdate (#9202)
  Restore ability to specify _FillValue as Python native integers (#9258)
  add backend intro and how-to diagram (#9175)
  Fix copybutton for multi line examples in double digit ipython cells (#9264)
  Update signature for _arrayfunction.__array__ (#9237)
  Add encode_cf_datetime benchmark (#9262)
  groupby, resample: Deprecate some positional args (#9236)
  Delete ``base`` and ``loffset`` parameters to resample (#9233)
  Update dropna docstring (#9257)
  Grouper, Resampler as public api (#8840)
  Fix mypy on main (#9252)
  fix fallback isdtype method (#9250)
  Enable pandas type checking (#9213)
  Per-variable specification of boolean parameters in open_dataset (#9218)
  test push
  Added a space to the documentation (#9247)
  Fix typing for test_plot.py (#9234)
  Allow mypy to run in vscode (#9239)
  Revert "Test main push"
  ...
@dcherian (Contributor, Author)

Could use some input here.

#8909 snuck in because it only manifests with engine="scipy" when netcdf is not installed.

I haven't tracked down why, but it shows that we should be testing the backends individually; they should not depend on another backend being installed (at least not for the bundled engines). How can we do that nicely?

@keewis (Collaborator) commented Jul 31, 2024

> How can we do that nicely?

If I understand the backends correctly, we could monkeypatch list_engines to only return a single BackendEntrypoint. Something like this, maybe?

```python
def test_backend_scipy(..., monkeypatch):
    monkeypatch.setattr(
        xr.backends.plugins,
        "list_engines",
        partial(custom_list_engines, ScipyEntrypoint()),
    )
```

(I didn't verify this actually works, and whether there's a better way depends on the test, I guess)

Monkeypatching is generally something to avoid, but in this case I don't think we can, unless we want to refactor the plugin machinery.
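A runnable sketch of the pattern keewis describes, using a stand-in "plugins" module rather than xarray's real xr.backends.plugins (the entrypoint values here are placeholder strings, not real BackendEntrypoint classes):

```python
# Stand-in for xr.backends.plugins; values are placeholders, not real classes.
import types

plugins = types.SimpleNamespace(
    list_engines=lambda: {"scipy": "ScipyEntrypoint", "netcdf4": "NetCDF4Entrypoint"}
)


def restrict_to(engine_name):
    """Build a replacement list_engines() that exposes a single engine."""
    full = plugins.list_engines()
    return lambda: {engine_name: full[engine_name]}


# Inside a test, monkeypatch.setattr would perform this swap (and undo it
# automatically at teardown):
original = plugins.list_engines
plugins.list_engines = restrict_to("scipy")
assert list(plugins.list_engines()) == ["scipy"]
plugins.list_engines = original
```

The point of the swap is that engine discovery inside the code under test sees only the one backend, so it cannot silently fall back to another installed engine.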

@headtr1ck (Collaborator) left a comment:

Can we get some return types on these methods? Thanks!

@dcherian (Contributor, Author)

@andersy005 are you able to add some typing here?

@headtr1ck (Collaborator)

> @andersy005 are you able to add some typing here?

Shouldn't be a blocker. It's already better than before! But nice to have anyway :)

@andersy005 (Member)

> @andersy005 are you able to add some typing here?

Absolutely! My time is quite limited today, but I intend to prioritize working on this tomorrow.

@dcherian (Contributor, Author)

Could use some help with the typing here

cc @headtr1ck @andersy005 @Illviljan

@@ -824,14 +822,14 @@ def test_create_mask_outer_indexer() -> None:
def test_create_mask_vectorized_indexer() -> None:
indexer = indexing.VectorizedIndexer((np.array([0, -1, 2]), np.array([0, 1, -1])))
expected = np.array([False, True, True])
actual = indexing.create_mask(indexer, (5,))
actual = indexing.create_mask(indexer, (5, 5))
@dcherian (Contributor, Author):

I think this is right. The indexer is a 2-element tuple, so the array's shape should be 2D.

np.testing.assert_array_equal(expected, actual)

indexer = indexing.VectorizedIndexer(
(np.array([0, -1, 2]), slice(None), np.array([0, 1, -1]))
)
expected = np.array([[False, True, True]] * 2).T
actual = indexing.create_mask(indexer, (5, 2))
actual = indexing.create_mask(indexer, (5, 2, 5))
@dcherian (Contributor, Author):

Similarly, here the indexer is a 3-element tuple, so the shape should be 3D.
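A plain-NumPy sketch of the invariant these tests assert: a VectorizedIndexer with N index arrays addresses an N-dimensional array, and the mask flags positions where any indexer uses the -1 "missing" sentinel (this is a stand-in, not xarray's actual create_mask):

```python
import numpy as np

# Same indexers as the first test case above.
indexers = (np.array([0, -1, 2]), np.array([0, 1, -1]))

# The mask has the broadcast shape of the index arrays, and is True
# wherever any indexer holds the -1 sentinel.
mask = np.zeros(np.broadcast(*indexers).shape, dtype=bool)
for idx in indexers:
    mask |= idx == -1
# mask is [False, True, True], matching `expected` in the test above
```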

@dcherian dcherian requested review from TomNicholas and andersy005 and removed request for TomNicholas November 19, 2024 04:57
@@ -695,141 +837,6 @@ def test_roundtrip_boolean_dtype(self) -> None:
assert_identical(original, actual2)
assert actual2["x"].dtype == "bool"

def test_orthogonal_indexing(self) -> None:
@dcherian (Contributor, Author):

Simply moved to a new mixin class.

pass


# @indexing_tests(indexing_support=IndexingSupport.OUTER)
@dcherian (Contributor, Author):

Will need a different _getitem implementation for these.

return self[key] # type: ignore [index]


class NewBackendArray(NdimSizeLenMixin, indexing.ExplicitlyIndexed):
@dcherian (Contributor, Author):

I don't think it's possible to do this without creating a new BackendArray class for backends.
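A hypothetical sketch of what the new-style class embodies, following the .oindex/.vindex property pattern shown in the diff below; IndexCallable here is a tiny stand-in for xarray's helper, and the getter bodies are illustrative only:

```python
import numpy as np


class IndexCallable:
    """Stand-in: wraps a getter so that obj.oindex[key] calls getter(key)."""

    def __init__(self, getter):
        self.getter = getter

    def __getitem__(self, key):
        return self.getter(key)


class SketchBackendArray:
    """Illustrative backend array with explicit indexing modes."""

    def __init__(self, data):
        self.data = np.asarray(data)

    @property
    def oindex(self):
        return IndexCallable(self._oindex_get)

    @property
    def vindex(self):
        return IndexCallable(self._vindex_get)

    def _oindex_get(self, key):
        # Outer (orthogonal) indexing: cross product of 1-D integer arrays.
        return self.data[np.ix_(*key)]

    def _vindex_get(self, key):
        # Vectorized indexing: index arrays are paired elementwise.
        return self.data[key]
```

The caller picks the indexing mode by choosing the property, instead of wrapping the key in an ExplicitIndexer subclass.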

@@ -351,7 +353,147 @@ def test_dtype_coercion_error(self) -> None:
ds.to_netcdf(path, format=format)


class DatasetIOBase:
class BackendIndexingTestsMixin:
@dcherian (Contributor, Author):

Simply moved out from DatasetIOBase. I can make a separate PR for this.

@@ -6489,3 +6496,83 @@ def test_zarr_safe_chunk_region(tmp_path):
chunk = ds.isel(region)
chunk = chunk.chunk()
chunk.chunk().to_zarr(store, region=region)


class LegacyBackendArrayWrapper(LegacyBackendArray):
@dcherian (Contributor, Author):

This is what cfgrib's backend array looks like.

Comment on lines +275 to +281
@property
def oindex(self) -> indexing.IndexCallable:
return indexing.IndexCallable(self._oindex_get)

@property
def vindex(self) -> indexing.IndexCallable:
return indexing.IndexCallable(self._vindex_get)
@dcherian (Contributor, Author):

As a reminder, the goal is to use .oindex and .vindex to be explicit about which indexing mode to use, instead of relying on IndexingAdapter subclasses.

I think this gets us nearly all the way to removing the special case for ExplicitlyIndexed in as_compatible_data.
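The semantic difference that .oindex and .vindex make explicit, shown with plain NumPy (xarray's adapters provide these modes for arbitrary backend arrays):

```python
import numpy as np

arr = np.arange(25).reshape(5, 5)
rows, cols = np.array([0, 2]), np.array([1, 3])

# Vectorized ("fancy") indexing pairs the indexers elementwise -> shape (2,)
vectorized = arr[rows, cols]

# Outer (orthogonal) indexing takes the cross product -> shape (2, 2)
outer = arr[np.ix_(rows, cols)]
```

Spelling the mode out at the call site removes the need to infer it from the type of an ExplicitIndexer wrapper.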

return indexing.explicit_indexing_adapter(
key, self.shape, indexing.IndexingSupport.OUTER_1VECTOR, self._getitem
def _oindex_get(self, key: _OuterIndexerKey) -> Any:
return indexing.outer_indexing_adapter(
@dcherian (Contributor, Author) commented Dec 4, 2024:

Now we have three different adapters. This kind of dispatching was happening inside explicit_indexing_adapter using the IndexingAdapter subclasses; now it is more explicit.

@@ -1014,35 +1138,60 @@ def explicit_indexing_adapter(
-------
Indexing result, in the form of a duck numpy-array.
"""

# If the array is not an ExplicitlyIndexedNDArrayMixin,
@dcherian (Contributor, Author):

Let's uncomment this in a future PR once we are sure nothing is broken.

Comment on lines +98 to 103
_BasicIndexerKey = tuple[int | np.integer | slice, ...]
_OuterIndexerKey = tuple[
int | np.integer | slice | np.ndarray[Any, np.dtype[np.integer]], ...
]
_VectorizedIndexerKey = tuple[slice | np.ndarray[Any, np.dtype[np.integer]], ...]

@dcherian (Contributor, Author) commented Dec 4, 2024:

Not sure these types are fully accurate.
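For concreteness, example keys that each alias from the diff above is meant to cover (a sketch; whether the aliases are fully accurate is exactly the open question here):

```python
import numpy as np

# _BasicIndexerKey: ints and slices only
basic_key = (0, slice(None))

# _OuterIndexerKey: additionally allows 1-D integer arrays
outer_key = (np.array([0, 2]), slice(1, 4), 3)

# _VectorizedIndexerKey: slices and integer arrays only (no bare ints)
vectorized_key = (np.array([0, 1]), np.array([2, 3]))
```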

* main: (79 commits)
  fix mean for datetime-like using the respective time resolution unit (#9977)
  Add `time_unit` argument to `CFTimeIndex.to_datetimeindex` (#9965)
  remove gate and add a test (#9958)
  Remove repetitive that (replace it with the) (#9994)
  add shxarray to the xarray ecosystem list (#9995)
  Add `shards` to `valid_encodings` to enable sharded Zarr writing (#9948)
  Use flox for grouped first, last (#9986)
  Bump the actions group with 2 updates (#9989)
  Fix some typing (#9988)
  Remove unnecessary a article (#9980)
  Fix test_doc_example on big-endian systems (#9949)
  fix weighted polyfit for arrays with more than 2 dimensions (#9974)
  Use zarr-fixture to prevent thread leakage errors (#9967)
  remove dask-expr from CI runs, fix related tests (#9971)
  Update time coding tests to assert exact equality (#9961)
  cast type to PDDatetimeUnitOptions (#9963)
  Suggest the correct name when no key matches in the dataset (#9943)
  fix upstream dev issues (#9953)
  Relax nanosecond datetime restriction in CF time decoding (#9618)
  Remove outdated quantile test. (#9945)
  ...
@dcherian dcherian changed the title Pass around tuples instead of ExplicitIndexer objects. Using oindex and vindex instead of ExplicitIndexer objects. Jan 30, 2025
@dcherian dcherian requested a review from shoyer January 30, 2025 05:18
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

'ScipyArrayWrapper' object has no attribute 'oindex'
6 participants