-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using oindex and vindex instead of ExplicitIndexer objects. #9273
base: main
Are you sure you want to change the base?
Conversation
* add .oindex and .vindex to BackendArray * Add support for .oindex and .vindex in H5NetCDFArrayWrapper * Add support for .oindex and .vindex in NetCDF4ArrayWrapper, PydapArrayWrapper, NioArrayWrapper, and ZarrArrayWrapper * add deprecation warning * Fix deprecation warning message formatting * add tests * Update xarray/core/indexing.py Co-authored-by: Deepak Cherian <[email protected]> * Update ZarrArrayWrapper class in xarray/backends/zarr.py Co-authored-by: Deepak Cherian <[email protected]> --------- Co-authored-by: Deepak Cherian <[email protected]>
…dexing adapters and explicitly indexed arrays (#8870) * pass key tuple to indexing adapters and explicitly indexed arrays * update indexing in StackedBytesArray * Update indexing in StackedBytesArray * Add _IndexerKey type to _typing.py * Update indexing in StackedBytesArray * use tuple indexing in test_backend_array_deprecation_warning * Add support for CompatIndexedTuple in explicit indexing adapter This commit updates the `explicit_indexing_adapter` function to accept both `ExplicitIndexer` and the new `CompatIndexedTuple`. The `CompatIndexedTuple` is designed to facilitate the transition towards using raw tuples by carrying additional metadata about the indexing type (basic, vectorized, or outer). * remove unused code * type hint fixes * fix docstrings * fix tests * fix docstrings * Apply suggestions from code review Co-authored-by: Deepak Cherian <[email protected]> * update docstrings and pass tuples directly * Some test cleanup * update docstring * use `BasicIndexer` instead of `CompatIndexedTuple` * support explicit indexing with tuples * fix mypy errors * remove unused IndexerMaker * Update LazilyIndexedArray._updated_key to support explicit indexing with tuples --------- Co-authored-by: Deepak Cherian <[email protected]> Co-authored-by: Deepak Cherian <[email protected]>
* Trigger CI only if code files are modified. Fixes #8705 * Apply suggestions from code review Co-authored-by: Maximilian Roos <[email protected]> --------- Co-authored-by: Maximilian Roos <[email protected]>
* main: [pre-commit.ci] pre-commit autoupdate (#9005)
Co-authored-by: Deepak Cherian <[email protected]> Co-authored-by: Deepak Cherian <[email protected]> Co-authored-by: Anderson Banihirwe <[email protected]> Co-authored-by: Anderson Banihirwe <[email protected]>
* main: New whatsnew section Release summary for v2024.05.0 (#9021)
* main: (48 commits) Add test for #9155 (#9161) Remove mypy exclusions for a couple more libraries (#9160) Include numbagg in type checks (#9159) Improve zarr chunks docs (#9140) groupby: remove some internal use of IndexVariable (#9123) Improve `to_zarr` docs (#9139) Split out distributed writes in zarr docs (#9132) Update zendoo badge link (#9133) Support duplicate dimensions in `.chunk` (#9099) Bump the actions group with 2 updates (#9130) adjust repr tests to account for different platforms (#9127) (#9128) Grouper refactor (#9122) Update docstring in api.py for open_mfdataset(), clarifying "chunks" argument (#9121) Add test for rechunking to a size string (#9117) Move Sphinx directives out of `See also` (#8466) new whats-new section (#9115) release v2024.06.0 (#9113) release notes for 2024.06.0 (#9092) [skip-ci] Try fixing hypothesis CI trigger (#9112) Undo custom padding-top. (#9107) ...
96dbd47
to
46a902f
Compare
* main: (54 commits) Adding `open_datatree` backend-specific keyword arguments (#9199) [pre-commit.ci] pre-commit autoupdate (#9202) Restore ability to specify _FillValue as Python native integers (#9258) add backend intro and how-to diagram (#9175) Fix copybutton for multi line examples in double digit ipython cells (#9264) Update signature for _arrayfunction.__array__ (#9237) Add encode_cf_datetime benchmark (#9262) groupby, resample: Deprecate some positional args (#9236) Delete ``base`` and ``loffset`` parameters to resample (#9233) Update dropna docstring (#9257) Grouper, Resampler as public api (#8840) Fix mypy on main (#9252) fix fallback isdtype method (#9250) Enable pandas type checking (#9213) Per-variable specification of boolean parameters in open_dataset (#9218) test push Added a space to the documentation (#9247) Fix typing for test_plot.py (#9234) Allow mypy to run in vscode (#9239) Revert "Test main push" ...
Could use some input here. The reason #8909 snuck in is because it only manifests with I haven't tracked down why but it shows that we should be testing the backends individually, they should not depend on another backend being installed (at least not for the bundled engines). How can we do that nicely? |
If I understand the backends correctly, we could monkeypatch def test_backend_scipy(..., monkeypatch):
monkeypatch.setattr(xr.backends.plugins, "list_engines", partial(custom_list_engines, ScipyEntrypoint())) (I didn't verify this actually works, and whether there's a better way depends on the test, I guess) monkeypatching in general is something to be avoided usually, but in this case I don't think we can, unless we want to refactor the plugin machinery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get some return types on these methods? Thanks!
@andersy005 are you able to add some typing here? |
Shouldn't be a blocker. It's already better than before! But nice to have anyway :) |
absolutely! my time is quite limited today, but i intend to prioritize working on this tomorrow |
for more information, see https://pre-commit.ci
Could use some help with the typing here |
@@ -824,14 +822,14 @@ def test_create_mask_outer_indexer() -> None: | |||
def test_create_mask_vectorized_indexer() -> None: | |||
indexer = indexing.VectorizedIndexer((np.array([0, -1, 2]), np.array([0, 1, -1]))) | |||
expected = np.array([False, True, True]) | |||
actual = indexing.create_mask(indexer, (5,)) | |||
actual = indexing.create_mask(indexer, (5, 5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is right. THe indexer is a 2-element tuple, so the array's shape should be 2D
np.testing.assert_array_equal(expected, actual) | ||
|
||
indexer = indexing.VectorizedIndexer( | ||
(np.array([0, -1, 2]), slice(None), np.array([0, 1, -1])) | ||
) | ||
expected = np.array([[False, True, True]] * 2).T | ||
actual = indexing.create_mask(indexer, (5, 2)) | ||
actual = indexing.create_mask(indexer, (5, 2, 5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here indexer is a 3-element tuple, so the shape should be 3D
@@ -695,141 +837,6 @@ def test_roundtrip_boolean_dtype(self) -> None: | |||
assert_identical(original, actual2) | |||
assert actual2["x"].dtype == "bool" | |||
|
|||
def test_orthogonal_indexing(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simply moved to a new mixin class.
pass | ||
|
||
|
||
# @indexing_tests(indexing_support=IndexingSupport.OUTER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will need a different _getitem
implementation for these.
return self[key] # type: ignore [index] | ||
|
||
|
||
class NewBackendArray(NdimSizeLenMixin, indexing.ExplicitlyIndexed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think its possible to do this without creating a new BackendArray
class for backends.
@@ -351,7 +353,147 @@ def test_dtype_coercion_error(self) -> None: | |||
ds.to_netcdf(path, format=format) | |||
|
|||
|
|||
class DatasetIOBase: | |||
class BackendIndexingTestsMixin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simply moved out from DatasetIOBase. I can make a separate PR for this,
@@ -6489,3 +6496,83 @@ def test_zarr_safe_chunk_region(tmp_path): | |||
chunk = ds.isel(region) | |||
chunk = chunk.chunk() | |||
chunk.chunk().to_zarr(store, region=region) | |||
|
|||
|
|||
class LegacyBackendArrayWrapper(LegacyBackendArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what cfgrib's backend array looks like
@property | ||
def oindex(self) -> indexing.IndexCallable: | ||
return indexing.IndexCallable(self._oindex_get) | ||
|
||
@property | ||
def vindex(self) -> indexing.IndexCallable: | ||
return indexing.IndexCallable(self._vindex_get) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as reminder the goal is to use .oindex
and .vindex
to be explicit about which indexing to use instead of IndexingAdapter
subclasses.
I think this gets us nearly all the way to removing the special case for ExplicitlyIndexed
in as_compatible_data
return indexing.explicit_indexing_adapter( | ||
key, self.shape, indexing.IndexingSupport.OUTER_1VECTOR, self._getitem | ||
def _oindex_get(self, key: _OuterIndexerKey) -> Any: | ||
return indexing.outer_indexing_adapter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now we have three different adapters . This kind of dispatching was happening in explicit_indexing_adapter
using the IndexingAdapter
subclasses. Now it is more explicit
@@ -1014,35 +1138,60 @@ def explicit_indexing_adapter( | |||
------- | |||
Indexing result, in the form of a duck numpy-array. | |||
""" | |||
|
|||
# If the array is not an ExplicitlyIndexedNDArrayMixin, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's uncomment this in a future PR once we are sure nothing is broken
_BasicIndexerKey = tuple[int | np.integer | slice, ...] | ||
_OuterIndexerKey = tuple[ | ||
int | np.integer | slice | np.ndarray[Any, np.dtype[np.integer]], ... | ||
] | ||
_VectorizedIndexerKey = tuple[slice | np.ndarray[Any, np.dtype[np.integer]], ...] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure these types are fully accurate
* main: (79 commits) fix mean for datetime-like using the respective time resolution unit (#9977) Add `time_unit` argument to `CFTimeIndex.to_datetimeindex` (#9965) remove gate and add a test (#9958) Remove repetitive that (replace it with the) (#9994) add shxarray to the xarray ecosystem list (#9995) Add `shards` to `valid_encodings` to enable sharded Zarr writing (#9948) Use flox for grouped first, last (#9986) Bump the actions group with 2 updates (#9989) Fix some typing (#9988) Remove unnecessary a article (#9980) Fix test_doc_example on big-endian systems (#9949) fix weighted polyfit for arrays with more than 2 dimensions (#9974) Use zarr-fixture to prevent thread leakage errors (#9967) remove dask-expr from CI runs, fix related tests (#9971) Update time coding tests to assert exact equality (#9961) cast type to PDDatetimeUnitOptions (#9963) Suggest the correct name when no key matches in the dataset (#9943) fix upstream dev issues (#9953) Relax nanosecond datetime restriction in CF time decoding (#9618) Remove outdated quantile test. (#9945) ...
dffc05c
to
1ffe5e9
Compare
Continues the refactoring of backends and lazy indexing classes
.oindex
and.vindex
additions in_ElementwiseFunctionArray
,NativeEndiannessArray
, andBoolTypeArray
classes #8921whats-new.rst