ENH: switch Dataset and DataArray to use explicit indexes #2639

shoyer · 2019-01-01T00:48:42Z

This change switches Dataset.indexes and DataArray.indexes to be backed by
explicit dictionaries of indexes, instead of being implicitly defined by
the set of coordinates with names matching dimensions.

There are no changes to the public interface yet: these will come later.

My current plan:

1. (This PR) Indexes are recreated from coordinates every time a new DataArray or Dataset is created.
2. (Follow-up PRs) Refactor indexes to be propagated explicitly in xarray operations. This will facilitate future API changes, when indexes will no longer only be associated with dimensions. I will probably add some testing decorator that can be used to mark part of a test as including no creation of default indexes.
3. Add explicit entries into indexes for MultiIndex levels that are checked instead of MultiIndex variables. Still no public API changes (aside from adding more entries to .indexes).
4. Support arbitrary coordinates in indexes.
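
As a rough illustration of the first step of the plan above, the sketch below contrasts deriving indexes implicitly from coordinates with storing them in an explicit dictionary that is rebuilt at construction time. `ToyCoord`, `implicit_indexes`, and `ToyDataset` are hypothetical names made up for this example and are not xarray classes; the real code works with pandas.Index objects, for which a tuple stands in here.

```python
from collections import OrderedDict


class ToyCoord:
    """Hypothetical stand-in for a coordinate variable (not an xarray class)."""

    def __init__(self, values):
        self.values = list(values)

    def to_index(self):
        # The real code returns a pandas.Index; a tuple is enough for the sketch.
        return tuple(self.values)


def implicit_indexes(coords, dims):
    """Implicit rule: indexes are the coordinates whose names match a dimension."""
    return OrderedDict((d, coords[d].to_index()) for d in dims if d in coords)


class ToyDataset:
    """Hypothetical container that stores its indexes explicitly."""

    def __init__(self, coords, dims):
        self.coords = coords
        self.dims = dims
        # Step 1 of the plan: rebuild the explicit dict on every construction.
        self._indexes = implicit_indexes(coords, dims)

    @property
    def indexes(self):
        # Backed by the stored dictionary, not recomputed from coords.
        return self._indexes


coords = {"x": ToyCoord([10, 20, 30]), "label": ToyCoord(["a", "b", "c"])}
ds = ToyDataset(coords, dims=("x", "y"))
print(list(ds.indexes))  # only "x" is both a dimension and a coordinate
```

Because the property returns the stored dictionary, later steps of the plan could add entries that do not correspond to a dimension without changing how `indexes` is read.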


fujiisoup · 2019-01-03T20:46:53Z

@shoyer

It looks nice and reasonable to me.

(This PR) Indexes are recreated from coordinates every time a new DataArray or Dataset is created.

Does it add some overhead (though I don't see any reason why it would)?

fujiisoup · 2019-01-03T20:52:34Z

xarray/core/indexes.py

+    Mapping[Any, pandas.Index] mapping indexing keys (levels/dimension names)
+    to indexes used for indexing along that dimension.
+    """
+    return {key: coords[key].to_index() for key in dims if key in coords}


Should we use OrderedDict instead of dict to preserve the order?
It is probably necessary for backward compatibility.

shoyer · Jan 3, 2019

I guess this makes sense, for now. Eventually we are going to break this, though (when we add new entries).
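
To make the ordering point in this thread concrete, here is a minimal sketch of the helper returning an OrderedDict keyed in dimension order. The name `default_indexes_ordered` and the `FakeCoord` class are assumptions for illustration only; the body mirrors the reviewed line. Plain dict only guarantees insertion order from Python 3.7 onward, which is why OrderedDict is the safer choice when order must be preserved on older interpreters.

```python
from collections import OrderedDict


class FakeCoord:
    """Hypothetical coordinate exposing to_index(), as in the reviewed line."""

    def __init__(self, values):
        self._values = tuple(values)

    def to_index(self):
        return self._values  # stands in for a pandas.Index


def default_indexes_ordered(coords, dims):
    """Map each dimension that has a coordinate to its index, in dims order."""
    return OrderedDict(
        (key, coords[key].to_index()) for key in dims if key in coords
    )


coords = {"time": FakeCoord([0, 1]), "y": FakeCoord([5]), "x": FakeCoord([7, 8])}
indexes = default_indexes_ordered(coords, dims=("x", "y", "z", "time"))
print(list(indexes))  # keys follow dims order; "z" has no coordinate
```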

shoyer · 2019-01-03T21:44:21Z

(This PR) Indexes are recreated from coordinates every time a new DataArray or Dataset is created.

Does it add some overhead (though I don't see any reason why it would)?

I suppose another option would be to leave self._indexes = None in the constructor, and only set default values for self._indexes when the indexes property is accessed.

I think we do something like that for a few other attributes already.
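
The lazy alternative described here could look roughly like the sketch below: the constructor leaves `_indexes` as None and the property fills it in on first access. `LazyIndexed`, `_default_indexes`, and `build_count` are hypothetical names used only for this example; this is not the code in the PR.

```python
class LazyIndexed:
    """Hypothetical container; not xarray's Dataset or DataArray."""

    def __init__(self, coords, dims):
        self._coords = coords
        self._dims = dims
        self._indexes = None  # defer building until first access
        self.build_count = 0  # instrumentation for this example only

    def _default_indexes(self):
        # Count how often the defaults are built, then derive them from coords.
        self.build_count += 1
        return {d: tuple(self._coords[d]) for d in self._dims if d in self._coords}

    @property
    def indexes(self):
        # Build the defaults once, then reuse the cached dictionary.
        if self._indexes is None:
            self._indexes = self._default_indexes()
        return self._indexes


obj = LazyIndexed({"x": [1, 2, 3]}, dims=("x", "y"))
built_before = obj.build_count  # nothing has been built yet
first = obj.indexes
second = obj.indexes
print(built_before, obj.build_count, first is second)
```

With this pattern, objects whose indexes are never read pay no construction cost, at the price of a None check on every access.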

fujiisoup

Looks good to me. Looking forward to seeing the new structure in xarray :)


shoyer mentioned this pull request Jan 1, 2019

Explicit indexes in xarray's data-model (Future of MultiIndex) #1603

Closed

shoyer added 2 commits December 31, 2018 16:55

Add xarray.core.indexes

2cd4960

Merge branch 'master' of github.com:pydata/xarray into explicit-indexes

31bc6f5

fujiisoup reviewed Jan 3, 2019


Fixes per review

9321e5a

fujiisoup approved these changes Jan 4, 2019


shoyer merged commit 06244df into pydata:master Jan 4, 2019

shoyer deleted the explicit-indexes branch January 4, 2019 21:07
