Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: switch Dataset and DataArray to use explicit indexes #2639

Merged
merged 4 commits into from
Jan 4, 2019

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Jan 1, 2019

xref #1603

This change switches Dataset.indexes and DataArray.indexes to be backed by
explicit dictionaries of indexes, instead of being implicitly defined by
the set of coordinates with names matching dimensions.

There are no changes to the public interface yet: these will come later.

My current plan:

  1. (This PR) Indexes are recreated from coordinates every time a new DataArray
    or Dataset is created.
  2. (Follow-up PRs) Refactor indexes to be propagated explicitly in xarray operations. This will facilitate future API changes, when indexes will no longer only be associated with dimensions. I will probably add some testing decorator that can be used to mark part of a test as including no creation of default indexes.
  3. Add explicit entries into indexes for MultiIndex levels that are checked instead of MultiIndex variables. Still no public API changes (aside from adding more entries to .indexes).
  4. Support arbitrary coordinates in indexes.

This change switches Dataset.indexes and DataArray.indexes to be backed by
explicit dictionaries of indexes, instead of being implicitly defined by
the set of coordinates with names matching dimensions.

There are no changes to the public interface yet: these will come later.

For now, indexes are recreated from coordinates every time a new DataArray
or Dataset is created. In follow-up PRs, I will refactor indexes to be
propagated explicitly in xarray operations. This will facilitate future API
changes, when indexes will no longer only be associated with dimensions.
@fujiisoup
Copy link
Member

@shoyer

It looks nice and reasonable to me.

(This PR) Indexes are recreated from coordinates every time a new DataArray or Dataset is created.

Does it add some overhead (though I don't see any reason of it)?

Mapping[Any, pandas.Index] mapping indexing keys (levels/dimension names)
to indexes used for indexing along that dimension.
"""
return {key: coords[key].to_index() for key in dims if key in coords}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use OrderedDict instead of dict to preserve the order?
It is probably necessary for the backward compatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this makes sense, for now. Eventually we are going to break this, though (when we add new entries).

@shoyer
Copy link
Member Author

shoyer commented Jan 3, 2019

(This PR) Indexes are recreated from coordinates every time a new DataArray or Dataset is created.

Does it add some overhead (though I don't see any reason of it)?

I suppose another option would be to leave self._indexes = None in the constructor, and only set default values for self._indexes when the indexes property is accessed.

I think we do something like for a few other attributes already.

Copy link
Member

@fujiisoup fujiisoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Looking foward to seeing the new structure in xarray :)

@shoyer shoyer merged commit 06244df into pydata:master Jan 4, 2019
@shoyer shoyer deleted the explicit-indexes branch January 4, 2019 21:07
dcherian pushed a commit to yohai/xarray that referenced this pull request Jan 24, 2019
* master:
  Remove broken Travis-CI builds (pydata#2661)
  Type checking with mypy (pydata#2655)
  Added Coarsen (pydata#2612)
  Improve test for GH 2649 (pydata#2654)
  revise top-level package description (pydata#2430)
  Convert ref_date to UTC in encode_cf_datetime (pydata#2651)
  Change an `==` to an `is`. Fix tests so that this won't happen again. (pydata#2648)
  ENH: switch Dataset and DataArray to use explicit indexes (pydata#2639)
  Use pycodestyle for lint checks. (pydata#2642)
  Switch whats-new for 0.11.2 -> 0.11.3
  DOC: document v0.11.2 release
  Use built-in interp for interpolation with resample (pydata#2640)
  BUG: pytest-runner no required for setup.py (pydata#2643)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants