Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delegate (most) datetimelike Series arithmetic ops to DatetimeIndex #18817

Closed
wants to merge 10 commits into from

Conversation

jbrockmendel
Copy link
Member

DatetimeIndex arithmetic ops do overflow checks that Series does not. Instead of re-implementing those checks (which I tried and its a PITA) this PR just delegates the appropriate operations to the DatetimeIndex implementation.

There are a couple of things that are missing from the DatetimeIndex implementations, e.g. the int-array case added to indexes.datetimelike. If there's consensus that having a single implementation for these ops is the way to go, I'll port the other cases over in their own PRs.

If/when these implementations get merged, ops._TimeOp can be trimmed/removed.

@pep8speaks
Copy link

pep8speaks commented Dec 18, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 19, 2017 at 16:11 Hours UTC

@codecov
Copy link

codecov bot commented Dec 18, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@07d8c2d). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18817   +/-   ##
=========================================
  Coverage          ?    91.6%           
=========================================
  Files             ?      154           
  Lines             ?    51452           
  Branches          ?        0           
=========================================
  Hits              ?    47135           
  Misses            ?     4317           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.47% <100%> (?)
#single 40.8% <61.29%> (?)
Impacted Files Coverage Δ
pandas/core/ops.py 90.56% <100%> (ø)
pandas/core/indexes/datetimelike.py 97.16% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 07d8c2d...ab75018. Read the comment docs.

@@ -107,7 +107,7 @@ def test_shift(self):
# incompat tz
s2 = Series(date_range('2000-01-01 09:00:00', periods=5,
tz='CET'), name='foo')
pytest.raises(ValueError, lambda: s - s2)
pytest.raises(TypeError, lambda: s - s2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an API change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in needs to be noted in whatsnew or needs to be avoided?

return all(isinstance(x, ABCDateOffset) for x in arr_or_obj)
return False

def _is_offset(arr_or_obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to move this to pandas.core.dtypes.common and de-privatize

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and add tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on all three points.

@@ -664,6 +665,12 @@ def __add__(self, other):
return self.shift(other)
elif isinstance(other, (Index, datetime, np.datetime64)):
return self._add_datelike(other)
elif (isinstance(self, DatetimeIndex) and
isinstance(other, np.ndarray) and other.size == 1 and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? I believe we already check is_integer_dtype(other). this check is much too specific

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the status quo adding np.array(1, dtype=np.int64) to a DatetimeIndex behaves like adding Timedelta(nanosecond=1). Series raises (and a Series test fails unless this is caught here).

I'm open to ideas for how to make this less hacky.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to -- for the time being -- check for this case in the Series op and avoid dispatching to DatetimeIndex.

@@ -715,6 +726,29 @@ def wrapper(left, right, name=name, na_op=na_op):
if isinstance(right, ABCDataFrame):
return NotImplemented

if _is_offset(right):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic does not belong here. this is what happens in Timeop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic does not belong here. this is what happens in Timeop.

Yes. This is preventing this case from being dispatched to DatetimeIndex so that it continues to be handled by TimeOp for now. DatetimeIndex makes a mess of this case (try it...), but I decided to fix that separately.

@jbrockmendel jbrockmendel mentioned this pull request Dec 18, 2017
4 tasks
@gfyoung gfyoung added Docs Timedelta Timedelta data type labels Dec 18, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs some work. let's clear the decks and you can revist

@@ -198,6 +198,7 @@ Other API Changes
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`)
- :func:`pandas.merge` now raises a ``ValueError`` when trying to merge on incompatible data types (:issue:`9780`)
- :func:`wide_to_long` previously kept numeric-like suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.23.0

@jbrockmendel
Copy link
Member Author

The other arithmetic PRs are a couple steps away from really being ready for this. The only part of this that is separately actionable (and on which I need input) is for where (and I guess if) to fix the mismatched behavior in DatetimeIndex.__add__ and __sub__

rng = pd.date_range('2000-01-01 09:00', freq='H', periods=2, tz='US/Pacific')
ser = pd.Series(rng)
other = np.array(1, dtype=np.int64)

>>> ser + other
[...]
TypeError: incompatible type for a datetime/timedelta operation [__add__]

>>> rng + other
DatetimeIndex(['2000-01-01 17:00:00.000000001-08:00', '2000-01-01 18:00:00.000000001-08:00'], dtype='datetime64[ns, US/Pacific]', freq=None)

>>> other + rng
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'DatetimeIndex'

This PR addressed that here and here, but got some pushback because it was (admittedly) ugly. Alternative suggestions requested.

Closing this PR, will (eventually) open another one with reduced scope.

@jbrockmendel jbrockmendel deleted the fix12534 branch February 11, 2018 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Negative timedeltas for deltas > 292 years
4 participants