Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate use of strings denoting units with 'M', 'Y' or 'y' in pd.to_timedelta (36666) #36838

Merged
merged 32 commits into from
Oct 31, 2020

Conversation

avinashpancham
Copy link
Contributor

@avinashpancham
Copy link
Contributor Author

Do we only want to deprecate variations with a space (pd.to_timedelta("1 Y")) or also without a space (pd.to_timedelta("1Y"))

@avinashpancham avinashpancham changed the title DEPR: Deprecate use of strings denoting units with 'M', 'Y', 'm' or 'y' in pd.to_timedelta DEPR: Deprecate use of strings denoting units with 'M', 'Y', 'm' or 'y' in pd.to_timedelta (36666) Oct 3, 2020
raise ValueError(
"unit must not be specified if the input is/contains a str"
)
elif arg.upper().endswith(" M") or arg.upper().endswith(" Y"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in your other commented, we also want to catch arguments without the space as well e.g. "1Y".

Also we need a test for this warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@avinashpancham
Copy link
Contributor Author

Ah I see some tests are failing. I will update the code.

@avinashpancham
Copy link
Contributor Author

All tests related to pd.to_timedelta are passing. The failed tests originate from another issue on master.

@avinashpancham
Copy link
Contributor Author

Updated my code and merged master. Everything passes now.

],
)
def test_unambiguous_timedelta_values(self, val, warning):
with tm.assert_produces_warning(warning):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you but the github issue number here as a comment?

Copy link
Contributor Author

@avinashpancham avinashpancham Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. CI fails due to other tests on master though. Will merge master later once that is solved.

if unit is not None:
raise ValueError(
"unit must not be specified if the input is/contains a str"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we already have this check on L103, so something must be going wrong here.

Copy link
Contributor Author

@avinashpancham avinashpancham Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you do: pd.to_timedelta("1 Y") then unit is not specified and by default None. L103 checks only if unit is "Y", "M", 'y' or 'm' (e.g. pd.to_timedelta(1, unit="Y") ), and therefore you never reach the conditional code.

@@ -258,7 +258,9 @@ def test_unit_parser(self, units, np_unit, wrapper):
if unit == "M":
expected = Timedelta(np.timedelta64(2, "m").astype("timedelta64[ns]"))

result = to_timedelta(f"2{unit}")
warning = None if unit != "m" else FutureWarning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to parameterize this (e.g. elminate the for loop and parameterize on units)

then add a parameter whether should test for FutureWarning or not (could be bool is ok).

then in the body you can easily check this for both to_timedelta and Timedelta

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have approx 50 values that have no warning and just one ("m") that has a warning. IMO a parameter for this would be overkill. But if we want this then I can change it.

Copy link
Contributor Author

@avinashpancham avinashpancham Oct 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the test. IMO test_unit_parser could be refactored even further: split in 2 tests: test_unit_parser_array and test_unit_parser_scalar and apply a parameter on the function that converts its to timedelta. Do we want to do this?

@jreback jreback added Deprecate Functionality to remove in pandas Timedelta Timedelta data type labels Oct 12, 2020
@avinashpancham
Copy link
Contributor Author

I found out that the following was also possible: pd.to_timedelta(["1Y"]) or pd.to_timedelta(pd.Series(["1Y"])). So I updated to_timedelta to also give depr warnings in this situation. I'm not completely happy with this added code, so any feedback is appreciated

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to do the warning inside the string to timedelta conversion code in timedelta.pyx itself;

@@ -101,6 +110,16 @@ def to_timedelta(arg, unit=None, errors="raise"):
"represent unambiguous timedelta values durations."
)

if isinstance(arg, (list, tuple, ABCSeries, ABCIndexClass, np.ndarray)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the warning shouldn't be here

@@ -1467,6 +1468,16 @@ cdef _broadcast_floordiv_td64(
return res


def check_unambiguous_timedelta_values(object[:] values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than this you can do it in the Timedelta string converter

"unit must not be specified if the input is/contains a str"
)
elif re.search(r"^\d+\s?[M|Y|m|y]$", arg):
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wouldn't do this here at all

@avinashpancham
Copy link
Contributor Author

Thanks @jreback , moved the warning to timedelta.pyx and the code is much cleaner now.

@avinashpancham
Copy link
Contributor Author

@jreback implemented all suggestions, could you have a 2nd look? CI is still failing after merging master, but that is related to other issues than introduced in this PR.

@avinashpancham
Copy link
Contributor Author

CI failed due to FATAL ERROR: Committing semi space failed. Allocation failed - process out of memory. Can someone please restart it? Thanks!

@jreback
Copy link
Contributor

jreback commented Oct 31, 2020

why is 'm' (lowercase) m deprecated? this is minutes as in not ambiguous. (agree on Y, y, M)

@avinashpancham
Copy link
Contributor Author

My bad, I thought we wanted to deprecate all variations of m (lowercase and uppercase). Will revert it then.

@jreback
Copy link
Contributor

jreback commented Oct 31, 2020

My bad, I thought we wanted to deprecate all variations of m (lowercase and uppercase). Will revert it then.

kk great, otherwise lgtm. ping on green.

@avinashpancham avinashpancham changed the title DEPR: Deprecate use of strings denoting units with 'M', 'Y', 'm' or 'y' in pd.to_timedelta (36666) DEPR: Deprecate use of strings denoting units with 'M', 'Y' or 'y' in pd.to_timedelta (36666) Oct 31, 2020
@avinashpancham
Copy link
Contributor Author

@jreback reverted the depr of lowercase m, merged master and CI on green.

@jreback jreback merged commit d08dca6 into pandas-dev:master Oct 31, 2020
@jreback
Copy link
Contributor

jreback commented Oct 31, 2020

thanks @avinashpancham very nice!

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
ukarroum pushed a commit to ukarroum/pandas that referenced this pull request Nov 2, 2020
@MarcoGorelli MarcoGorelli mentioned this pull request Nov 7, 2022
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR deprecate pd.to_timedelta('1 M') and pd.to_timedelta('1 Y')
3 participants