-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rust): Fix datetime cast behavior for pre-epoch times #19949
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #19949 +/- ##
==========================================
+ Coverage 79.45% 79.47% +0.01%
==========================================
Files 1555 1555
Lines 216168 216320 +152
Branches 2456 2456
==========================================
+ Hits 171760 171910 +150
- Misses 43850 43852 +2
Partials 558 558 ☔ View full report in Codecov by Sentry. |
thanks @ptiza - could you add a test please? the one from the linked issue is fine |
thanks - @orlp fancy taking a look? |
@MarcoGorelli It's not clear to me that flooring is better than truncating here. It just depends on which direction you want the error to happen. In your example you have 1969-01-01 plus 1 microsecond, and you find it surprising that it rounds to 1 millisecond. But someone else might have 1969-01-01 minus 1 microsecond and find it surprising that (after this PR) it takes off an entire millisecond. When casting durations I think truncating is really the only sensible solution, we don't want -1 nanoseconds to suddenly become a full negative millisecond. So I think to be consistent using truncating division everywhere makes the most sense. |
Thanks for your response!
Sure, but this is also what happens currently with 1971-01-01 minus 1 microsecond, right? I was thinking that from the user's perspective, the fact that dates are backed by int64 should just be an implementation detail, and that they should see rounding consistently happening in the same direction But maybe that's OK, and it's not unreasonable to expect users to know how datetimes are represented and to expect rounding towards 1970-01-01 UTC to happen Let's leave as-is then? Sorry @ptiza for having mislead you with my issue |
Shouldn't we have a test then to verify this behavior somehow? Because with this change implemented I still get all tests passing, so this duration scenario is not reflected in the tests. I find the current behavior inconsistent. >>> pl.Series([-1]).cast(pl.Datetime("us")).cast(pl.Datetime("ms"))
shape: (1,)
Series: '' [datetime[ms]]
[
1969-12-31 23:59:59.999
] >>> pl.Series([datetime(1969, 12, 31, 23, 59, 59, 999999)], dtype=pl.Datetime("ms"))
shape: (1,)
Series: '' [datetime[ms]]
[
1970-01-01 00:00:00
] But somehow they don't. With this PR in place, you get the same result for both operations (the first one). Edit: There is a test on the behavior of the first example: polars/py-polars/tests/unit/test_queries.py Lines 344 to 364 in 742c737
|
@MarcoGorelli Ok, I think you make a convincing point: an arbitrary change of behavior from 1971 to 1970 is undesirable, and a symptom of something being wrong. After thinking about it some more I think this policy makes sense:
With that in mind I would approve this PR, but I'll leave the final decision to you @MarcoGorelli. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks all for discussion
discussed with Ritchie / Stijn / Simon, they're on board with this too - shipping it then
Fixes #19309
See also my comment #19309 (comment)
Error was introduced with #14026
Pre-epoch timestamps are negative, if we trunc div instead of floor div here, we get an incorrect distance to epoch.