Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): Fix datetime cast behavior for pre-epoch times #19949

Merged
merged 3 commits into from
Nov 29, 2024

Conversation

ptiza
Copy link
Contributor

@ptiza ptiza commented Nov 23, 2024

Fixes #19309

See also my comment #19309 (comment)

Error was introduced with #14026

Pre-epoch timestamps are negative, if we trunc div instead of floor div here, we get an incorrect distance to epoch.

@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels Nov 23, 2024
Copy link

codecov bot commented Nov 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.47%. Comparing base (8facee9) to head (0094cc4).
Report is 42 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19949      +/-   ##
==========================================
+ Coverage   79.45%   79.47%   +0.01%     
==========================================
  Files        1555     1555              
  Lines      216168   216320     +152     
  Branches     2456     2456              
==========================================
+ Hits       171760   171910     +150     
- Misses      43850    43852       +2     
  Partials      558      558              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MarcoGorelli
Copy link
Collaborator

thanks @ptiza - could you add a test please? the one from the linked issue is fine

@MarcoGorelli
Copy link
Collaborator

MarcoGorelli commented Nov 24, 2024

thanks - @orlp fancy taking a look?

@orlp
Copy link
Collaborator

orlp commented Nov 25, 2024

@MarcoGorelli It's not clear to me that flooring is better than truncating here. It just depends on which direction you want the error to happen. In your example you have 1969-01-01 plus 1 microsecond, and you find it surprising that it rounds to 1 millisecond. But someone else might have 1969-01-01 minus 1 microsecond and find it surprising that (after this PR) it takes off an entire millisecond.

When casting durations I think truncating is really the only sensible solution, we don't want -1 nanoseconds to suddenly become a full negative millisecond. So I think to be consistent using truncating division everywhere makes the most sense.

@MarcoGorelli
Copy link
Collaborator

Thanks for your response!

But someone else might have 1969-01-01 minus 1 microsecond and find it surprising that (after this PR) it takes off an entire millisecond.

Sure, but this is also what happens currently with 1971-01-01 minus 1 microsecond, right?

I was thinking that from the user's perspective, the fact that dates are backed by int64 should just be an implementation detail, and that they should see rounding consistently happening in the same direction

But maybe that's OK, and it's not unreasonable to expect users to know how datetimes are represented and to expect rounding towards 1970-01-01 UTC to happen

Let's leave as-is then? Sorry @ptiza for having mislead you with my issue

@ptiza
Copy link
Contributor Author

ptiza commented Nov 25, 2024

Shouldn't we have a test then to verify this behavior somehow?

Because with this change implemented I still get all tests passing, so this duration scenario is not reflected in the tests.

I find the current behavior inconsistent.
To me, these operations should give the same results:

>>> pl.Series([-1]).cast(pl.Datetime("us")).cast(pl.Datetime("ms"))
shape: (1,)
Series: '' [datetime[ms]]
[
        1969-12-31 23:59:59.999
]
>>> pl.Series([datetime(1969, 12, 31, 23, 59, 59, 999999)], dtype=pl.Datetime("ms"))
shape: (1,)
Series: '' [datetime[ms]]
[
        1970-01-01 00:00:00
]

But somehow they don't.

With this PR in place, you get the same result for both operations (the first one).

Edit: There is a test on the behavior of the first example:

def test_temporal_downcasts() -> None:
s = pl.Series([-1, 0, 1]).cast(pl.Datetime("us"))
assert s.to_list() == [
datetime(1969, 12, 31, 23, 59, 59, 999999),
datetime(1970, 1, 1),
datetime(1970, 1, 1, 0, 0, 0, 1),
]
# downcast (from us to ms, or from datetime to date) should NOT change the date
for s_dt in (s.dt.date(), s.cast(pl.Date)):
assert s_dt.to_list() == [
date(1969, 12, 31),
date(1970, 1, 1),
date(1970, 1, 1),
]
assert s.cast(pl.Datetime("ms")).to_list() == [
datetime(1969, 12, 31, 23, 59, 59, 999000),
datetime(1970, 1, 1),
datetime(1970, 1, 1),
]

@orlp
Copy link
Collaborator

orlp commented Nov 26, 2024

@MarcoGorelli Ok, I think you make a convincing point: an arbitrary change of behavior from 1971 to 1970 is undesirable, and a symptom of something being wrong.

After thinking about it some more I think this policy makes sense:

  • truncating division when casting signed durations (which are signed quantities inherently), and
  • flooring division when casting absolute times (which just happens to be a signed integer due arbitrarily setting an epoch in the middle of the representable range, but aren't inherently signed).

With that in mind I would approve this PR, but I'll leave the final decision to you @MarcoGorelli.

Copy link
Collaborator

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks all for discussion

discussed with Ritchie / Stijn / Simon, they're on board with this too - shipping it then

@MarcoGorelli MarcoGorelli merged commit 09f2a31 into pola-rs:main Nov 29, 2024
25 checks passed
@ptiza ptiza deleted the fixdev branch November 29, 2024 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix rust Related to Rust Polars
Projects
None yet
3 participants