Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python,rust): Add many more auto-inferable datetime formats for str.to_datetime #16634

Merged

Conversation

Julian-J-S
Copy link
Contributor

@Julian-J-S Julian-J-S commented May 31, 2024

fix #16115

also as @alexander-beedie mentioned (#16115 (comment)) I added support for many missing patterns so that all separators (-, ., /) have the same patterns.

❗ Needs Decision

There are currently 3 failing tests because they have datetime texts with fractional seconds that do not have 3/6/9 digits but something else like

  • 2018-09-05T04:24:01.9 -> 1 digit
  • 2018-09-05T04:24:02.11 -> 2 digits

usually fractional seconds should only be in milli, micro or nano seconds. Not sure if there are actually any formats out there apart from the tests?!?
Still, it would be possible to support that using the %.f specifier (little slower probably) instead of %.3f, %.6f, %.9f

Need Feedback if we want to support those (strange?) seconds fractions (-> will adjust patterns accordingly) or not (-> will adjust tests)

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels May 31, 2024
@MarcoGorelli
Copy link
Collaborator

thanks @JulianCologne

for the failing tests, is that because you've removed "%FT%H:%M:%S%.f"? tbh I'd be ok with just keeping that one in at the end of the list, no big deal

@Julian-J-S
Copy link
Contributor Author

thanks @JulianCologne

for the failing tests, is that because you've removed "%FT%H:%M:%S%.f"? tbh I'd be ok with just keeping that one in at the end of the list, no big deal

yes, I removed that on purpose because it makes the patterns unbalanced.
With all the different separators and formats we have like ~30 patterns but then only 1(!) should support different number of fractional digits? Feels very wrong and arbitrary to me

Format %FT%H:%M:%S%.f is just an abbreviation where %F=%Y-%m-%d

My suggestion would be either to not support this (have never seen this irl) or support any fractional digits for all patterns wich can be achieved by either

  • adding %.f (any number of digits) after all the %.3f, %.6f, %.9f patterns, or
  • use %.f (any number of digits) instead of all the %.3f, %.6f, %.9f patterns

@MarcoGorelli
Copy link
Collaborator

Feels very wrong

It's just an extra pattern :) For context, it comes from #3195 . %Y-%m-%dT%H:%M:%S%.f is the common ISO8601 format, IMO it's ok for that one to be a bit laxxer as a last-resort, so long as it's in the "other" section at the end of the formats (as it currently is)

Copy link

codecov bot commented Jun 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.53%. Comparing base (d7b4f72) to head (20fa8c8).
Report is 20 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16634      +/-   ##
==========================================
+ Coverage   81.51%   81.53%   +0.01%     
==========================================
  Files        1414     1414              
  Lines      185906   186398     +492     
  Branches     3027     3014      -13     
==========================================
+ Hits       151535   151972     +437     
- Misses      33840    33896      +56     
+ Partials      531      530       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Julian-J-S
Copy link
Contributor Author

@MarcoGorelli
my last commit added the missing pattern again and all checks are passing.
Ready from my side 😃

Copy link
Collaborator

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, open to expanding these, thanks @JulianCologne !

Comment on lines -344 to +347
01/01/21 00:00:00,31-01-2021T00:00:00.123,31-01-2021 11:00
01/01/21 00:15:00,31-01-2021T00:15:00.123,31-01-2021 01:00
01/01/21 00:30:00,31-01-2021T00:30:00.123,31-01-2021 01:15
01/01/21 00:45:00,31-01-2021T00:45:00.123,31-01-2021 01:30
01/01/2021 00:00:00,31-01-2021T00:00:00.123,31-01-2021 11:00
01/01/2021 00:15:00,31-01-2021T00:15:00.123,31-01-2021 01:00
01/01/2021 00:30:00,31-01-2021T00:30:00.123,31-01-2021 01:15
01/01/2021 00:45:00,31-01-2021T00:45:00.123,31-01-2021 01:30
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one would previously get parsed as the year 0021, which is almost surely not what anybody wanted...good call here

@ritchie46 ritchie46 merged commit d785903 into pola-rs:main Jun 4, 2024
26 checks passed
@Julian-J-S Julian-J-S deleted the feat-add-many-inferable-datetime-patterns branch June 4, 2024 09:38
@stinodego stinodego changed the title feat(python,rust): add many more auto inferable datetime formats feat(python,rust): Add many more auto-inferable datetime formats for str.to_datetime Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

str.to_datetime - should support date format "%Y.%m.%d" # 2023.12.31
3 participants