Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python!): Consistently convert to given time zone in Series constructor #16828

Merged
merged 5 commits into from
Jun 11, 2024

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Jun 9, 2024

Closes #16823
Closes #16297

#16823 (comment) is kinda sending me into an existential crisis. Any decision is going to break code for at least some people. All I can really say is that I wasn't aware of all these different constructors going down different paths when I worked on this.

This PR is one possible solution, and it builds upon #16742

Pros: this addresses quite a few inconsistencies in the constructor logic. With this change, constructing a Series or DataFrame with Python stdlib date or datetime objects will always convert (not replace) the time zone to the one of the given dtype. Naive objects get converted as if starting from UTC. This also addresses the issue brought up in #16714 (cc @stinodego )

Cons: this is a breaking change for anything constructing a Series with a tz-aware dtype and tz-naive datetimes. To try to mitigate damage, a warning is always raised in such cases, with a helpful message indicating what people should do instead, depending on their desired outcome.


Example

Before:

>>> from datetime import datetime
>>> pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime('us', 'Europe/Amsterdam'))
shape: (1,)
Series: '' [datetime[μs, Europe/Amsterdam]]
[
        2020-01-01 00:00:00 CET
]

After:

>>> from datetime import datetime
>>> pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime('us', 'Europe/Amsterdam'))
shape: (1,)
Series: '' [datetime[μs, Europe/Amsterdam]]
[
        2020-01-01 01:00:00 CET
]

Copy link

codecov bot commented Jun 9, 2024

Codecov Report

Attention: Patch coverage is 76.47059% with 4 lines in your changes missing coverage. Please review.

Project coverage is 81.35%. Comparing base (eaedf74) to head (5cb989b).
Report is 2 commits behind head on main.

Current head 5cb989b differs from pull request most recent head 5829286

Please upload reports for the commit 5829286 to get more accurate results.

Files Patch % Lines
py-polars/polars/_utils/construction/series.py 66.66% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16828      +/-   ##
==========================================
- Coverage   81.38%   81.35%   -0.03%     
==========================================
  Files        1425     1425              
  Lines      187665   187616      -49     
  Branches     2702     2700       -2     
==========================================
- Hits       152725   152638      -87     
- Misses      34444    34482      +38     
  Partials      496      496              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46
Copy link
Member

To try to mitigate damage, a warning is always raised in such cases

Can't we add this to the changelog and 1.0 blog? There are more things that change, and raising a warning for every one of them would do more harm (in annoyance) than good I think. I don't think we should warn if behavior changes, but instead of someone does something it shouldn't.

@MarcoGorelli
Copy link
Collaborator Author

True...there's already a warning though, it's been there for some time

if values_tz != "UTC" and dtype_tz is None:
warnings.warn(
"Constructing a Series with time-zone-aware "
"datetimes results in a Series with UTC time zone. "
"To silence this warning, you can filter "
"warnings of class TimeZoneAwareConstructorWarning, or "
"set 'UTC' as the time zone of your datatype.",
TimeZoneAwareConstructorWarning,
stacklevel=find_stacklevel(),
)

It's true that this can be annoying, but I'm also worried about breaking people's code. Other 1.0 changes are louder:

  • if you use melt instead of pivot, you get a loud error
  • if you use join(how='left'), you now get an extra column. It that breaks your expected schema, you'll get a loud error

But here, it's a subtle and silent behaviour change, that can be hard to detect. I'd appreciate having a warning in, at least for a little while (say, until 2.0). As much as I like the upgrade guides, I guess that most users ignore them.

Up to you of course, just voicing my suggestion, happy to remove it if you agree with the change and prefer to not have the warning

@MarcoGorelli MarcoGorelli force-pushed the init-datetimes branch 2 times, most recently from 0858969 to df380bd Compare June 9, 2024 13:58
@MarcoGorelli MarcoGorelli changed the title WIP feat(python!): consistently convert to given time zone in Series constructor feat(python!): consistently convert to given time zone in Series constructor Jun 9, 2024
@github-actions github-actions bot added breaking python Change that breaks backwards compatibility for the Python package enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Jun 9, 2024
@ritchie46
Copy link
Member

What I don't like about it is that new users or new Polars projects also get this warning and then it is a warning that warns that Polars works the way it works. Which, I think is strange.

Can't we add a "migration mode"? Where we add this to the pl.Context and if that is set, we warn. Then we can just advice existing code bases to set Polars into migration mode and check for all warnings. This allows to be a bit more liberal with warnings whereas new users wouldn't be annoyed.

@stinodego
Copy link
Contributor

What I don't like about it is that new users or new Polars projects also get this warning and then it is a warning that warns that Polars works the way it works. Which, I think is strange.

I also don't support adding warnings to happy paths. Especially if it's because of legacy reasons from pre-1.0.0.

In the future, we could address changes like this over a longer period where we introduce a deprecation warning first. But this is close to be considered a bug (since it's inconsistent across constructors). And this is the 1.0.0 release, so right now is our chance to wipe the slate clean.

I will add a section to the migration guide for "sneaky changes that might break your code", and add this one there. I think that'll have to do for this case.

Can't we add a "migration mode"? Where we add this to the pl.Context and if that is set, we warn. Then we can just advice existing code bases to set Polars into migration mode and check for all warnings. This allows to be a bit more liberal with warnings whereas new users wouldn't be annoyed.

We can do that - we already have something similar with the unstable functionality warnings. But I don't think it's really warranted in this case.

@MarcoGorelli MarcoGorelli marked this pull request as ready for review June 10, 2024 17:31
@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented Jun 10, 2024

OK, thanks for your reviews 🙏 I've removed the warning then, a Config.set_migration_mode (something like that) could always be considered separately

EDIT: just touching up some minor things

@MarcoGorelli MarcoGorelli marked this pull request as draft June 10, 2024 17:40
@MarcoGorelli MarcoGorelli marked this pull request as ready for review June 10, 2024 18:56
@ritchie46 ritchie46 merged commit 4c80f22 into pola-rs:main Jun 11, 2024
14 checks passed
@stinodego stinodego changed the title feat(python!): consistently convert to given time zone in Series constructor feat(python!): Consistently convert to given time zone in Series constructor Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking python Change that breaks backwards compatibility for the Python package enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
None yet
3 participants