Support parsing timezone-aware datetimes in constructors when data type is also timezone-aware #16297
Comments
thanks for the ping so, for
The constructor should probably not be too different. Let's see:
For the last one, I think you're suggesting to convert to the given time zone. So long as it's clearly documented, and it's done for both the Series constructor and
I've taken another look at PyArrow, and there is something else probably worth mirroring. When parsing strings that carry a UTC offset, both libraries return a UTC dtype:
In [8]: pc.strptime(pa.array(['2020-01-01T01:02:03+01:00']), unit='us', format='%Y-%m-%dT%H:%M:%S%z').type
Out[8]: TimestampType(timestamp[us, tz=UTC])
In [11]: pl.Series(['2020-01-01T01:02:03+01:00']).str.to_datetime(time_unit='us').dtype
Out[11]: Datetime(time_unit='us', time_zone='UTC')
But in the constructor, when starting from a tz-aware stdlib datetime, PyArrow keeps the +01:00 offset:
In [14]: pa.array([datetime(2020, 1, 1, tzinfo=timezone(timedelta(hours=1))), datetime(2020, 1, 2)]).type
Out[14]: TimestampType(timestamp[us, tz=+01:00])
In [15]: pl.Series([datetime(2020, 1, 1, tzinfo=timezone(timedelta(hours=1))), datetime(2020, 1, 2)]).dtype
<ipython-input-15-9332dc369f5e>:1: TimeZoneAwareConstructorWarning: Constructing a Series with time-zone-aware datetimes results in a Series with UTC time zone. To silence this warning, you can filter warnings of class TimeZoneAwareConstructorWarning, or set 'UTC' as the time zone of your datatype.
pl.Series([datetime(2020, 1, 1, tzinfo=timezone(timedelta(hours=1))), datetime(2020, 1, 2)]).dtype
Out[15]: Datetime(time_unit='us', time_zone='UTC')
whereas Polars still converts to UTC.
One suggestion could be:
There's a further difference though. If the user specifies the time zone as part of the dtype, then Polars sets that as the dtype, whereas PyArrow converts as if starting from UTC:
In [25]: pa.array([datetime(2020, 1, 1), datetime(2020, 1, 2)], type=pa.timestamp('us', 'Iran'))
Out[25]:
<pyarrow.lib.TimestampArray object at 0x7f2d3b4e5c60>
[
2020-01-01 00:00:00.000000Z,
2020-01-02 00:00:00.000000Z
]
In [26]: pl.Series([datetime(2020, 1, 1), datetime(2020, 1, 2)], dtype=pl.Datetime('us', 'Iran'))
Out[26]:
shape: (2,)
Series: '' [datetime[μs, Iran]]
[
2020-01-01 00:00:00 +0330
2020-01-02 00:00:00 +0330
]
It looks like their rule is:
Where does this leave Polars? Not totally sure, just wanted to leave these findings here for now.
Something which currently doesn't look great (and is unintuitive?) is this:
In [22]: pl.Series([datetime(2020, 1, 1), datetime(2020, 1, 1, tzinfo=ZoneInfo('Asia/Kathmandu'))], dtype=pl.Datetime('us', 'Europe/Amsterdam'))
Out[22]:
shape: (2,)
Series: '' [datetime[μs, Europe/Amsterdam]]
[
2020-01-01 00:00:00 CET
2019-12-31 18:15:00 CET
]
The second element gets converted to
OK, got a concrete proposal in #16828. It addresses several inconsistencies, but in doing so is unfortunately breaking for some people. In those cases, however, a clear warning is issued, advising the user about what to do instead.
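For reference, the arithmetic behind the Europe/Amsterdam example above can be checked with the standard library alone. This is only a sketch of what the printed output is consistent with, not Polars' implementation. The fixed offsets are an assumption standing in for the named zones on that date (Asia/Kathmandu at UTC+05:45, CET at UTC+01:00).

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the named zones on 2020-01-01.
KATHMANDU = timezone(timedelta(hours=5, minutes=45))
CET = timezone(timedelta(hours=1))

aware = datetime(2020, 1, 1, tzinfo=KATHMANDU)

# Consistent with the output above: convert to UTC, then relabel the UTC
# wall-clock time as CET without shifting it.
as_utc = aware.astimezone(timezone.utc)
relabelled = as_utc.replace(tzinfo=CET)

# A true conversion keeps the instant and only changes the wall-clock time.
converted = aware.astimezone(CET)

print(relabelled.isoformat())  # 2019-12-31T18:15:00+01:00
print(converted.isoformat())   # 2019-12-31T19:15:00+01:00
print(converted == aware)      # True: same instant
print(relabelled == aware)     # False: the instant moved by one hour
```

The relabelled value matches the `18:15:00 CET` shown in the Series, while a conversion that preserves the instant would give `19:15:00 CET`.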
Description
I ran into this today, and I think we can improve behavior here.
Consider this code:
This is odd. If we're casting timezone-aware data anyway, we might as well cast it to the desired time zone, right?
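To make the idea concrete, here is a minimal stdlib-only sketch of "cast to the desired time zone". The helper `to_target_zone` and the fixed-offset zones are hypothetical and used only for illustration. They are not part of Polars or PyArrow.

```python
from datetime import datetime, timedelta, timezone


def to_target_zone(values, target):
    """Convert tz-aware datetimes to `target`, keeping each instant unchanged.

    Hypothetical helper for illustration only.
    """
    out = []
    for v in values:
        if v.tzinfo is None:
            raise ValueError("naive datetime: no time zone to convert from")
        out.append(v.astimezone(target))
    return out


PLUS_ONE = timezone(timedelta(hours=1))
PLUS_TWO = timezone(timedelta(hours=2))  # hypothetical target zone

data = [datetime(2020, 1, 1, tzinfo=PLUS_ONE)]
result = to_target_zone(data, PLUS_TWO)

print(result[0].isoformat())  # 2020-01-01T01:00:00+02:00
print(result == data)         # True: aware datetimes compare by instant
```

Because the conversion preserves the instant, the converted values still compare equal to the inputs.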
One of the benefits of doing this is that timezone-aware data can then be roundtripped, like in one of our tests:
For reference, PyArrow seems to handle this a bit differently from us and they do support this:
I may be missing something here, but I thought I'd throw this out there. Let's see what @MarcoGorelli thinks 😄