-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23360][SQL][PYTHON] Get local timezone from environment via pytz, or dateutil. #20559
Changes from 1 commit
e87bd76
bb0bb9d
939ac58
e20e9fd
5c5988d
a082e8c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1709,6 +1709,15 @@ def _check_dataframe_convert_date(pdf, schema): | |
return pdf | ||
|
||
|
||
def _get_local_timezone(): | ||
""" Get local timezone from environment vi pytz, or dateutil. """ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. did you mean "from environment via pytz"? the 'TZ' environment var is read by the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I modified and added the comment. Can you understand by the comment? |
||
from pyspark.sql.utils import require_minimum_pandas_version | ||
require_minimum_pandas_version() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we need this here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, it isn't needed here. I'll remove it. Thanks. |
||
|
||
import os | ||
return os.environ.get('TZ', 'dateutil/:') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really understand how does "dateutil/:" work, can you maybe add some comments for that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, I'll add some comments. |
||
|
||
|
||
def _check_dataframe_localize_timestamps(pdf, timezone): | ||
""" | ||
Convert timezone aware timestamps to timezone-naive in the specified timezone or local timezone | ||
|
@@ -1721,7 +1730,7 @@ def _check_dataframe_localize_timestamps(pdf, timezone): | |
require_minimum_pandas_version() | ||
|
||
from pandas.api.types import is_datetime64tz_dtype | ||
tz = timezone or 'tzlocal()' | ||
tz = timezone or _get_local_timezone() | ||
for column, series in pdf.iteritems(): | ||
# TODO: handle nested timestamps, such as ArrayType(TimestampType())? | ||
if is_datetime64tz_dtype(series.dtype): | ||
|
@@ -1744,7 +1753,7 @@ def _check_series_convert_timestamps_internal(s, timezone): | |
from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype | ||
# TODO: handle nested timestamps, such as ArrayType(TimestampType())? | ||
if is_datetime64_dtype(s.dtype): | ||
tz = timezone or 'tzlocal()' | ||
tz = timezone or _get_local_timezone() | ||
return s.dt.tz_localize(tz).dt.tz_convert('UTC') | ||
elif is_datetime64tz_dtype(s.dtype): | ||
return s.dt.tz_convert('UTC') | ||
|
@@ -1766,15 +1775,13 @@ def _check_series_convert_timestamps_localize(s, from_timezone, to_timezone): | |
|
||
import pandas as pd | ||
from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype | ||
from_tz = from_timezone or 'tzlocal()' | ||
to_tz = to_timezone or 'tzlocal()' | ||
from_tz = from_timezone or _get_local_timezone() | ||
to_tz = to_timezone or _get_local_timezone() | ||
# TODO: handle nested timestamps, such as ArrayType(TimestampType())? | ||
if is_datetime64tz_dtype(s.dtype): | ||
return s.dt.tz_convert(to_tz).dt.tz_localize(None) | ||
elif is_datetime64_dtype(s.dtype) and from_tz != to_tz: | ||
# `s.dt.tz_localize('tzlocal()')` doesn't work properly when including NaT. | ||
return s.apply(lambda ts: ts.tz_localize(from_tz).tz_convert(to_tz).tz_localize(None) | ||
if ts is not pd.NaT else pd.NaT) | ||
return s.dt.tz_localize(from_tz).dt.tz_convert(to_tz).dt.tz_localize(None) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ueshin, is it safe to remove https://github.com/pandas-dev/pandas/blob/0.19.x/pandas/tslib.pyx#L1760 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch! I'll revert this. |
||
else: | ||
return s | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if changing these values made a difference somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll revert it.