-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
preserve array type / timezone in date_bin
and date_trunc
functions
#7729
Conversation
The result type of date_bin and date_trunc never includes any timezone information. Change this such that the timezone of the resulting array from these functions is copied from the input array.
Co-authored-by: Alex Huang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mhilton -- this makes sense to me.
I wonder if we could add at least one end to end sql test in https://github.com/apache/arrow-datafusion/blob/main/datafusion/sqllogictest/test_files/timestamps.slt that shows the timezone being preserved (to something other than UTC)?
date_bin
and date_trunc
functions
let cases = vec![ | ||
( | ||
vec![ | ||
"2020-09-08T00:00:00Z", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the usecase is preseving timezones, it may make sense to also check with timezones other than Z
(like maybe +5 or +8 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test I had for the "-02" zone was sufficient. The way times are stored is always as a UTC offset, the zone information really just indicates how the offset needs to be interpretted. These strings are only used to derive the offset and therefore the specified zone is ignored when creating the array. However, more tests don't hurt so I've added the ones you suggested for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- sorry for my confusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @mhilton 👍
Add some additional tests suggested in code reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mhilton
let cases = vec![ | ||
( | ||
vec![ | ||
"2020-09-08T00:00:00Z", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- sorry for my confusion
(thank you for the review, BTW @Weijun-H 🙏 ) |
…ns (apache#7729) * preserve array type in date_bin and date_trunc functions The result type of date_bin and date_trunc never includes any timezone information. Change this such that the timezone of the resulting array from these functions is copied from the input array. * Update datafusion/expr/src/built_in_function.rs Co-authored-by: Alex Huang <[email protected]> * fix: syntax error * fix: datafusion-cli cargo update * review suggestions Add some additional tests suggested in code reviews. * fix formatting --------- Co-authored-by: Alex Huang <[email protected]>
Which issue does this PR close?
Closes #5976. N.B. this issue is already closed, but I don't think the issue was completely fixed.
Rationale for this change
Elliding timezone information in these functions makes the data harder to process downstream. It would have been sufficient for the purposes of not losing information to normalize all timezones to "+00" (UTC), however if the trouble has been taken to convert a time to a particular timezone it is friendlier to preserve that.
What changes are included in this PR?
The result type of date_bin and date_trunc never includes any timezone information. Change this such that the timezone of the resulting array from these functions is copied from the input array.
Are these changes tested?
Yes
Are there any user-facing changes?
I don't think this would be considered user facing.