Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preserve array type / timezone in date_bin and date_trunc functions #7729

Merged
merged 8 commits into from
Oct 4, 2023

Conversation

mhilton
Copy link
Contributor

@mhilton mhilton commented Oct 3, 2023

Which issue does this PR close?

Closes #5976. N.B. this issue is already closed, but I don't think the issue was completely fixed.

Rationale for this change

Elliding timezone information in these functions makes the data harder to process downstream. It would have been sufficient for the purposes of not losing information to normalize all timezones to "+00" (UTC), however if the trouble has been taken to convert a time to a particular timezone it is friendlier to preserve that.

What changes are included in this PR?

The result type of date_bin and date_trunc never includes any timezone information. Change this such that the timezone of the resulting array from these functions is copied from the input array.

Are these changes tested?

Yes

Are there any user-facing changes?

I don't think this would be considered user facing.

The result type of date_bin and date_trunc never includes any
timezone information. Change this such that the timezone of the
resulting array from these functions is copied from the input array.
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions labels Oct 3, 2023
@mhilton mhilton requested a review from Weijun-H October 3, 2023 11:53
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mhilton -- this makes sense to me.

I wonder if we could add at least one end to end sql test in https://github.com/apache/arrow-datafusion/blob/main/datafusion/sqllogictest/test_files/timestamps.slt that shows the timezone being preserved (to something other than UTC)?

@alamb alamb changed the title preserve array type in date_bin and date_trunc functions preserve array type / timezone in date_bin and date_trunc functions Oct 3, 2023
let cases = vec![
(
vec![
"2020-09-08T00:00:00Z",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the usecase is preseving timezones, it may make sense to also check with timezones other than Z (like maybe +5 or +8 🤔

Copy link
Contributor Author

@mhilton mhilton Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test I had for the "-02" zone was sufficient. The way times are stored is always as a UTC offset, the zone information really just indicates how the offset needs to be interpretted. These strings are only used to derive the offset and therefore the specified zone is ignored when creating the array. However, more tests don't hurt so I've added the ones you suggested for clarity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- sorry for my confusion

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @mhilton 👍

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Oct 4, 2023
@mhilton mhilton requested a review from alamb October 4, 2023 08:36
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mhilton

let cases = vec![
(
vec![
"2020-09-08T00:00:00Z",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- sorry for my confusion

@alamb alamb merged commit 018ffbe into apache:main Oct 4, 2023
@alamb
Copy link
Contributor

alamb commented Oct 4, 2023

(thank you for the review, BTW @Weijun-H 🙏 )

Ted-Jiang pushed a commit to Ted-Jiang/arrow-datafusion that referenced this pull request Oct 7, 2023
…ns (apache#7729)

* preserve array type in date_bin and date_trunc functions

The result type of date_bin and date_trunc never includes any
timezone information. Change this such that the timezone of the
resulting array from these functions is copied from the input array.

* Update datafusion/expr/src/built_in_function.rs

Co-authored-by: Alex Huang <[email protected]>

* fix: syntax error

* fix: datafusion-cli cargo update

* review suggestions

Add some additional tests suggested in code reviews.

* fix formatting

---------

Co-authored-by: Alex Huang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

date_trunc drops timezone
3 participants