-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing issues with for timestamp literals #8193
Conversation
@waitingkuo cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this item @comphead
It seems to me there is a potential problem with treating cast specially from other places with timestamps. I think it would be better if we treated all timestamps the same (either switch them to Timestamp::Seconds or Timestamp::Nanoseconds` for consisteny
Thanks @alamb I agree any kind of cast to timestamp is supposed to have consistent signature. Currently its not. |
not sure, why clippy is complaining.... locally its okay |
@alamb I have aligned signatures, the solution is still suboptimal(using 2 casts). In followup PR I would get rid of 2 casts and we need to support extreme low and high integer values, currently it overflows because Timestamp backed by I see 2 options to handle this:
@alamb let me know your thoughts. PS. Not sure why clippy failed... |
Timestamp(Nanosecond, None) Timestamp(Nanosecond, None) Timestamp(Nanosecond, None) | ||
|
||
|
||
# verify extreme values (expects default precision to be microsecond instead of nanoseconds. Work pending) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ive intentionally added this test to be uncommented once we figure out the overflow solution
@alamb please re review once you have some free time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @comphead -- I am worried about this PR potentially messing up downstream users of datafusion (e.g. IOx) and makes the timestamp type inconsistent within datafusion (I realize currently the timestamp type is inconsitent with what is done with postgres, etc).
If the goal is to be able to read data from Parquet that is stored as INT96 correctly as a timestamp, I am not sure why we are changing timetamp handling in DataFusion)
Perhaps a better approach would be to investigate updating the arrow-rs reader to read whatever spark generates as a proper timestamp type (@avantgardnerio hints at that in #7958 (comment)) 🤔
let expected = "Projection: person.id, person.first_name, person.last_name\ | ||
\n Filter: person.state = Utf8(\"CO\") AND person.age >= Int64(21) AND person.age <= Int64(65)\ | ||
\n TableScan: person"; | ||
let expected = "Projection: person.id, person.first_name, person.last_name\n Filter: person.state = Utf8(\"CO\") AND person.age >= Int64(21) AND person.age <= Int64(65)\n TableScan: person"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these tests actually different? They look the same to me except the formatting changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to revert these changes or reformat them if there is a difference, to make the diff clearer
Thanks @alamb this is my fault, I referenced too much problems and brought up the confusion. This particular PR has nothing to do with INT96 and Parquet, it relates to timestamp literals only. It fixes next problems
I have returned the signature to be NanoSecond like it was before, and now everything timestamp related in DF is consistent.
|
Seems reasonable to me -- thank you @comphead. Does this revert the behavior added to close #2979 (to align with postgres behavior?) Or does it just change to_timestamp to return a type of TimestampNanos (though the argument is still interpreted as -- @waitingkuo do you have a few moments to give this PR a look? |
It doesn't revert the behavior, the input still treated as seconds, but the return type is nanoseconds to keep DataFusion consistent with all timestamp arithmetics and functions. |
Ah, sorry for my confusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @comphead for bearing with me. I took another look and pass through this PR and it makes sense to me. Thank you for your focus on detail
I left some suggestions for improvements, but I don't think they are necessary prior to merging this pR
let expected = "Projection: person.id, person.first_name, person.last_name\ | ||
\n Filter: person.state = Utf8(\"CO\") AND person.age >= Int64(21) AND person.age <= Int64(65)\ | ||
\n TableScan: person"; | ||
let expected = "Projection: person.id, person.first_name, person.last_name\n Filter: person.state = Utf8(\"CO\") AND person.age >= Int64(21) AND person.age <= Int64(65)\n TableScan: person"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to revert these changes or reformat them if there is a difference, to make the diff clearer
Addressed all comments |
Should I merge this PR @alamb ? |
Yes, please do so! It is one of my goals for the year to grow the capacity for review / merging PRs (so as to avoid single-person bottlenecks, ahem...) As long as the guidelines described here are followed any committer should feel free to merge approved PRs: |
🎉 |
Which issue does this PR close?
Closes partially #7958
Rationale for this change
During #7958 investigation it was revealed that timestamp conversion has bunch of inconsistencies with PG and DF itself
What changes are included in this PR?
This PR is intended to fix:
to_timestamp
,::timestamp
,cast(n as timestamp)
should be consistentAre these changes tested?
Oh yes
Are there any user-facing changes?