-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Special case Duration display in datafusion-cli and sqllogictest #7070
Comments
@waitingkuo / @liukun4515 / @ozankabak I wonder if you have any thoughts about this proposal (related to reducing the impact of #7068) |
I will discuss with my team and share our thoughts tomorrow. |
I think the way you suggested seems to be the most rational option. Since we do not use year and month units for durations, we can maybe ignore those parts while formatting. |
I believe @tustvold plans to handle this next week |
Thinking about this a bit more, I think it would be cleanest if we added support for this to FormatOptions, this can then be plumbed through CastOptions. apache/arrow-rs#4581 adds the necessary upstream changes |
Closed by #6832 |
Is your feature request related to a problem or challenge?
In #6832 and #7068 , @tustvold proposes to use
Duration
consistently for timestamp arithmetic which makes the code simpler and consistent across DataFusionHowever, one implication of this change, at the time of this writigng, is shown #6832 (comment)
Specifically, in DataFusion 28.0.0 the output of
timestamp
-timestamp
is a interval and is displayed like this:However, without any additional changes after #6832 it is displayed in ISO 8601 duration format
I think this is problematic because the output:
select now() + interval '1 year';
)P198DT72932.972880S
), is not widely used for displaying intervals to humansCasting logic difference will means that the following query in 28.0.0
Would produce this in 29.0.0:
Describe the solution you'd like
I propose adding specific code for formatting DataFusion output, used in datafusion-cli, tests, and sqllogictests that formats Durations consistently with how
Interval
s are displayed in 28.0.0.This change will unblock the arrow upgrade in #6832 as well as give us a single location to control formatting in DataFusion making it easier to improve the formatting of intervals / durations (or other types) in the future if we desire.
Anyone using DataFusion programmatically can either choose to use this new formatting routine, or can format Durations / Intervals to their specific need using arrow-rs or custom code (as we do in IOx, for example (source))
While this solution still suffers from the
Duration
-->String
casting problem, I think it is ok because I don't think it is very common and there is a workaround (to cast the duration to an interval):For those people who want even more control of duration printing, it would probably make sense to add a specific
to_char
or similar function: https://www.postgresql.org/docs/current/functions-formatting.htmlDescribe alternatives you've considered
Change the parsing / formattting of Durations in arrow-rs
The idea is to make PRs like apache/arrow-rs#4557 that would change the formatting in arrow-rs
Some challenges with this approach:
Special case String casting / parsing
One way to avoid issues here would be to just special case the casting and parsing (
Duration
<==>String
) in datafusion and instead of calling the arrowcast
function we could calldatafusion_cast
which would have special handling for some type and then fall back to the arrowcast
kernelThe challenges with this approach are:
datafusion_cast
rather thancast
cast
)Additional context
No response
The text was updated successfully, but these errors were encountered: