Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add extend expression message for expression only evaluation #405
feat: add extend expression message for expression only evaluation #405
Changes from all commits
436ff7a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's strange to me that
output_names
are inExpressionReference
and notExtendedExpression
but this is a natural consequence of have repeated expressions at the top level.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have thought about this two options. Both of them can work as expected but seems it makes things complicated if we put it in
ExtendedExpression
. Think about 2 expressions that one has a single output column calledoutput_a
and the other has 2 output columns calledoutput_b, output_c
. If we put it inExtendedExpression
, then the names should likeoutput_a, output_b, output_c
, then we have to do a mapping job when try to connect the column to its name. Do you think so?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am confused. How can an expression have more than one output column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking about a function like
SELECT CAST(JSON '{"k1":1,"k2":23,"k3":456}' AS MAP(VARCHAR, INTEGER));
It should have 2 columns output in a columnar computation scenario.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would have multiple output_names, I see now.
Only one of these should be the top-level expression correct? That is the only one whose names we care about.
For example, if we have three expressions:
The only meaningful value of
output_names
is "final" correct?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any special meaning between names "first", "middle" and "final"? I mean is there any implicit relation between these expressions? if not, in case that they equally exist in a project node, which has three output columns for following plan node, all of these names are meaningful and necessary. Following plan node probably refer the columns by these names. Example pattern likes
agg (sum("a"), min("b"), max("c") <- project(expr_0:"a", expr_1:"b", expr_2:"c")
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. Sorry, I was thinking there was only one top-level expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still find it confusing that we have a repeated list of expressions here. I think it would be more intuitive to have:
We could do something similar to what we do with plans:
However, I think I would prefer something more straightforward:
Although, if we end up adopting something like #415 then we could use repeated sub-expressions everywhere and we wouldn't need any of this. It could just be:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your examples. I think the greatest benefit is that a
let
expression is explicitly highlighted so that consumers doesn't need expr analysis again for an optimization like "CSE". This is a good idea but it should be contained in messageExpression
inside? If so,ExtendedExpression
can keep as current.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had
let
expression then we do not need the top level expression to be repeated.However, I agree that the current proposal will still work if we introduce
let
expressions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let
expression will make CSE easier but it also need producer does this expression level optimization and stores that info before translation to substrait. We are happy to do refine on thisExtendedExpression
message if needed when this idea finishes adoption.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prose can be cleaned up in a future PR but it isn't very clear to me. If you ping me with a reminder if this merges then I will be happy to make an attempt at doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried finding out a suitable name but seems it doesn't quite work. My user scenario is expression level evaluation offloading but current
Expression
doesn't contain all required infos, like schema, function, etc. Any suggestion from you?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant the entire file. Just minor grammatical things. It's not a big concern.