-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert Binary Operator StringConcat
to Function for array_concat
, array_append
and array_prepend
#8636
Conversation
datafusion/sql/src/expr/mod.rs
Outdated
|
||
// Convert `Array StringConcat Array` to ScalarFunction::ArrayConcat | ||
let expr = match (op, &left, &right) { | ||
// Chain concat operator (a || b) || array, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really long. :(
Not yet find a nicer way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the rule to convert Operator::StringConcat
==> ArrayConcat depends on the type of the arguments and this code is effectively rewriting the originally created Expr tree.
What do you think about putting this logic in an analyzer rule: https://github.com/apache/arrow-datafusion/tree/main/datafusion/optimizer/src/analyzer?
StringConcat
to Function array_concat
, array_append
and array_prepend
StringConcat
to Function for array_concat
, array_append
and array_prepend
Thank you for this @jayzhan211 -- I plan to review it carefully tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211 -- I found the rationale by @viirya #8496 (comment) makes sense.
Given the idea is to make the logic for function / operator aliasing easier to find / centralized in the codebase, perhaps a better place for this logic could be an analyzer rule (for example OperatorsToFunction
) that would be easier to find and easier to document what it was doing
datafusion/sql/src/expr/mod.rs
Outdated
|
||
// Convert `Array StringConcat Array` to ScalarFunction::ArrayConcat | ||
let expr = match (op, &left, &right) { | ||
// Chain concat operator (a || b) || array, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the rule to convert Operator::StringConcat
==> ArrayConcat depends on the type of the arguments and this code is effectively rewriting the originally created Expr tree.
What do you think about putting this logic in an analyzer rule: https://github.com/apache/arrow-datafusion/tree/main/datafusion/optimizer/src/analyzer?
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
9131096
to
6201176
Compare
fn analyze_internal(plan: LogicalPlan) -> Result<LogicalPlan> { | ||
// OperatorToFunction is only applied to Projection | ||
match plan { | ||
LogicalPlan::Projection(_) => {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on test, I found they are all project plan with inputs.len() 1. Not sure is this assumption correct or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not catch uses of operators in all places (like LogicalPlan::Filter
)
I think you can follow the model of https://github.com/apache/arrow-datafusion/blob/f4233a92761e9144b8747e66b95bf0b3f82464b8/datafusion/optimizer/src/analyzer/type_coercion.rs#L74-L122 and call LogicalPlan::expressions()
to get all the expressions in a LogicalPlan
node, rewrite them appropriately, and then call LogicaPlan::new_with_exprs
to get the rewritten node
let new_expr = plan
.expressions()
.into_iter()
.map(|expr| {
// ensure aggregate names don't change:
// https://github.com/apache/arrow-datafusion/issues/3555
rewrite_preserving_name(expr, &mut expr_rewrite)
})
.collect::<Result<Vec<_>>>()?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did follow the code in type coercion at the first place, but I'm not sure whether we need these for operatorToFunction since I did find any test cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems column wise cases cover it.
.collect::<Result<Vec<_>>>()?; | ||
|
||
// Not found cases that inputs more than one | ||
assert_eq!(plan.inputs().len(), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
Signed-off-by: jayzhan211 <[email protected]>
cargo test (win64) has been run over 1hr ... Since cargo test (mac) pass, I think that may be a bug in CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this is lookingvery close -- I think the recursion needs a small tweak to cover all the types of LogicalPlan
s but otherwise 👍
fn analyze_internal(plan: LogicalPlan) -> Result<LogicalPlan> { | ||
// OperatorToFunction is only applied to Projection | ||
match plan { | ||
LogicalPlan::Projection(_) => {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not catch uses of operators in all places (like LogicalPlan::Filter
)
I think you can follow the model of https://github.com/apache/arrow-datafusion/blob/f4233a92761e9144b8747e66b95bf0b3f82464b8/datafusion/optimizer/src/analyzer/type_coercion.rs#L74-L122 and call LogicalPlan::expressions()
to get all the expressions in a LogicalPlan
node, rewrite them appropriately, and then call LogicaPlan::new_with_exprs
to get the rewritten node
let new_expr = plan
.expressions()
.into_iter()
.map(|expr| {
// ensure aggregate names don't change:
// https://github.com/apache/arrow-datafusion/issues/3555
rewrite_preserving_name(expr, &mut expr_rewrite)
})
.collect::<Result<Vec<_>>>()?;
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
This comment was marked as outdated.
This comment was marked as outdated.
} | ||
} | ||
|
||
fn analyze_internal(plan: &LogicalPlan) -> Result<LogicalPlan> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
external_schema
is not used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you for sticking with it @jayzhan211
cc @Veeupup as I think you were working on something similar |
Thanks again @jayzhan211 |
Which issue does this PR close?
Part of #8506
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?