Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Binary Operator StringConcat to Function for array_concat, array_append and array_prepend #8636

Merged
merged 11 commits into from
Jan 5, 2024

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Part of #8506

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner physical-expr Physical Expressions labels Dec 23, 2023

// Convert `Array StringConcat Array` to ScalarFunction::ArrayConcat
let expr = match (op, &left, &right) {
// Chain concat operator (a || b) || array,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really long. :(
Not yet find a nicer way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the rule to convert Operator::StringConcat ==> ArrayConcat depends on the type of the arguments and this code is effectively rewriting the originally created Expr tree.

What do you think about putting this logic in an analyzer rule: https://github.com/apache/arrow-datafusion/tree/main/datafusion/optimizer/src/analyzer?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Dec 23, 2023
@jayzhan211 jayzhan211 changed the title Convert Binary Operator StringConcat to Function array_concat, array_append and array_prepend Convert Binary Operator StringConcat to Function for array_concat, array_append and array_prepend Dec 23, 2023
@jayzhan211 jayzhan211 marked this pull request as ready for review December 23, 2023 03:57
@jayzhan211 jayzhan211 marked this pull request as draft December 28, 2023 00:56
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 28, 2023
@jayzhan211 jayzhan211 marked this pull request as ready for review December 28, 2023 01:29
@alamb
Copy link
Contributor

alamb commented Dec 28, 2023

Thank you for this @jayzhan211 -- I plan to review it carefully tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 -- I found the rationale by @viirya #8496 (comment) makes sense.

Given the idea is to make the logic for function / operator aliasing easier to find / centralized in the codebase, perhaps a better place for this logic could be an analyzer rule (for example OperatorsToFunction) that would be easier to find and easier to document what it was doing


// Convert `Array StringConcat Array` to ScalarFunction::ArrayConcat
let expr = match (op, &left, &right) {
// Chain concat operator (a || b) || array,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the rule to convert Operator::StringConcat ==> ArrayConcat depends on the type of the arguments and this code is effectively rewriting the originally created Expr tree.

What do you think about putting this logic in an analyzer rule: https://github.com/apache/arrow-datafusion/tree/main/datafusion/optimizer/src/analyzer?

@jayzhan211 jayzhan211 marked this pull request as draft January 1, 2024 23:48
@jayzhan211 jayzhan211 force-pushed the list-concat-rewrite branch from 9131096 to 6201176 Compare January 2, 2024 12:53
@github-actions github-actions bot added the optimizer Optimizer rules label Jan 2, 2024
fn analyze_internal(plan: LogicalPlan) -> Result<LogicalPlan> {
// OperatorToFunction is only applied to Projection
match plan {
LogicalPlan::Projection(_) => {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on test, I found they are all project plan with inputs.len() 1. Not sure is this assumption correct or not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not catch uses of operators in all places (like LogicalPlan::Filter)

I think you can follow the model of https://github.com/apache/arrow-datafusion/blob/f4233a92761e9144b8747e66b95bf0b3f82464b8/datafusion/optimizer/src/analyzer/type_coercion.rs#L74-L122 and call LogicalPlan::expressions() to get all the expressions in a LogicalPlan node, rewrite them appropriately, and then call LogicaPlan::new_with_exprs to get the rewritten node

 let new_expr = plan
        .expressions()
        .into_iter()
        .map(|expr| {
            // ensure aggregate names don't change:
            // https://github.com/apache/arrow-datafusion/issues/3555
            rewrite_preserving_name(expr, &mut expr_rewrite)
        })
        .collect::<Result<Vec<_>>>()?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did follow the code in type coercion at the first place, but I'm not sure whether we need these for operatorToFunction since I did find any test cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems column wise cases cover it.

.collect::<Result<Vec<_>>>()?;

// Not found cases that inputs more than one
assert_eq!(plan.inputs().len(), 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as ready for review January 2, 2024 14:19
@jayzhan211 jayzhan211 requested a review from alamb January 2, 2024 14:19
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Jan 2, 2024

cargo test (win64) has been run over 1hr ... Since cargo test (mac) pass, I think that may be a bug in CI.

@alamb
Copy link
Contributor

alamb commented Jan 2, 2024

cargo test (win64) has been run over 1hr ... Since cargo test (mac) pass, I think that may be a bug in CI.

I think @Jefffrey noticed the same thing here: #8696

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- this is lookingvery close -- I think the recursion needs a small tweak to cover all the types of LogicalPlans but otherwise 👍

fn analyze_internal(plan: LogicalPlan) -> Result<LogicalPlan> {
// OperatorToFunction is only applied to Projection
match plan {
LogicalPlan::Projection(_) => {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not catch uses of operators in all places (like LogicalPlan::Filter)

I think you can follow the model of https://github.com/apache/arrow-datafusion/blob/f4233a92761e9144b8747e66b95bf0b3f82464b8/datafusion/optimizer/src/analyzer/type_coercion.rs#L74-L122 and call LogicalPlan::expressions() to get all the expressions in a LogicalPlan node, rewrite them appropriately, and then call LogicaPlan::new_with_exprs to get the rewritten node

 let new_expr = plan
        .expressions()
        .into_iter()
        .map(|expr| {
            // ensure aggregate names don't change:
            // https://github.com/apache/arrow-datafusion/issues/3555
            rewrite_preserving_name(expr, &mut expr_rewrite)
        })
        .collect::<Result<Vec<_>>>()?;

datafusion/optimizer/src/analyzer/rewrite_expr.rs Outdated Show resolved Hide resolved
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as draft January 3, 2024 01:11
jayzhan211 and others added 5 commits January 3, 2024 09:30
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211

This comment was marked as outdated.

@jayzhan211 jayzhan211 marked this pull request as ready for review January 3, 2024 12:56
@jayzhan211 jayzhan211 requested a review from alamb January 3, 2024 12:56
}
}

fn analyze_internal(plan: &LogicalPlan) -> Result<LogicalPlan> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

external_schema is not used here.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you for sticking with it @jayzhan211

@alamb
Copy link
Contributor

alamb commented Jan 4, 2024

cc @Veeupup as I think you were working on something similar

@alamb alamb merged commit 4e4059a into apache:main Jan 5, 2024
22 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 5, 2024

Thanks again @jayzhan211

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants