Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert list operator to function in sql_expr_to_logical_expr #8506

Open
jayzhan211 opened this issue Dec 12, 2023 · 2 comments
Open

Convert list operator to function in sql_expr_to_logical_expr #8506

jayzhan211 opened this issue Dec 12, 2023 · 2 comments

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented Dec 12, 2023

Like what we have done for [] -> MakeArray.
It is possible to have the similar conversion for other List Operator in the early stage (Sql to LogicExpr).

TODO

Other: clean up list/array coercion in datafusion/expr/src/type_coercion/binary.rs

Originally posted by @viirya in #8496 (comment)

@alamb
Copy link
Contributor

alamb commented Dec 15, 2023

This makes sense to me

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Jan 7, 2024

I think there is a little issue with the current design.
We rewrite the | | operator to function after the logical plan is built. Before OperatorToFunction is applied, we need to build logical plan, which calls projection_schema. It calls exprlist_to_fields to build schema. Inside to_field, we calculate the return type of string concat, which does type coercion. The problem is that the return type calculated here is useless, because we rewrite it to function afterward.

The type coercion for string concat is done inside string_coercion, btw, it is incorrect.
https://github.com/apache/arrow-datafusion/blob/dd4263f843e093c807d63edf73a571b1ba2669b5/datafusion/expr/src/type_coercion/binary.rs#L704C9-L708

select [1,2,3] || 4.1 return List(Int64), but it should be List(F64) instead. However, since it is rewritten to Function, so the output is still correct at the end.

If we need to avoid the calculation (mainly type coercion) here, we may need an if else branch to skip the schema building for the operators that will be rewritten afterward. For example, return empty schema inside projection_schema, if the expr is going to be rewritten later. (Currently only List operator)

pub struct BinaryExpr {
    /// Left-hand side of the expression
    pub left: Box<Expr>,
    /// The comparison operator
    pub op: Operator,
    /// Right-hand side of the expression
    pub right: Box<Expr>,

    /// will be rewritten (maybe better name)
    pub will_be_rewritten: bool
}

We can then do different things like return empty schema while building logical plan based on the will_be_rewritten.
p.s. since Expr is Enum not trait, so we can only have will_be_rewritten for each possible Expr.

@alamb @viirya how do you think?

Upd:

I have tested that return empty schema for project_schema and remove the constraint below works.

        if expr.len() != schema.fields().len() {
            return plan_err!("Projection has mismatch between number of expressions ({}) and number of fields in schema ({})", expr.len(), schema.fields().len());
        }

@jayzhan211 jayzhan211 changed the title Convert list operator to function in sql to LogicExpr stage Convert list operator to function in sql_expr_to_logical_expr Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants