-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite coerce_plan_expr_for_schema to fix union type coercion #4862
Conversation
# under the License. | ||
|
||
########## | ||
## UNION Tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqllogictest
is very easy to use, I like it 👍.
Thanks @ygf11 -- I hope to review this PR more carefully tomorrow |
I want to review this pr which is about type coercion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this PR carefully both code and tests and it makes sense to me. Thank you @ygf11
I will wait for @liukun4515 's review prior to merging
@@ -525,29 +525,55 @@ pub fn coerce_plan_expr_for_schema( | |||
plan: &LogicalPlan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was somewhat confused about this at first because the name of this function
it turns out it is only called for coercing union schemas: https://github.com/search?q=repo%3Aapache%2Farrow-datafusion%20coerce_plan_expr_for_schema&type=code
👍
datafusion/expr/src/expr_rewriter.rs
Outdated
Expr::Alias(e, alias) => { | ||
let new_expr = e.clone().cast_to(new_type, src_schema)?; | ||
Ok(Expr::Alias(Box::new(new_expr), alias.clone())) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can write this more concisely like:
Expr::Alias(e, alias) => { | |
let new_expr = e.clone().cast_to(new_type, src_schema)?; | |
Ok(Expr::Alias(Box::new(new_expr), alias.clone())) | |
} | |
Expr::Alias(e, alias) => { | |
Ok(e.clone() | |
.cast_to(new_type, src_schema)? | |
.alias(alias)) | |
} |
datafusion/expr/src/expr_rewriter.rs
Outdated
} | ||
|
||
fn coerce_exprs_for_schema( | ||
exprs: &[Expr], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exprs: &[Expr], | |
exprs: Vec<Expr>, |
Since the caller owns the Exprs already, I think you can avoid the clones in this function by passing in the Vec directly
" Union [name:UInt8;N]", | ||
" LeftAnti Join: t1.name = t2.name [name:UInt8;N]", | ||
" Distinct: [name:UInt8;N]", | ||
" TableScan: t1 projection=[name] [name:UInt8;N]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This plan is changed after merging master, cause #4849 enable the projection push down
of DISTINCT
.
Before merge:
" Distinct: [name:UInt8;N]",
" Projection: t1.name [name:UInt8;N]",
" TableScan: t1 projection=[name] [name:UInt8;N]",
After merge:
" Distinct: [name:UInt8;N]",
" TableScan: t1 projection=[name] [name:UInt8;N]",
" TableScan: t2 projection=[id, name] [id:UInt8;N, name:UInt8;N]", | ||
" Projection: t1.id, t1.name [id:Int32;N, name:UInt8;N]", | ||
" TableScan: t1 projection=[id, name] [id:Int32;N, name:UInt8;N]", | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the above test.
Thanks again @ygf11 |
Benchmark runs are scheduled for baseline = 3b86643 and contender = eb19a67. eb19a67 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4837.
Rationale for this change
The original
coerce_plan_expr_for_schema
method use theinput.expressions()
as source of type coercion, which is not correct, because it maybe return other expressions, like join will return its predicates.What changes are included in this PR?
The source of union type coercion is from input schema or projection exprs now.
Are these changes tested?
Yes.
Are there any user-facing changes?