-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: move type_coercion to analyzer #5831
Conversation
9ce4e08
to
d04a2d5
Compare
This PR contains a fix for type coercion of subquery. I will polish it in following PR. --- before
tpch q17
cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ./data --format tbl --query 17 --batch-size 4096
Query 17 iteration 0 took 5233.1 ms and returned 1 rows
Query 17 iteration 1 took 4940.8 ms and returned 1 rows
Query 17 iteration 2 took 5160.2 ms and returned 1 rows
Query 17 iteration 3 took 5315.6 ms and returned 1 rows
Query 17 iteration 4 took 4967.7 ms and returned 1 rows
Query 17 avg time: 5123.48 ms --- after
tpch q17
Query 17 iteration 0 took 4789.5 ms and returned 1 rows
Query 17 iteration 1 took 4785.2 ms and returned 1 rows
Query 17 iteration 2 took 4791.5 ms and returned 1 rows
Query 17 iteration 3 took 5051.4 ms and returned 1 rows
Query 17 iteration 4 took 4817.7 ms and returned 1 rows
Query 17 avg time: 4847.07 ms |
slt run fail, but it's strange. cc @alamb @melgenek [dates.slt] Running query: "select i_item_desc
from test
where d3_date > d2_date + INTERVAL '1 days';"
Error: statement is expected to fail with error:
[dates.slt] Running query: "select i_item_desc
DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to
from test
but got error:
where d3_date > d2_date + INTERVAL '5 days';"
DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to |
res = Some(array_value_to_string(batch.column(1), row)?); | ||
break; | ||
} | ||
} | ||
assert_eq!(res, Some("Projection: CAST(Utf8(\"2000-01-01\") AS Timestamp(Nanosecond, None)) >= CAST(CAST(Utf8(\"2000-01-01\") AS Date32) AS Timestamp(Nanosecond, None))\n EmptyRelation".to_string())); | ||
assert_eq!(res, Some("Projection: CAST(Utf8(\"2000-01-01\") AS Timestamp(Nanosecond, None)) >= CAST(Utf8(\"2000-01-01\") AS Date32)\n EmptyRelation".to_string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After move type_coercion into analyzer, it don't run multiple time, so we can avoid useless cast.
PR is ready for reivew, cc @mingmwang @alamb @liukun4515 |
Co-authored-by: Yevhenii Melnyk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a great job to me -- thank you so much @jackwener
Ok(self) | ||
} else if can_cast_types(&this_type, cast_to_type) { | ||
|
||
// TODO(jackwener): Handle subqueries separately, need to refactor it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth tracking with a ticket? I am not sure if this comment mean you want to improve the code or if there is some bug / limitiation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a ticket #5877.
I am not sure if this comment mean you want to improve the code or if there is some bug / limitiation
It's just for improving code😂.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is any need for a ticket to track improving the code 👍 in the future . Thank you for filing one anyway!
|
||
use crate::analyzer::count_wildcard_rule::CountWildcardRule; | ||
use crate::analyzer::inline_table_scan::InlineTableScan; | ||
|
||
use crate::analyzer::type_coercion::TypeCoercion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also close this issue #3582
cc @alamb @jackwener
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @mingmwang I think you have discussed this as well in the past. It is great to see it has finally been done -- thanks @jackwener !
I will take a look tomorrow. @jackwener One thing need to confirm with you is the /// When set to true, the logical plan optimizer will produce warning
/// messages if any optimization rules produce errors and then proceed to the next
/// rule. When set to false, any rules that produce errors will cause the query to fail
pub skip_failed_rules: bool, default = true |
Agree with it, before this PR, I already fix some bug about it. |
Which issue does this PR close?
Closes #5848
Rationale for this change
What changes are included in this PR?
Are these changes tested?
No need new test
Are there any user-facing changes?