Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: move type_coercion to analyzer #5831

Merged
merged 3 commits into from
Apr 5, 2023
Merged

Conversation

jackwener
Copy link
Member

@jackwener jackwener commented Apr 2, 2023

Which issue does this PR close?

Closes #5848

Rationale for this change

What changes are included in this PR?

  • Move type_coercion to analyzer.
  • fix type coercion for subquery

Are these changes tested?

No need new test

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Apr 2, 2023
@jackwener jackwener force-pushed the type branch 2 times, most recently from 9ce4e08 to d04a2d5 Compare April 5, 2023 02:47
@jackwener jackwener marked this pull request as ready for review April 5, 2023 02:47
@jackwener jackwener requested review from alamb and liukun4515 April 5, 2023 02:47
@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Apr 5, 2023
@jackwener
Copy link
Member Author

jackwener commented Apr 5, 2023

This PR contains a fix for type coercion of subquery. I will polish it in following PR.
it will move cast from expression into subplan. It means that we don't cast expression in eval expression and we do cast before eval expression.
But look like it just a little help for performance.

--- before
tpch q17
cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ./data --format tbl --query 17 --batch-size 4096
Query 17 iteration 0 took 5233.1 ms and returned 1 rows
Query 17 iteration 1 took 4940.8 ms and returned 1 rows
Query 17 iteration 2 took 5160.2 ms and returned 1 rows
Query 17 iteration 3 took 5315.6 ms and returned 1 rows
Query 17 iteration 4 took 4967.7 ms and returned 1 rows
Query 17 avg time: 5123.48 ms
--- after
tpch q17
Query 17 iteration 0 took 4789.5 ms and returned 1 rows
Query 17 iteration 1 took 4785.2 ms and returned 1 rows
Query 17 iteration 2 took 4791.5 ms and returned 1 rows
Query 17 iteration 3 took 5051.4 ms and returned 1 rows
Query 17 iteration 4 took 4817.7 ms and returned 1 rows
Query 17 avg time: 4847.07 ms

@jackwener
Copy link
Member Author

slt run fail, but it's strange. cc @alamb @melgenek

[dates.slt] Running query: "select i_item_desc
from test
where d3_date > d2_date + INTERVAL '1 days';"
Error: statement is expected to fail with error:
[dates.slt] Running query: "select i_item_desc
	DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to
from test
but got error:
where d3_date > d2_date + INTERVAL '5 days';"
	DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to

res = Some(array_value_to_string(batch.column(1), row)?);
break;
}
}
assert_eq!(res, Some("Projection: CAST(Utf8(\"2000-01-01\") AS Timestamp(Nanosecond, None)) >= CAST(CAST(Utf8(\"2000-01-01\") AS Date32) AS Timestamp(Nanosecond, None))\n EmptyRelation".to_string()));
assert_eq!(res, Some("Projection: CAST(Utf8(\"2000-01-01\") AS Timestamp(Nanosecond, None)) >= CAST(Utf8(\"2000-01-01\") AS Date32)\n EmptyRelation".to_string()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After move type_coercion into analyzer, it don't run multiple time, so we can avoid useless cast.

@jackwener
Copy link
Member Author

PR is ready for reivew, cc @mingmwang @alamb @liukun4515

Co-authored-by: Yevhenii Melnyk <[email protected]>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great job to me -- thank you so much @jackwener

Ok(self)
} else if can_cast_types(&this_type, cast_to_type) {

// TODO(jackwener): Handle subqueries separately, need to refactor it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this worth tracking with a ticket? I am not sure if this comment mean you want to improve the code or if there is some bug / limitiation

Copy link
Member Author

@jackwener jackwener Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a ticket #5877.

I am not sure if this comment mean you want to improve the code or if there is some bug / limitiation

It's just for improving code😂.

Copy link
Contributor

@alamb alamb Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is any need for a ticket to track improving the code 👍 in the future . Thank you for filing one anyway!


use crate::analyzer::count_wildcard_rule::CountWildcardRule;
use crate::analyzer::inline_table_scan::InlineTableScan;

use crate::analyzer::type_coercion::TypeCoercion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also close this issue #3582
cc @alamb @jackwener

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @mingmwang I think you have discussed this as well in the past. It is great to see it has finally been done -- thanks @jackwener !

@jackwener jackwener merged commit 787d000 into apache:main Apr 5, 2023
@jackwener jackwener deleted the type branch April 5, 2023 11:40
@mingmwang
Copy link
Contributor

mingmwang commented Apr 6, 2023

I will take a look tomorrow.

@jackwener One thing need to confirm with you is the Error behavior.
Originally the optimizer rules might return an error, and the logical plan optimizer will skip the failed rule by default and will
not fail the SQL. But I think for Analyzer rules, if the rule return an Error, I think the SQL should failure immediately.
For the type_coercion rule, the rule itself is to ensure the correctness, I think if there is error, the SQL should failure.

       /// When set to true, the logical plan optimizer will produce warning
        /// messages if any optimization rules produce errors and then proceed to the next
        /// rule. When set to false, any rules that produce errors will cause the query to fail
        pub skip_failed_rules: bool, default = true

@jackwener
Copy link
Member Author

@jackwener One thing need to confirm with you is the Error behavior.
Originally the optimizer rules might return an error, and the logical plan optimizer will skip the failed rule by default and will
not fail the SQL. But I think for Analyzer rules, if the rule return an Error, I think the SQL should failure immediately.
For the type_coercion rule, the rule itself is to ensure the correctness, I think if there is error, the SQL should failure.

Agree with it, before this PR, I already fix some bug about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move type_coercion into analyzer
5 participants