refactor: move type_coercion to analyzer #5831

jackwener · 2023-04-02T15:42:21Z

Which issue does this PR close?

Closes #5848

Rationale for this change

What changes are included in this PR?

Move type_coercion to the analyzer.
Fix type coercion for subqueries.

Are these changes tested?

No new tests are needed.

Are there any user-facing changes?

jackwener · 2023-04-05T07:19:16Z

This PR contains a fix for type coercion of subqueries; I will polish it in a following PR.
The fix moves the cast from the expression into the subplan, so the cast is applied before the expression is evaluated rather than during evaluation.
It looks like this only helps performance a little, though.

--- before
tpch q17
cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ./data --format tbl --query 17 --batch-size 4096
Query 17 iteration 0 took 5233.1 ms and returned 1 rows
Query 17 iteration 1 took 4940.8 ms and returned 1 rows
Query 17 iteration 2 took 5160.2 ms and returned 1 rows
Query 17 iteration 3 took 5315.6 ms and returned 1 rows
Query 17 iteration 4 took 4967.7 ms and returned 1 rows
Query 17 avg time: 5123.48 ms

--- after
tpch q17
Query 17 iteration 0 took 4789.5 ms and returned 1 rows
Query 17 iteration 1 took 4785.2 ms and returned 1 rows
Query 17 iteration 2 took 4791.5 ms and returned 1 rows
Query 17 iteration 3 took 5051.4 ms and returned 1 rows
Query 17 iteration 4 took 4817.7 ms and returned 1 rows
Query 17 avg time: 4847.07 ms
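
To put the two averages side by side, here is a minimal sketch that recomputes the means from the per-iteration timings above and derives the relative change. The helper names `mean` and `relative_improvement` are made up for illustration and are not part of the benchmark binary; the result works out to roughly a 5% reduction.

```rust
/// Arithmetic mean of a slice of timings in milliseconds.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Relative improvement of `after` over `before`, in percent.
fn relative_improvement(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // Per-iteration timings copied from the two runs above.
    let before = [5233.1, 4940.8, 5160.2, 5315.6, 4967.7];
    let after = [4789.5, 4785.2, 4791.5, 5051.4, 4817.7];

    let (b, a) = (mean(&before), mean(&after));
    println!(
        "before {:.2} ms, after {:.2} ms, {:.1}% faster",
        b,
        a,
        relative_improvement(b, a)
    );
}
```

With only five iterations per run and visible run-to-run variance, a difference of this size is suggestive rather than conclusive.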

jackwener · 2023-04-05T07:21:02Z

The slt run fails, but the failure is strange. cc @alamb @melgenek

[dates.slt] Running query: "select i_item_desc
from test
where d3_date > d2_date + INTERVAL '1 days';"
[dates.slt] Running query: "select i_item_desc
from test
where d3_date > d2_date + INTERVAL '5 days';"
Error: statement is expected to fail with error:
	DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to
but got error:
	DataFusion error: Error during planning: Timestamp(Nanosecond, Some("+00:00")) + Utf8 can't be evaluated because there isn't a common type to coerce the types to

jackwener · 2023-04-05T07:22:35Z

datafusion/core/tests/sql/timestamp.rs

            res = Some(array_value_to_string(batch.column(1), row)?);
            break;
        }
    }
-    assert_eq!(res, Some("Projection: CAST(Utf8(\"2000-01-01\") AS Timestamp(Nanosecond, None)) >= CAST(CAST(Utf8(\"2000-01-01\") AS Date32) AS Timestamp(Nanosecond, None))\n  EmptyRelation".to_string()));
+    assert_eq!(res, Some("Projection: CAST(Utf8(\"2000-01-01\") AS Timestamp(Nanosecond, None)) >= CAST(Utf8(\"2000-01-01\") AS Date32)\n  EmptyRelation".to_string()));


After moving type_coercion into the analyzer, it no longer runs multiple times, so we can avoid useless casts.

jackwener · 2023-04-05T07:24:19Z

PR is ready for review, cc @mingmwang @alamb @liukun4515

datafusion/core/tests/sqllogictests/test_files/dates.slt

Co-authored-by: Yevhenii Melnyk <[email protected]>

alamb

Looks like a great job to me -- thank you so much @jackwener

alamb · 2023-04-05T10:43:15Z

datafusion/expr/src/expr_schema.rs

-            Ok(self)
-        } else if can_cast_types(&this_type, cast_to_type) {
+
+        // TODO(jackwener): Handle subqueries separately, need to refactor it.


Is this worth tracking with a ticket? I am not sure if this comment means you want to improve the code or if there is some bug / limitation.

Added a ticket: #5877.

I am not sure if this comment means you want to improve the code or if there is some bug / limitation.

It's just for improving the code 😂.

I don't think there is any need for a ticket to track improving the code in the future 👍. Thank you for filing one anyway!

alamb · 2023-04-05T10:43:55Z

datafusion/optimizer/src/analyzer/mod.rs


 use crate::analyzer::count_wildcard_rule::CountWildcardRule;
 use crate::analyzer::inline_table_scan::InlineTableScan;

+use crate::analyzer::type_coercion::TypeCoercion;


We can also close issue #3582.
cc @alamb @jackwener

FYI @mingmwang I think you have discussed this as well in the past. It is great to see it has finally been done -- thanks @jackwener !

mingmwang · 2023-04-06T12:38:18Z

I will take a look tomorrow.

@jackwener One thing I need to confirm with you is the error behavior.
Originally, optimizer rules might return an error, and by default the logical plan optimizer skips the failed rule and does
not fail the SQL. But for Analyzer rules, I think that if a rule returns an error, the SQL should fail immediately.
The type_coercion rule exists to ensure correctness, so if it returns an error, the SQL should fail.

        /// When set to true, the logical plan optimizer will produce warning
        /// messages if any optimization rules produce errors and then proceed to the next
        /// rule. When set to false, any rules that produce errors will cause the query to fail
        pub skip_failed_rules: bool, default = true
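
The difference between the two error policies described above can be sketched roughly as follows. This is a toy model, not DataFusion's implementation: `Plan`, `Rule`, `run_optimizer`, `run_analyzer`, and the two example rules are hypothetical stand-ins invented for illustration, and only the `skip_failed_rules` name comes from the config snippet.

```rust
// Hypothetical stand-ins; these are NOT DataFusion's real types.
type Plan = String;
type Rule = fn(&Plan) -> Result<Plan, String>;

/// Optimizer-style driver: when `skip_failed_rules` is true, a failing rule is
/// logged and skipped, and the plan from before that rule is kept.
fn run_optimizer(plan: Plan, rules: &[Rule], skip_failed_rules: bool) -> Result<Plan, String> {
    let mut current = plan;
    for rule in rules {
        match rule(&current) {
            Ok(next) => current = next,
            Err(e) if skip_failed_rules => eprintln!("skipping failed rule: {e}"),
            Err(e) => return Err(e),
        }
    }
    Ok(current)
}

/// Analyzer-style driver: the first error fails the whole query.
fn run_analyzer(plan: Plan, rules: &[Rule]) -> Result<Plan, String> {
    rules.iter().try_fold(plan, |p, rule| rule(&p))
}

/// Toy rule that always fails, standing in for a coercion error.
fn failing_coercion(_plan: &Plan) -> Result<Plan, String> {
    Err("Timestamp + Utf8 has no common type".to_string())
}

/// Toy rule that always succeeds and tags the plan.
fn add_marker(plan: &Plan) -> Result<Plan, String> {
    Ok(format!("{plan} [rewritten]"))
}

fn main() {
    let rules: [Rule; 2] = [failing_coercion, add_marker];

    // Optimizer with skipping enabled: the error is swallowed, later rules still run.
    let optimized = run_optimizer("Projection".to_string(), &rules, true);
    assert_eq!(optimized, Ok("Projection [rewritten]".to_string()));

    // Analyzer: the same failing rule aborts the query.
    assert!(run_analyzer("Projection".to_string(), &rules).is_err());
}
```

In this toy model, a correctness rule such as type coercion placed under the skipping driver would let an uncoerced plan continue silently, which is the concern raised in the comment.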

jackwener · 2023-04-06T13:01:28Z

@jackwener One thing I need to confirm with you is the error behavior.
Originally, optimizer rules might return an error, and by default the logical plan optimizer skips the failed rule and does
not fail the SQL. But for Analyzer rules, I think that if a rule returns an error, the SQL should fail immediately.
The type_coercion rule exists to ensure correctness, so if it returns an error, the SQL should fail.

Agreed. Before this PR, I already fixed some bugs related to it.

github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Apr 2, 2023

jackwener force-pushed the type branch 2 times, most recently from 9ce4e08 to d04a2d5 Compare April 5, 2023 02:47

jackwener marked this pull request as ready for review April 5, 2023 02:47

jackwener requested review from alamb and liukun4515 April 5, 2023 02:47

jackwener added 2 commits April 5, 2023 14:37

refactor: move type_coercion to analyzer

5cf6a68

fix cast subquery

f27f653

jackwener force-pushed the type branch from d04a2d5 to f27f653 Compare April 5, 2023 06:41

github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Apr 5, 2023

jackwener mentioned this pull request Apr 5, 2023

TPCH, Query 18 and 17 very slow #5646

Closed

jackwener commented Apr 5, 2023

melgenek reviewed Apr 5, 2023

datafusion/core/tests/sqllogictests/test_files/dates.slt (outdated)

fix sqllogictest

d901514

Co-authored-by: Yevhenii Melnyk <[email protected]>

jackwener mentioned this pull request Apr 5, 2023

chore: update sqllogictest version 0.13.2. #5875

Merged

alamb approved these changes Apr 5, 2023

jackwener merged commit 787d000 into apache:main Apr 5, 2023

jackwener deleted the type branch April 5, 2023 11:40

HaoYang670 mentioned this pull request Apr 5, 2023

Move type_coercion to the front of logical optimizer #5235

Closed

andygrove mentioned this pull request Apr 10, 2023

Regression in 22.0.0 with filter push-down #5949

Closed

v0y4g3r mentioned this pull request Apr 11, 2023

Type conversion rule fails after upgrading arrow-datafusion GreptimeTeam/greptimedb#1365

Closed
