change pre_cast_lit_in_comparison to unwrap_cast_in_comparison #3662

liukun4515 · 2022-09-30T07:11:06Z

Which issue does this PR close?

To achieve the goal of #3582, we need to run type coercion before all the other optimizer rules.

Once type coercion has run, some expressions will be wrapped in a cast/try_cast, and the pre_cast_lit_in_comparison rule no longer works on them, so it needs to be refactored.

For example:

Consider cast(left_expr as data_type) < lit(value as data_type). If the value can be converted to the type of left_expr, we can unwrap the cast around left_expr and cast the literal instead, producing the new literal expression cast(lit(value as data_type) as left_expr_data_type). The comparison then becomes left_expr < cast(lit(value as data_type) as left_expr_data_type).
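The rewrite described above can be sketched in Rust. This is a minimal, self-contained model and not DataFusion's actual Expr type or optimizer API: the DataType and Expr enums and the int_width, is_widening, fits and unwrap_cast_in_comparison functions are all hypothetical. Only integer types and the > operator are modelled. The sketch assumes, as the rule does, that type coercion has already made the cast type and the literal type equal, and it additionally assumes the cast must be a widening one so that unwrapping it cannot change results.

```rust
use std::convert::TryFrom;

// Hypothetical, simplified expression model; NOT DataFusion's Expr type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Boolean,
    Int8,
    Int32,
    Int64,
}

#[allow(dead_code)] // `name` is only carried along for readability
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A column reference with its known data type.
    Column { name: String, data_type: DataType },
    /// An integer literal tagged with its data type.
    Literal { value: i64, data_type: DataType },
    /// CAST(expr AS data_type)
    Cast { expr: Box<Expr>, data_type: DataType },
    /// left > right (one comparison operator is enough for the sketch)
    Gt(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column { data_type, .. }
            | Expr::Literal { data_type, .. }
            | Expr::Cast { data_type, .. } => *data_type,
            Expr::Gt(..) => DataType::Boolean,
        }
    }
}

/// Bit width of the integer types; None for non-integer types.
fn int_width(t: DataType) -> Option<u32> {
    match t {
        DataType::Int8 => Some(8),
        DataType::Int32 => Some(32),
        DataType::Int64 => Some(64),
        DataType::Boolean => None,
    }
}

/// True if casting `from` to `to` never loses information (integer widening).
fn is_widening(from: DataType, to: DataType) -> bool {
    matches!((int_width(from), int_width(to)), (Some(a), Some(b)) if a <= b)
}

/// True if `value` is representable in the integer type `t`.
fn fits(value: i64, t: DataType) -> bool {
    match t {
        DataType::Int8 => i8::try_from(value).is_ok(),
        DataType::Int32 => i32::try_from(value).is_ok(),
        DataType::Int64 => true,
        DataType::Boolean => false,
    }
}

/// Rewrites CAST(inner AS t) > lit(t) into inner > lit(type of inner) when the
/// cast is widening and the literal fits; otherwise returns the input unchanged.
fn unwrap_cast_in_comparison(expr: Expr) -> Expr {
    match expr {
        Expr::Gt(left, right) => match (*left, *right) {
            (
                Expr::Cast { expr: inner, data_type: cast_type },
                Expr::Literal { value, data_type: lit_type },
            ) if lit_type == cast_type
                && is_widening(inner.data_type(), cast_type)
                && fits(value, inner.data_type()) =>
            {
                // Drop the cast on the column and re-type the literal instead.
                let inner_type = inner.data_type();
                Expr::Gt(inner, Box::new(Expr::Literal { value, data_type: inner_type }))
            }
            // Pattern did not match (or the rewrite is unsafe): rebuild as-is.
            (l, r) => Expr::Gt(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}
```

With c2 typed as Int8, CAST(c2 AS Int32) > Int32(10) becomes c2 > Int8(10), while CAST(c2 AS Int32) > Int32(1000) is left alone because 1000 does not fit in Int8.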

Closes #3622

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

liukun4515 · 2022-09-30T07:16:08Z

datafusion/core/tests/sql/explain_analyze.rs

-        vec![
-            "logical_plan",
-            "Projection: #aggregate_test_100.c1\
-             \n  Filter: CAST(#aggregate_test_100.c2 AS Int32) > Int32(10)\


CAST(#aggregate_test_100.c2 AS Int32) will be converted to #aggregate_test_100.c2, and Int32(10) will be converted to Int8(10).

alamb

I went through this PR carefully. Nice work @liukun4515

alamb · 2022-09-30T11:07:12Z

datafusion/core/src/execution/context.rs

            Arc::new(TypeCoercion::new()),
            Arc::new(SimplifyExpressions::new()),
+            Arc::new(UnwrapCastInComparison::new()),


this name is much easier to understand for me -- thank you

alamb · 2022-09-30T11:08:48Z

datafusion/core/tests/sql/explain_analyze.rs

-            "logical_plan",
-            "Projection: #aggregate_test_100.c1\
-             \n  Filter: CAST(#aggregate_test_100.c2 AS Int32) > Int32(10)\
-             \n    TableScan: aggregate_test_100 projection=[c1, c2], partial_filters=[CAST(#aggregate_test_100.c2 AS Int32) > Int32(10)]"


The key difference for anyone else following along is that

partial_filters=[CAST(#aggregate_test_100.c2 AS Int32) > Int32(10)]

has become

partial_filters=[#aggregate_test_100.c2 > Int8(10)]

Yes, we remove the cast from CAST(#aggregate_test_100.c2 AS Int32) and add a cast to Int32(10).
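Why is it safe to drop the cast here? Widening Int8 to Int32 preserves order, so CAST(c2 AS Int32) > Int32(10) and c2 > Int8(10) select the same rows as long as the literal lies in the Int8 range. The Rust sketch below (hypothetical helper names, not part of DataFusion) checks this exhaustively over all 256 Int8 values. It also shows that narrowing the literal with a wrapping cast instead of a range check would change the result.

```rust
use std::convert::TryFrom;

/// Exhaustively compares the original predicate CAST(x AS Int32) > lit with the
/// rewritten predicate x > lit_i8 for every Int8 value x.
/// Returns None when lit is outside the Int8 range (the rewrite must not fire).
fn rewrite_is_equivalent(lit: i32) -> Option<bool> {
    let narrowed = i8::try_from(lit).ok()?;
    Some((i8::MIN..=i8::MAX).all(|x| (i32::from(x) > lit) == (x > narrowed)))
}

/// Same comparison, but narrowing the literal with a wrapping `as` cast instead
/// of a range check (e.g. 200 becomes -56), which is NOT a valid rewrite.
fn wrapping_rewrite_is_equivalent(lit: i32) -> bool {
    let wrapped = lit as i8;
    (i8::MIN..=i8::MAX).all(|x| (i32::from(x) > lit) == (x > wrapped))
}
```

For example, rewrite_is_equivalent(10) returns Some(true), rewrite_is_equivalent(200) returns None, and wrapping_rewrite_is_equivalent(200) returns false, because 0 > 200 is false while 0 > -56 is true.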


alamb · 2022-09-30T11:21:19Z

datafusion/optimizer/src/unwrap_cast_in_comparison.rs

            vec![
-                lit(ScalarValue::Int32(Some(12))),
+                lit(ScalarValue::Int64(Some(12))),


Is this change (so that all the IN list types are the same) because type coercion will have already been done when this code is now called?

Yes.
This rule assumes that type coercion has already been applied to the expression or plan.
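The same reasoning extends to IN lists. Because the rule assumes type coercion has already run, every item in the list already has the cast's type (Int64 in the changed test above), so the rule only needs to check that each item fits in the column's own type. A minimal sketch, assuming for illustration that the un-cast column is Int32; narrow_in_list_to_i32 is a hypothetical helper, not a DataFusion function:

```rust
use std::convert::TryFrom;

/// Hypothetical helper for CAST(col AS Int64) IN (lit_1, ..., lit_n), where type
/// coercion has already made every literal an Int64 and col is assumed to be Int32.
/// Returns the literals narrowed to i32 if ALL of them fit, or None if any does
/// not, in which case the cast on the column has to stay.
fn narrow_in_list_to_i32(list: &[i64]) -> Option<Vec<i32>> {
    // Collecting an iterator of Option<i32> into Option<Vec<i32>> stops at the
    // first None, which gives the "all or nothing" behaviour.
    list.iter().map(|&v| i32::try_from(v).ok()).collect()
}
```

Making the rewrite all-or-nothing keeps every item in the list the same type, which is the invariant type coercion established in the first place.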

liukun4515 · 2022-09-30T14:35:43Z

The CI failed, but we can merge it first.
cc @alamb

ursabot · 2022-09-30T14:42:18Z

Benchmark runs are scheduled for baseline = f862eef and contender = a1b2112. a1b2112 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

andygrove · 2022-10-03T16:31:10Z

@alamb @liukun4515 This PR introduced regressions that I have documented in #3690. I don't know if it is better to revert this PR or try and fix.

alamb · 2022-10-03T17:53:20Z


I would rather fix it, but I defer to you. I think #3676 fixes part of it. I will merge that and fix the other part tomorrow if @liukun4515 hasn't had a chance to do so.

liukun4515 · 2022-10-04T07:35:55Z


@alamb @andygrove
Sorry for the late reply; I will look into the regression.

github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Sep 30, 2022

liukun4515 requested review from alamb and andygrove September 30, 2022 07:11

liukun4515 commented Sep 30, 2022


change pre_cast_lit_in_comparison to unwrap_cast_in_comparison

5c6c334

liukun4515 force-pushed the issue_#3622 branch from b194fa7 to 5c6c334 Compare September 30, 2022 07:23

alamb approved these changes Sep 30, 2022


liukun4515 force-pushed the issue_#3622 branch from fd65504 to 526e7f4 Compare September 30, 2022 12:27

change some test case

0a2e478

liukun4515 force-pushed the issue_#3622 branch from 526e7f4 to 0a2e478 Compare September 30, 2022 13:39

liukun4515 merged commit a1b2112 into apache:master Sep 30, 2022

andygrove mentioned this pull request Oct 3, 2022

Optimizer regressions in unwrap_cast_in_comparison #3690

Closed

andygrove mentioned this pull request Oct 3, 2022

Fix optimizer regressions #3694

Closed

andygrove added a commit to andygrove/datafusion that referenced this pull request Oct 3, 2022

revert apache#3662

49ba27c
