Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change pre_cast_lit_in_comparison to unwrap_cast_in_comparison #3662

Merged
merged 2 commits into from
Sep 30, 2022

Conversation

liukun4515
Copy link
Contributor

@liukun4515 liukun4515 commented Sep 30, 2022

Which issue does this PR close?

In order to achieve the target of #3582, we need do the type coercion first before all optimizer rules.

After done the type coercion, some of the expr will be wrapped by the cast/try_cast, the rule of pre_cast_lit_in_comparison will not work, so we should refactor it.

For example:

For this case cast(left_expr as data_type) < lit(value as data_type), if the value can be converted to the type of left_expr, we can unwrap the cast for the expr cast(left_expr as data_type) and add cast to lit(value as data_type) and get the new literal expr cast(lit(value as data_type) as left_expr_data_type).

Closes #3622

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Sep 30, 2022
vec![
"logical_plan",
"Projection: #aggregate_test_100.c1\
\n Filter: CAST(#aggregate_test_100.c2 AS Int32) > Int32(10)\
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CAST(#aggregate_test_100.c2 AS Int32) will be converted to #aggregate_test_100.c2, and the Int32(10) will be converted to Int8(10)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this PR carefully. Nice work @liukun4515

Arc::new(TypeCoercion::new()),
Arc::new(SimplifyExpressions::new()),
Arc::new(UnwrapCastInComparison::new()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this name is much easier to understand for me -- thank you

"logical_plan",
"Projection: #aggregate_test_100.c1\
\n Filter: CAST(#aggregate_test_100.c2 AS Int32) > Int32(10)\
\n TableScan: aggregate_test_100 projection=[c1, c2], partial_filters=[CAST(#aggregate_test_100.c2 AS Int32) > Int32(10)]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key difference for anyone else following along is that

partial_filters=[CAST(#aggregate_test_100.c2 AS Int32) > Int32(10)]

Has become

partial_filters=[partial_filters=[#aggregate_test_100.c2 > Int8(10)]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we remove the cast for CAST(#aggregate_test_100.c2 AS Int32), and add cast for Int32(10)

vec![
lit(ScalarValue::Int32(Some(12))),
lit(ScalarValue::Int64(Some(12))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change (so that all the IN list types are the same) because type coercion will have already been done when this code is now called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
In this rule, we need to assume that the type coercion has been done for the expr or the plan.

@liukun4515
Copy link
Contributor Author

The ci failed, but we can merge it first.
cc @alamb

@liukun4515 liukun4515 merged commit a1b2112 into apache:master Sep 30, 2022
@ursabot
Copy link

ursabot commented Sep 30, 2022

Benchmark runs are scheduled for baseline = f862eef and contender = a1b2112. a1b2112 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@andygrove
Copy link
Member

@alamb @liukun4515 This PR introduced regressions that I have documented in #3690. I don't know if it is better to revert this PR or try and fix.

andygrove added a commit to andygrove/datafusion that referenced this pull request Oct 3, 2022
@alamb
Copy link
Contributor

alamb commented Oct 3, 2022

@alamb @liukun4515 This PR introduced regressions that I have documented in #3690. I don't know if it is better to revert this PR or try and fix.

I would rather fix it but defer to you. I think #3676 fixes part of it. I will merge that and fix the other part tomorrow if @liukun4515 hasn't had a chance to do so

@liukun4515
Copy link
Contributor Author

@alamb @liukun4515 This PR introduced regressions that I have documented in #3690. I don't know if it is better to revert this PR or try and fix.

I would rather fix it but defer to you. I think #3676 fixes part of it. I will merge that and fix the other part tomorrow if @liukun4515 hasn't had a chance to do so

@alamb @andygrove
Sorry for the late reply, I will check the regression issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

change rule of PreCastLitInComparisonExpressions to unwrap cast rule after #3582
4 participants