Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: common_subexpr_eliminate rule should not apply to short-circuit expression #8928

Merged
merged 7 commits into from
Jan 22, 2024

Conversation

haohuaijin
Copy link
Contributor

@haohuaijin haohuaijin commented Jan 21, 2024

Which issue does this PR close?

Closes #8874

Rationale for this change

see #8874

What changes are included in this PR?

before this pr

❯ explain select coalesce(1, y/x), coalesce(2, y/x) from t;
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                  |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: coalesce(Int64(1), CAST(t.y / t.x AS Int64)t.y / t.xt.xt.y AS t.y / t.x), coalesce(Int64(2), CAST(t.y / t.x AS Int64)t.y / t.xt.xt.y AS t.y / t.x)                                        |
|               |   Projection: CAST(t.y / t.x AS Int64) AS CAST(t.y / t.x AS Int64)t.y / t.xt.xt.y                                                                                                                     |
|               |     TableScan: t projection=[x, y]                                                                                                                                                                    |
| physical_plan | ProjectionExec: expr=[coalesce(1, CAST(t.y / t.x AS Int64)t.y / t.xt.xt.y@0) as coalesce(Int64(1),t.y / t.x), coalesce(2, CAST(t.y / t.x AS Int64)t.y / t.xt.xt.y@0) as coalesce(Int64(2),t.y / t.x)] |
|               |   ProjectionExec: expr=[CAST(y@1 / x@0 AS Int64) as CAST(t.y / t.x AS Int64)t.y / t.xt.xt.y]                                                                                                          |
|               |     MemoryExec: partitions=1, partition_sizes=[1]                                                                                                                                                     |
|               |                                                                                                                                                                                                       |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.014 seconds.

after this pr

❯ explain select coalesce(1, y/x), coalesce(2, y/x) from t;
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: coalesce(Int64(1), CAST(t.y / t.x AS Int64)), coalesce(Int64(2), CAST(t.y / t.x AS Int64))                                                              |
|               |   TableScan: t projection=[x, y]                                                                                                                                    |
| physical_plan | ProjectionExec: expr=[coalesce(1, CAST(y@1 / x@0 AS Int64)) as coalesce(Int64(1),t.y / t.x), coalesce(2, CAST(y@1 / x@0 AS Int64)) as coalesce(Int64(2),t.y / t.x)] |
|               |   MemoryExec: partitions=1, partition_sizes=[1]                                                                                                                     |
|               |                                                                                                                                                                     |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.008 seconds.

But due to the reason describe in #8927, the above two query will get same result(get drive by zero error).

Are these changes tested?

yes, add some negative test

Are there any user-facing changes?

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jan 21, 2024
@@ -1129,5 +1129,35 @@ FROM t AS A, (SELECT * FROM t WHERE x = 0) AS B;
0 0
0 0

query TT
explain select coalesce(1, y/x), coalesce(2, y/x) from t;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because of the reason describe in #8927, use plan to test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add some comments here explaining the rationale and what the expected outputs are (so future readers know if changes are expected)

Something like this perhaps:

# Expressions that short circuit should not be refactored out as that may cause side effects (divide by zero)
# at plan time that would not actually happen during execution

Also, can you please add the tests that now pass (e.g. select coalesce(1, y/x), coalesce(2, y/x) from t;) so if someone breaks this code by accident, those queries would start failing, which might be easier to quickly tell is incorrect

@haohuaijin
Copy link
Contributor Author

cc @jackwener @liukun4515 @alamb

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @haohuaijin -- the code looks good to me 🙏

I also took a look at https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.BuiltinScalarFunction.html# and https://docs.rs/datafusion/latest/datafusion/logical_expr/expr/enum.Expr.html and didn't see any additional functions that shoudl be marked as short circuiting.

I had a code structure suggestion that I don't think is needed (though I think it would improve this PR).

I think a few more tests that actually run (and don't error) should be added prior to merge.

Thanks again!

datafusion/optimizer/src/common_subexpr_eliminate.rs Outdated Show resolved Hide resolved
@@ -1129,5 +1129,35 @@ FROM t AS A, (SELECT * FROM t WHERE x = 0) AS B;
0 0
0 0

query TT
explain select coalesce(1, y/x), coalesce(2, y/x) from t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add some comments here explaining the rationale and what the expected outputs are (so future readers know if changes are expected)

Something like this perhaps:

# Expressions that short circuit should not be refactored out as that may cause side effects (divide by zero)
# at plan time that would not actually happen during execution

Also, can you please add the tests that now pass (e.g. select coalesce(1, y/x), coalesce(2, y/x) from t;) so if someone breaks this code by accident, those queries would start failing, which might be easier to quickly tell is incorrect

Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @haohuaijin @alamb .

@alamb 's review suggestions are already very detailed, and I agree with his opinion. After those impromentent, we can merge this PR

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 22, 2024
@haohuaijin
Copy link
Contributor Author

haohuaijin commented Jan 22, 2024

Thanks @alamb and @jackwener's review and great suggestions.

I apply those suggestions, but I'm unsure if the test I added is appropriate(560cb6c).

I also add describe in #8928 (comment), to show what the change in this pr.

ProjectionExec: expr=[y@1 = 0 OR 1 / CAST(y@1 AS Int64) < 1 as t.y = Int64(0) OR Int64(1) / t.y < Int64(1), x@0 = 0 OR y@1 = 0 OR 1 / CAST(y@1 AS Int64) < 1 / CAST(x@0 AS Int64) as t.x = Int64(0) OR t.y = Int64(0) OR Int64(1) / t.y < Int64(1) / t.x]
--MemoryExec: partitions=1, partition_sizes=[1]

# due to the reason describe in https://github.com/apache/arrow-datafusion/issues/8927,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if those tests are appropriate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's related with #8910

@haohuaijin haohuaijin requested review from alamb and jackwener January 22, 2024 08:17
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @haohuaijin

matches!(op, Operator::And | Operator::Or)
}
Expr::Case { .. } => true,
// Use explicit pattern match instead of a default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Jan 22, 2024

Thank you @haohuaijin and @jackwener -- let's merge this PR and we can work on #8910 as a follow on

@alamb alamb merged commit c9935ae into apache:main Jan 22, 2024
24 checks passed
@haohuaijin haohuaijin deleted the fix branch January 23, 2024 00:27
@haohuaijin haohuaijin restored the fix branch January 23, 2024 04:37
@haohuaijin haohuaijin deleted the fix branch January 23, 2024 04:41
joroKr21 pushed a commit to coralogix/arrow-datafusion that referenced this pull request Apr 4, 2024
…expression (apache#8928)

* fix: common_subexpr_eliminate rule should not apply to short-circuit expression

* add more tests

* format

* minor

* apply reviews

* add some commont

* fmt
joroKr21 added a commit to coralogix/arrow-datafusion that referenced this pull request Apr 4, 2024
… to v34 (#227)

* fix: don't extract common sub expr in `CASE WHEN` clause (apache#8833)

* fix: don't extract common sub expr in CASE WHEN clause

* fix ci

* fix

* fix: common_subexpr_eliminate rule should not apply to short-circuit expression (apache#8928)

* fix: common_subexpr_eliminate rule should not apply to short-circuit expression

* add more tests

* format

* minor

* apply reviews

* add some commont

* fmt

---------

Co-authored-by: Huaijin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shouldn't extract common sub expression for short-circuited operation.
3 participants