-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4937][SQL] Adding optimization to simplify the And, Or condition in spark sql #3778
Conversation
Test build #24738 has started for PR 3778 at commit
|
Test build #24738 has finished for PR 3778 at commit
|
Test FAILed. |
Test build #24739 has started for PR 3778 at commit
|
Test build #24739 has finished for PR 3778 at commit
|
Test FAILed. |
Hey @scwf, I'm kinda lost in all the predicates provided in the PR description... Would you mind to provide several simpler cases to illustrate what kind of optimizations this PR enables? I guess essentially the second case is this:
Is it correct? |
Yes, correct, later i will give a detail description for this:) |
change to WIP, todos: |
@scwf These optimizations are useful, particularly the one that eliminates common predicates. Thanks for bringing them up! However, the implementation in this PR is really over complicated... I opened #3784 for a simpler version. For the numeric comparisons, I think we can first cast all numeric comparisons to |
Test build #24768 has started for PR 3778 at commit
|
Test build #24773 has started for PR 3778 at commit
|
Test build #24768 has finished for PR 3778 at commit
|
Test FAILed. |
Test build #24773 has finished for PR 3778 at commit
|
Test FAILed. |
Test build #24804 has started for PR 3778 at commit
|
Test build #24804 has finished for PR 3778 at commit
|
Test PASSed. |
val result = compare(vLeft, vRight) | ||
result.filter(_ > 0).map(c => { | ||
if (((left.isLess || left.isLessEquals) | ||
&& (right.isLess || right.isLessEquals || right.isEquals))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually similar deeply nested if
statements can be refactored into carefully organized pattern matches. Using pattern matching also frees you from helper methods like .isLess
and isLessEquals
, thus ExpressionCookies
won't be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it, actually i have recognized this. but the case here really complex...:)
I am trying to refactory this
Hi, @marmbrus this is a more complete version than #3784 and @liancheng told me he will close #3784 in favor of this PR, but now you have merged #3784. So can you revert that one or Maybe should i change this with newly master branch? |
Oh I see, sorry that was not clear from the discussion. I think it would be best to do this as an augmentation to what is already merged in. |
Sure, updated! |
Test build #24917 has started for PR 3778 at commit
|
Test build #24917 has finished for PR 3778 at commit
|
Test PASSed. |
@marmbrus, any comments here? i think this is ok to go |
val batches = | ||
Batch("AnalysisNodes", Once, | ||
EliminateAnalysisOperators) :: | ||
Batch("Constant Folding", FixedPoint(50), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the indenting is off here.
This looks good to me. @liancheng have you looked this over too? |
Test build #25368 has started for PR 3778 at commit
|
Test build #25368 has finished for PR 3778 at commit
|
Test PASSed. |
hey @liancheng, any other comments here? |
Sorry for the late reply, this LGTM, thanks! |
Thanks, @marmbrus this should be ready to go. |
Thanks! Merged to master. |
// a && a => a | ||
case (l, r) if l fastEquals r => l | ||
case (_, _) => | ||
val lhsSet = splitDisjunctivePredicates(left).toSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this has been merged, but can you submit another PR to add comments explaining how this works? I'm afraid this block of code is beyond a normal engineer's capability to understand without spending hours staring at it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, later i can do this
…nSimplification` Follow up of #3778 /cc rxin Author: scwf <[email protected]> Closes #4086 from scwf/commentforspark-4937 and squashes the following commits: aaf89f6 [scwf] code style issue 2d3406e [scwf] added comment for spark-4937
Adding optimization to simplify the And/Or condition in spark sql.
There are two kinds of Optimization
1 Numeric condition optimization, such as:
a < 3 && a > 5 ---- False
a < 1 || a > 0 ---- True
a > 3 && a > 5 => a > 5
(a < 2 || b > 5) && a < 2 => a < 2
2 optimizing the some query from a cartesian product into equi-join, such as this sql (one of hive-testbench):
It has a repeated expression in Or, so we can optimize it by
data:image/s3,"s3://crabby-images/1ac4b/1ac4bf6b1298ae40bfd1df219dd010f9a393edb8" alt="image"
(a && b) || (a && c) = a && (b || c)
Before optimization, this sql hang in my locally test, and the physical plan is:
After optimization, this sql run successfully in 20+ seconds, and its physical plan is:
data:image/s3,"s3://crabby-images/99fcc/99fcc5163ae35468ae84f33dae8eb0b65b7ed3e0" alt="image"
This PR focus on the second optimization and some simple ones of the first. For complex Numeric condition optimization, I will make a follow up PR.