[SPARK-33400][SQL] Normalize sameOrderExpressions in SortOrder to avoid unnecessary sort operations #30302
Force-pushed from ed7c7a8 to 123512d.
Is this related to #30300?
Force-pushed from 123512d to bd47570.
cc - @cloud-fan @imback82
ok to test
LGTM if the existing tests pass.
Kubernetes integration test starting
Kubernetes integration test status success
Test build #131235 has finished for PR 30302 at commit
sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala
@maropu @cloud-fan Thanks for reviewing the changes. Please merge the changes if there are no further comments. I will work on a followup PR to make sameOrderExpressions a child of SortOrder.
thanks, merging to master!
### What changes were proposed in this pull request?
This is a followup of #30302. As part of this PR, sameOrderExpressions set is made part of children of SortOrder node - so that they don't need any special handling as done in #30302.
### Why are the changes needed?
sameOrderExpressions should get same treatment as child. So making them part of children helps in transforming them easily.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing UTs
Closes #30430 from prakharjain09/SPARK-33400-sortorder-refactor.
Authored-by: Prakhar Jain <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
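The motivation for the followup can be sketched with a toy model. This is not Spark's Scala code: `SortOrderNode`, `with_new_children`, and `map_children` are hypothetical names, and expressions are reduced to plain strings. It only illustrates why exposing the same-order expressions through `children` lets a generic child-rewriting helper reach them without any special case.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SortOrderNode:
    """Toy stand-in for a SortOrder expression (hypothetical, not Spark's class)."""
    child: str
    same_order_expressions: Tuple[str, ...] = ()

    @property
    def children(self) -> Tuple[str, ...]:
        # The primary child comes first, followed by the same-order expressions.
        return (self.child,) + self.same_order_expressions

    def with_new_children(self, new_children: Tuple[str, ...]) -> "SortOrderNode":
        # Rebuild the node from a rewritten children tuple of the same shape.
        return SortOrderNode(new_children[0], tuple(new_children[1:]))


def map_children(node: SortOrderNode, f: Callable[[str], str]) -> SortOrderNode:
    """Generic rewrite: apply f to every child, with no SortOrder-specific logic."""
    return node.with_new_children(tuple(f(c) for c in node.children))


# Hypothetical alias mapping introduced by a projection.
aliases = {"t1.id": "t1id", "t2.id": "t2id"}
order = SortOrderNode("t1.id", ("t2.id",))
rewritten = map_children(order, lambda e: aliases.get(e, e))

print(rewritten.child)                   # t1id
print(rewritten.same_order_expressions)  # ('t2id',)
```

Because the rewrite goes through `children` only, any rule that transforms children covers the same-order expressions as well, which is the "same treatment as child" the commit message asks for.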
What changes were proposed in this pull request?
This pull request normalizes SortOrder properly to prevent unnecessary sort operators. Currently, the sameOrderExpressions of a SortOrder are not normalized as part of AliasAwareOutputOrdering.
Example: consider this join of three tables:
The plan for this looks like:
In this plan, the marked sort node could have been avoided as the data is already sorted on "t2.id" by the lower SortMergeJoin.
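The effect described above can be illustrated with a small, self-contained toy model. It is a sketch under assumptions, not Spark's implementation: `SortOrder` here is a simplified dataclass over string column names, `normalize` and its `include_same_order` flag are hypothetical, and the alias mapping (`t1.id` to `t1id`, `t2.id` to `t2id`) stands in for a projection whose exact form is not shown in this description.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SortOrder:
    """Toy ordering: a primary expression plus expressions sorted identically."""
    child: str
    same_order_expressions: Tuple[str, ...] = ()

    def satisfies(self, required: str) -> bool:
        # The ordering satisfies a requirement on any of its equivalent expressions.
        return required == self.child or required in self.same_order_expressions


def normalize(order: SortOrder, aliases: Dict[str, str],
              include_same_order: bool) -> SortOrder:
    """Rewrite expressions to their aliased output names (hypothetical helper).

    include_same_order=False models the behaviour described as the problem:
    only the primary child is rewritten. True models the proposed behaviour.
    """
    def rename(expr: str) -> str:
        return aliases.get(expr, expr)

    same = order.same_order_expressions
    if include_same_order:
        same = tuple(rename(e) for e in same)
    return SortOrder(rename(order.child), same)


# Ordering produced by a sort-merge join on t1.id = t2.id (assumed shape).
join_order = SortOrder("t1.id", ("t2.id",))
# Assumed aliases introduced by a projection above the join.
aliases = {"t1.id": "t1id", "t2.id": "t2id"}

before = normalize(join_order, aliases, include_same_order=False)
after = normalize(join_order, aliases, include_same_order=True)

print(before.satisfies("t2id"))  # False: a planner would add an extra sort
print(after.satisfies("t2id"))   # True: the existing ordering can be reused
```

In the toy model, the stale name `t2.id` left in the un-normalized ordering is what makes the requirement on `t2id` look unsatisfied, which corresponds to the avoidable sort node in the plan.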
Why are the changes needed?
To remove unneeded Sort operators.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New UT added.