-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-36280][SQL] Remove redundant aliases after RewritePredicateSubquery #33509
Conversation
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #141596 has finished for PR 33509 at commit
|
sql/core/src/test/scala/org/apache/spark/sql/RewritePredicateSubqueryEndToEndSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. (with one minor comment).
cc @maropu
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141764 has finished for PR 33509 at commit
|
import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec | ||
import org.apache.spark.sql.test.SharedSparkSession | ||
|
||
class RewritePredicateSubqueryEndToEndSuite extends QueryTest with SharedSparkSession |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @wangyum I am curious why a separate "EndToEndSuite" is needed instead of adding this test to SubquerySuite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change focus on PredicateSubquery. If you mind, we can move to SubquerySuite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have all subquery related tests centralized in SubquerySuite :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #141919 has finished for PR 33509 at commit
|
| ORDER BY Sum(y) DESC) AS ranking | ||
| FROM t2 | ||
| GROUP BY x) tmp1 | ||
| WHERE ranking <= 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use https://www.dpriver.com/pp/sqlformat.htm
style in our other test cases? Is there a reason why you choose this formatter, @wangyum ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have used this tool for many years. or do you have a recommended tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a standard about the SQL format in Spark right now. This format looks fine, except for the SELECT x ...many spaces... AS x
@@ -1876,4 +1877,29 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark | |||
"ReusedSubqueryExec should reuse an existing subquery") | |||
} | |||
} | |||
|
|||
test("SPARK-36280: Remove redundant aliases after RewritePredicateSubquery") { | |||
sql("CREATE TABLE t1 USING parquet AS SELECT id AS a, id AS b, id AS c FROM range(10)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wrap with withTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it. Sorry. I don't know why I forgot.
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141980 has finished for PR 33509 at commit
|
What changes were proposed in this pull request?
Remove redundant aliases after
RewritePredicateSubquery
. For example:Before this PR:
After this PR:
Why are the changes needed?
Reduce shuffle to improve query performance. This change can benefit TPC-DS q70.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.