-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13427][SQL] Support USING clause in JOIN. #11297
Conversation
@rxin @adrian-wang Can you please review the implementation and let me know your comments.Thanks !! |
@rxin Hi Reynold, would appreciate your feedback on this PR. Thanks a lot. |
@dilipbiswal Could you please resolve the conflicts? Thanks! |
/** | ||
* Holds the name of columns referenced in an USING clause of JOIN source. | ||
*/ | ||
case class UnresolvedUsingAttributes(usingCols: Seq[String]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not store the information in the join type like we do for NaturalJoin
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marmbrus Thank you Michael. Good idea. I will change it.
bdfe5ed
to
0c7c852
Compare
@marmbrus Hi Michael, i have changed the code per your feedback. As part of this change, i have also cleaned up dataframe's using join to make use of the new code. Please let me know your comments. Thanks !! |
@gatorsmile Thanks !! I have rebased the code now. |
@@ -1169,6 +1170,7 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { | |||
Join(newLeft, newRight, LeftOuter, newJoinCond) | |||
case FullOuter => f | |||
case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node") | |||
case UsingJoin(_, _) => sys.error("Untransformed Using join node") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'm not really sure what the point of these extra checks is. Is it only to remove a warning? All kinds of things break in the optimizer if the plan is unresolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JoinType is sealed, so we need to put something in this pattern match
We should also modify the |
@after { gParent.popMsg(state); } | ||
: KW_ON! expression | ||
| KW_USING LPAREN columnNameList RPAREN -> ^(TOK_USING columnNameList) | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @hvanhovell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good.
This looks pretty good. Thanks for working on it! Can you also update the the function in Dataset to use this code path instead of doing resolution there. |
Oh, you already did :) Ignore that. |
ok to test |
Test build #53145 has finished for PR 11297 at commit
|
f7bb37e
to
f39f547
Compare
Test build #53154 has finished for PR 11297 at commit
|
cc @marmbrus |
val joinPairs = leftKeys.zip(rightKeys) | ||
|
||
// Add joinPairs to joinConditions | ||
val newCondition = (condition ++ joinPairs.map { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
joinPairs.map(EqualTo)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell Thank you for your review !! I have made the change.
Test build #53397 has finished for PR 11297 at commit
|
LGTM. Thanks! Merging to master. |
@marmbrus Thank you very much Michael !! |
## What changes were proposed in this pull request? Support queries that JOIN tables with USING clause. SELECT * from table1 JOIN table2 USING <column_list> USING clause can be used as a means to simplify the join condition when : 1) Equijoin semantics is desired and 2) The column names in the equijoin have the same name. We already have the support for Natural Join in Spark. This PR makes use of the already existing infrastructure for natural join to form the join condition and also the projection list. ## How was the this patch tested? Have added unit tests in SQLQuerySuite, CatalystQlSuite, ResolveNaturalJoinSuite Author: Dilip Biswal <[email protected]> Closes apache#11297 from dilipbiswal/spark-13427.
What changes were proposed in this pull request?
Support queries that JOIN tables with USING clause.
SELECT * from table1 JOIN table2 USING <column_list>
USING clause can be used as a means to simplify the join condition
when :
We already have the support for Natural Join in Spark. This PR makes
use of the already existing infrastructure for natural join to
form the join condition and also the projection list.
How was the this patch tested?
Have added unit tests in SQLQuerySuite, CatalystQlSuite, ResolveNaturalJoinSuite