[SPARK-10446][SQL] Support to specify join type when calling join with usingColumns #8600
Test build #41997 has finished for PR 8600 at commit
Test build #42001 has finished for PR 8600 at commit
  * @group dfops
  * @since 1.4.0
  */
- def join(right: DataFrame, usingColumns: Seq[String]): DataFrame = {
+ def join(right: DataFrame, usingColumns: Seq[String], joinType: String = "inner"): DataFrame = {
We cannot use default parameter values in order to maintain compatibility with Java. You can add an extra method.
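For context: Scala compiles a default argument into a synthetic method, so a Java caller cannot simply omit the argument and would have to pass all three. A minimal sketch of the "extra method" approach follows. `Frame` is a hypothetical stand-in used only to keep the example self-contained; it is not Spark's `DataFrame`, and the string it builds is purely illustrative.

```scala
// Hypothetical stand-in for DataFrame; NOT Spark's actual class.
final case class Frame(name: String) {

  // The existing two-argument method keeps its signature, so current
  // Scala and Java callers are unaffected. It delegates to the overload.
  def join(right: Frame, usingColumns: Seq[String]): Frame =
    join(right, usingColumns, "inner")

  // Extra overload with an explicit join type instead of a default value.
  // Java callers can invoke either method directly.
  def join(right: Frame, usingColumns: Seq[String], joinType: String): Frame =
    Frame(s"$name $joinType join ${right.name} using (${usingColumns.mkString(", ")})")
}
```

With this shape, `a.join(b, Seq("id"))` and `a.join(b, Seq("id"), "left_outer")` both resolve to ordinary methods that are visible from Java.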
Test build #42058 has finished for PR 8600 at commit
ping @rxin
Thanks - I've merged this.
…i-Join

After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double-checked the code. For example, users can do an equi-join like `df.join(df2, 'name', 'outer').select('name', 'height').collect()`.

- There is a bug in 1.5 and 1.4: the code ignores the third parameter (the join type) that users pass, so the join is always executed as `Inner`, even if the user specifies another type (e.g., `Outer`).
- After PR apache#8600, 1.6 no longer has this issue, but the description has not been updated.

Plan to submit another PR to fix 1.5 and issue an error message if users specify a non-inner join type when using an equi-join.

Author: gatorsmile <[email protected]>

Closes apache#10477 from gatorsmile/pyOuterJoin.
How can we combine two columns with different values?
nvm. USING join can support outer join types, but we are unable to treat them as actual outer join.
JIRA: https://issues.apache.org/jira/browse/SPARK-10446
Currently the method `join(right: DataFrame, usingColumns: Seq[String])` supports only inner joins. It would be more convenient for it to support other join types as well.
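To illustrate why the join type matters for a USING-column join, here is a plain-Scala model over in-memory rows. It is only a sketch under stated assumptions: `UsingJoinSketch`, `Row`, and `joinUsing` are hypothetical names, only a single join column and the `"inner"` and `"left_outer"` types are modelled, and none of this is Spark's implementation.

```scala
// Plain-Scala model of a single-column USING join; NOT Spark's implementation.
object UsingJoinSketch {
  // A row is modelled as column name -> value (an assumption for this sketch).
  type Row = Map[String, String]

  def joinUsing(left: Seq[Row], right: Seq[Row], col: String, joinType: String): Seq[Row] = {
    require(Set("inner", "left_outer")(joinType), s"unsupported join type: $joinType")
    // Index the right side by the join column for lookup.
    val rightByKey = right.groupBy(r => r(col))
    left.flatMap { l =>
      rightByKey.get(l(col)) match {
        // Match: merge the rows; the join column appears once in the result.
        case Some(matches) => matches.map(r => l ++ r)
        // No match, left outer: keep the left row (right columns are absent).
        case None if joinType == "left_outer" => Seq(l)
        // No match, inner: drop the row.
        case None => Seq.empty
      }
    }
  }
}
```

With an inner join, left rows that have no match on the join column disappear from the result; with a left outer join they are kept, which is the behavior the existing two-argument method cannot express.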