[SPARK-19126][DOCS] Update Join Documentation Across Languages
## What changes were proposed in this pull request?

- [X] Make sure all join types are clearly mentioned
- [X] Make join labeling/style consistent
- [X] Make join label ordering docs the same
- [X] Improve join documentation according to above for Scala
- [X] Improve join documentation according to above for Python
- [X] Improve join documentation according to above for R

## How was this patch tested?
No tests; this is a documentation-only change.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: anabranch <[email protected]>

Closes #16504 from anabranch/SPARK-19126.
bllchmbrs authored and Felix Cheung committed Jan 9, 2017
1 parent 1f6ded6 commit 19d9d4c
Showing 3 changed files with 26 additions and 14 deletions.
19 changes: 11 additions & 8 deletions R/pkg/R/DataFrame.R
@@ -2313,9 +2313,9 @@ setMethod("dropDuplicates",
#' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
#' Column expression. If joinExpr is omitted, the default, inner join is attempted and an error is
#' thrown if it would be a Cartesian Product. For Cartesian join, use crossJoin instead.
- #' @param joinType The type of join to perform. The following join types are available:
- #' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left_outer', 'left',
- #' 'right_outer', 'rightouter', 'right', and 'leftsemi'. The default joinType is "inner".
+ #' @param joinType The type of join to perform, default 'inner'.
+ #' Must be one of: 'inner', 'cross', 'outer', 'full', 'full_outer',
+ #' 'left', 'left_outer', 'right', 'right_outer', 'left_semi', or 'left_anti'.
#' @return A SparkDataFrame containing the result of the join operation.
#' @family SparkDataFrame functions
#' @aliases join,SparkDataFrame,SparkDataFrame-method
@@ -2344,15 +2344,18 @@ setMethod("join",
if (is.null(joinType)) {
sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc)
} else {
if (joinType %in% c("inner", "outer", "full", "fullouter",
"leftouter", "left_outer", "left",
"rightouter", "right_outer", "right", "leftsemi")) {
if (joinType %in% c("inner", "cross",
"outer", "full", "fullouter", "full_outer",
"left", "leftouter", "left_outer",
"right", "rightouter", "right_outer",
"left_semi", "leftsemi", "left_anti", "leftanti")) {
joinType <- gsub("_", "", joinType)
sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
} else {
stop("joinType must be one of the following types: ",
"'inner', 'outer', 'full', 'fullouter', 'leftouter', 'left_outer', 'left',
'rightouter', 'right_outer', 'right', 'leftsemi'")
"'inner', 'cross', 'outer', 'full', 'full_outer',",
"'left', 'left_outer', 'right', 'right_outer',",
"'left_semi', or 'left_anti'.")
}
}
}
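For reference, a minimal SparkR sketch (not part of this commit; the session settings, data, and column names are illustrative assumptions) exercising a few of the joinType values now listed in the R documentation:

```r
library(SparkR)
sparkR.session(master = "local[*]")

# Two tiny SparkDataFrames sharing an "id" column.
employees <- createDataFrame(data.frame(id = c(1, 2), name = c("alice", "bob"),
                                        stringsAsFactors = FALSE))
depts <- createDataFrame(data.frame(id = c(1, 3), dept = c("hr", "eng"),
                                    stringsAsFactors = FALSE))

# Both underscored and non-underscored spellings are accepted; the underscore is
# stripped before the join type name is passed to the JVM side.
head(join(employees, depts, employees$id == depts$id, "left_outer"))
head(join(employees, depts, employees$id == depts$id, "left_anti"))

sparkR.session.stop()
```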
5 changes: 3 additions & 2 deletions python/pyspark/sql/dataframe.py
@@ -730,8 +730,9 @@ def join(self, other, on=None, how=None):
a join expression (Column), or a list of Columns.
If `on` is a string or a list of strings indicating the name of the join column(s),
the column(s) must exist on both sides, and this performs an equi-join.
- :param how: str, default 'inner'.
- One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+ :param how: str, default ``inner``. Must be one of: ``inner``, ``cross``, ``outer``,
+ ``full``, ``full_outer``, ``left``, ``left_outer``, ``right``, ``right_outer``,
+ ``left_semi``, and ``left_anti``.
The following performs a full outer join between ``df1`` and ``df2``.
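A minimal PySpark sketch (not part of this commit; the data and column names are illustrative assumptions) showing a few of the `how` values documented above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("join-docs").getOrCreate()

df1 = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df2 = spark.createDataFrame([(1, "hr"), (3, "eng")], ["id", "dept"])

# A full outer join keeps unmatched rows from both sides.
df1.join(df2, on="id", how="full_outer").show()

# Semi and anti joins return only df1's columns: rows with / without a match in df2.
df1.join(df2, on="id", how="left_semi").show()
df1.join(df2, on="id", how="left_anti").show()

spark.stop()
```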
16 changes: 12 additions & 4 deletions sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -750,14 +750,18 @@ class Dataset[T] private[sql](
}

/**
- * Equi-join with another `DataFrame` using the given columns.
+ * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+ * is specified as an inner join. If you would explicitly like to perform a cross join use the
+ * `crossJoin` method.
*
* Different from other join functions, the join columns will only appear once in the output,
* i.e. similar to SQL's `JOIN USING` syntax.
*
* @param right Right side of the join operation.
* @param usingColumns Names of the columns to join on. This columns must exist on both sides.
- * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+ * @param joinType Type of join to perform. Default `inner`. Must be one of:
+ * `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+ * `right`, `right_outer`, `left_semi`, `left_anti`.
*
* @note If you perform a self-join using this function without aliasing the input
* `DataFrame`s, you will NOT be able to reference any columns after the join, since
@@ -812,7 +816,9 @@ class Dataset[T] private[sql](
*
* @param right Right side of the join.
* @param joinExprs Join expression.
- * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+ * @param joinType Type of join to perform. Default `inner`. Must be one of:
+ * `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+ * `right`, `right_outer`, `left_semi`, `left_anti`.
*
* @group untypedrel
* @since 2.0.0
@@ -889,7 +895,9 @@ class Dataset[T] private[sql](
*
* @param other Right side of the join.
* @param condition Join expression.
- * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+ * @param joinType Type of join to perform. Default `inner`. Must be one of:
+ * `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+ * `right`, `right_outer`, `left_semi`, `left_anti`.
*
* @group typedrel
* @since 1.6.0
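A minimal Scala sketch (not part of this commit; the object name, data, and column names are illustrative assumptions) exercising the join overloads documented above with the joinType strings they now list:

```scala
import org.apache.spark.sql.SparkSession

object JoinTypesExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("join-docs").getOrCreate()
    import spark.implicits._

    val left  = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    val right = Seq((1, "hr"), (3, "eng")).toDF("id", "dept")

    // Equi-join on a shared column name: "id" appears only once in the output.
    left.join(right, Seq("id"), "full_outer").show()

    // Join on an arbitrary expression with an explicit joinType string.
    left.join(right, left("id") === right("id"), "left_anti").show()

    // Typed variant: joinWith keeps each side of a matched pair in the result.
    left.joinWith(right, left("id") === right("id"), "inner").show()

    spark.stop()
  }
}
```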
