[SPARK-16508][SPARKR] doc updates and more CRAN check fixes
## What changes were proposed in this pull request?

- Replace backticks (``` ` ```) in the code docs with `\code{thing}`, so the references render as code in the generated Rd help (see the sketch below).
- Remove the previously added `...` argument from `drop(DataFrame)`.
- Fix the remaining CRAN check warnings.
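For illustration only (adapted from the `isLocal` roxygen block changed below), the markup swap looks like this:

```r
#' Before: backticks are not Rd markup, so the names show up literally in the rendered help
#' Returns True if the `collect` and `take` methods can be run locally.

#' After: \code{} is the Rd tag for inline code, so the names render as code
#' Returns True if the \code{collect} and \code{take} methods can be run locally.
```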

## How was this patch tested?

Created the docs with knitr (see the sketch below).
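A hedged sketch of regenerating the help pages locally for review; it assumes the SparkR package has already been built and installed, and the exact Spark doc scripts may differ:

```r
# Render the installed package's Rd help pages to HTML with knitr for a visual check.
library(SparkR)    # assumes SparkR is installed from the Spark build
library(knitr)
knit_rd("SparkR")  # writes the rendered help pages for review
```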

junyangq

Author: Felix Cheung <[email protected]>

Closes #14734 from felixcheung/rdoccleanup.
felixcheung authored and Felix Cheung committed Aug 22, 2016
1 parent 84770b5 commit 71afeee
Showing 12 changed files with 119 additions and 114 deletions.
6 changes: 5 additions & 1 deletion R/pkg/NAMESPACE
@@ -1,5 +1,9 @@
# Imports from base R
-importFrom(methods, setGeneric, setMethod, setOldClass)
+# Do not include stats:: "rpois", "runif" - causes error at runtime
+importFrom("methods", "setGeneric", "setMethod", "setOldClass")
+importFrom("methods", "is", "new", "signature", "show")
+importFrom("stats", "gaussian", "setNames")
+importFrom("utils", "download.file", "packageVersion", "untar")

# Disable native libraries till we figure out how to package it
# See SPARKR-7839
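The new NAMESPACE comment pairs with call-site changes in R/pkg/R/RDD.R further down: rather than importing `runif`/`rpois` from stats, each call is qualified explicitly. A minimal sketch of that pattern (the 0.5 fraction is illustrative):

```r
# Explicit namespace qualification instead of importFrom("stats", "runif", "rpois"):
fraction <- 0.5                        # illustrative sampling fraction
count <- stats::rpois(1, fraction)     # Poisson draw for with-replacement sampling
keep  <- stats::runif(1) < fraction    # uniform draw for without-replacement sampling
```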
71 changes: 35 additions & 36 deletions R/pkg/R/DataFrame.R
@@ -150,7 +150,7 @@ setMethod("explain",

#' isLocal
#'
-#' Returns True if the `collect` and `take` methods can be run locally
+#' Returns True if the \code{collect} and \code{take} methods can be run locally
#' (without any Spark executors).
#'
#' @param x A SparkDataFrame
@@ -182,7 +182,7 @@ setMethod("isLocal",
#' @param numRows the number of rows to print. Defaults to 20.
#' @param truncate whether truncate long strings. If \code{TRUE}, strings more than
#' 20 characters will be truncated. However, if set greater than zero,
-#' truncates strings longer than `truncate` characters and all cells
+#' truncates strings longer than \code{truncate} characters and all cells
#' will be aligned right.
#' @param ... further arguments to be passed to or from other methods.
#' @family SparkDataFrame functions
@@ -642,10 +642,10 @@ setMethod("unpersist",
#' The following options for repartition are possible:
#' \itemize{
#' \item{1.} {Return a new SparkDataFrame partitioned by
-#' the given columns into `numPartitions`.}
-#' \item{2.} {Return a new SparkDataFrame that has exactly `numPartitions`.}
+#' the given columns into \code{numPartitions}.}
+#' \item{2.} {Return a new SparkDataFrame that has exactly \code{numPartitions}.}
#' \item{3.} {Return a new SparkDataFrame partitioned by the given column(s),
-#' using `spark.sql.shuffle.partitions` as number of partitions.}
+#' using \code{spark.sql.shuffle.partitions} as number of partitions.}
#'}
#' @param x a SparkDataFrame.
#' @param numPartitions the number of partitions to use.
@@ -1132,9 +1132,8 @@ setMethod("take",

#' Head
#'
-#' Return the first NUM rows of a SparkDataFrame as a R data.frame. If NUM is NULL,
-#' then head() returns the first 6 rows in keeping with the current data.frame
-#' convention in R.
+#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. If \code{num} is not
+#' specified, then head() returns the first 6 rows as with R data.frame.
#'
#' @param x a SparkDataFrame.
#' @param num the number of rows to return. Default is 6.
@@ -1406,11 +1405,11 @@ setMethod("dapplyCollect",
#'
#' @param cols grouping columns.
#' @param func a function to be applied to each group partition specified by grouping
-#' column of the SparkDataFrame. The function `func` takes as argument
+#' column of the SparkDataFrame. The function \code{func} takes as argument
#' a key - grouping columns and a data frame - a local R data.frame.
-#' The output of `func` is a local R data.frame.
+#' The output of \code{func} is a local R data.frame.
#' @param schema the schema of the resulting SparkDataFrame after the function is applied.
-#' The schema must match to output of `func`. It has to be defined for each
+#' The schema must match to output of \code{func}. It has to be defined for each
#' output column with preferred output column name and corresponding data type.
#' @return A SparkDataFrame.
#' @family SparkDataFrame functions
@@ -1497,9 +1496,9 @@ setMethod("gapply",
#'
#' @param cols grouping columns.
#' @param func a function to be applied to each group partition specified by grouping
-#' column of the SparkDataFrame. The function `func` takes as argument
+#' column of the SparkDataFrame. The function \code{func} takes as argument
#' a key - grouping columns and a data frame - a local R data.frame.
-#' The output of `func` is a local R data.frame.
+#' The output of \code{func} is a local R data.frame.
#' @return A data.frame.
#' @family SparkDataFrame functions
#' @aliases gapplyCollect,SparkDataFrame-method
@@ -1657,7 +1656,7 @@ setMethod("$", signature(x = "SparkDataFrame"),
getColumn(x, name)
})

-#' @param value a Column or NULL. If NULL, the specified Column is dropped.
+#' @param value a Column or \code{NULL}. If \code{NULL}, the specified Column is dropped.
#' @rdname select
#' @name $<-
#' @aliases $<-,SparkDataFrame-method
@@ -1747,7 +1746,7 @@ setMethod("[", signature(x = "SparkDataFrame"),
#' @family subsetting functions
#' @examples
#' \dontrun{
-#' # Columns can be selected using `[[` and `[`
+#' # Columns can be selected using [[ and [
#' df[[2]] == df[["age"]]
#' df[,2] == df[,"age"]
#' df[,c("name", "age")]
@@ -1792,7 +1791,7 @@ setMethod("subset", signature(x = "SparkDataFrame"),
#' select(df, df$name, df$age + 1)
#' select(df, c("col1", "col2"))
#' select(df, list(df$name, df$age + 1))
-#' # Similar to R data frames columns can also be selected using `$`
+#' # Similar to R data frames columns can also be selected using $
#' df[,df$age]
#' }
#' @note select(SparkDataFrame, character) since 1.4.0
@@ -2443,7 +2442,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) {
#' Return a new SparkDataFrame containing the union of rows
#'
#' Return a new SparkDataFrame containing the union of rows in this SparkDataFrame
-#' and another SparkDataFrame. This is equivalent to `UNION ALL` in SQL.
+#' and another SparkDataFrame. This is equivalent to \code{UNION ALL} in SQL.
#' Note that this does not remove duplicate rows across the two SparkDataFrames.
#'
#' @param x A SparkDataFrame
@@ -2486,7 +2485,7 @@ setMethod("unionAll",

#' Union two or more SparkDataFrames
#'
-#' Union two or more SparkDataFrames. This is equivalent to `UNION ALL` in SQL.
+#' Union two or more SparkDataFrames. This is equivalent to \code{UNION ALL} in SQL.
#' Note that this does not remove duplicate rows across the two SparkDataFrames.
#'
#' @param x a SparkDataFrame.
@@ -2519,7 +2518,7 @@ setMethod("rbind",
#' Intersect
#'
#' Return a new SparkDataFrame containing rows only in both this SparkDataFrame
-#' and another SparkDataFrame. This is equivalent to `INTERSECT` in SQL.
+#' and another SparkDataFrame. This is equivalent to \code{INTERSECT} in SQL.
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
@@ -2547,7 +2546,7 @@ setMethod("intersect",
#' except
#'
#' Return a new SparkDataFrame containing rows in this SparkDataFrame
-#' but not in another SparkDataFrame. This is equivalent to `EXCEPT` in SQL.
+#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT} in SQL.
#'
#' @param x a SparkDataFrame.
#' @param y a SparkDataFrame.
@@ -2576,8 +2575,8 @@ setMethod("except",

#' Save the contents of SparkDataFrame to a data source.
#'
-#' The data source is specified by the `source` and a set of options (...).
-#' If `source` is not specified, the default data source configured by
+#' The data source is specified by the \code{source} and a set of options (...).
+#' If \code{source} is not specified, the default data source configured by
#' spark.sql.sources.default will be used.
#'
#' Additionally, mode is used to specify the behavior of the save operation when data already
@@ -2613,7 +2612,7 @@ setMethod("except",
#' @note write.df since 1.4.0
setMethod("write.df",
signature(df = "SparkDataFrame", path = "character"),
-function(df, path, source = NULL, mode = "error", ...){
+function(df, path, source = NULL, mode = "error", ...) {
if (is.null(source)) {
source <- getDefaultSqlSource()
}
@@ -2635,14 +2634,14 @@ setMethod("write.df",
#' @note saveDF since 1.4.0
setMethod("saveDF",
signature(df = "SparkDataFrame", path = "character"),
-function(df, path, source = NULL, mode = "error", ...){
+function(df, path, source = NULL, mode = "error", ...) {
write.df(df, path, source, mode, ...)
})

#' Save the contents of the SparkDataFrame to a data source as a table
#'
-#' The data source is specified by the `source` and a set of options (...).
-#' If `source` is not specified, the default data source configured by
+#' The data source is specified by the \code{source} and a set of options (...).
+#' If \code{source} is not specified, the default data source configured by
#' spark.sql.sources.default will be used.
#'
#' Additionally, mode is used to specify the behavior of the save operation when
@@ -2675,7 +2674,7 @@ setMethod("saveDF",
#' @note saveAsTable since 1.4.0
setMethod("saveAsTable",
signature(df = "SparkDataFrame", tableName = "character"),
-function(df, tableName, source = NULL, mode="error", ...){
+function(df, tableName, source = NULL, mode="error", ...) {
if (is.null(source)) {
source <- getDefaultSqlSource()
}
@@ -2752,11 +2751,11 @@ setMethod("summary",
#' @param how "any" or "all".
#' if "any", drop a row if it contains any nulls.
#' if "all", drop a row only if all its values are null.
-#' if minNonNulls is specified, how is ignored.
+#' if \code{minNonNulls} is specified, how is ignored.
#' @param minNonNulls if specified, drop rows that have less than
-#' minNonNulls non-null values.
+#' \code{minNonNulls} non-null values.
#' This overwrites the how parameter.
-#' @param cols optional list of column names to consider. In `fillna`,
+#' @param cols optional list of column names to consider. In \code{fillna},
#' columns specified in cols that do not have matching data
#' type are ignored. For example, if value is a character, and
#' subset contains a non-character column, then the non-character
@@ -2879,8 +2878,8 @@ setMethod("fillna",
#' in your system to accommodate the contents.
#'
#' @param x a SparkDataFrame.
-#' @param row.names NULL or a character vector giving the row names for the data frame.
-#' @param optional If `TRUE`, converting column names is optional.
+#' @param row.names \code{NULL} or a character vector giving the row names for the data frame.
+#' @param optional If \code{TRUE}, converting column names is optional.
#' @param ... additional arguments to pass to base::as.data.frame.
#' @return A data.frame.
#' @family SparkDataFrame functions
@@ -3058,7 +3057,7 @@ setMethod("str",
#' @note drop since 2.0.0
setMethod("drop",
signature(x = "SparkDataFrame"),
-function(x, col, ...) {
+function(x, col) {
stopifnot(class(col) == "character" || class(col) == "Column")

if (class(col) == "Column") {
@@ -3218,8 +3217,8 @@ setMethod("histogram",
#' and to not change the existing data.
#' }
#'
-#' @param x s SparkDataFrame.
-#' @param url JDBC database url of the form `jdbc:subprotocol:subname`.
+#' @param x a SparkDataFrame.
+#' @param url JDBC database url of the form \code{jdbc:subprotocol:subname}.
#' @param tableName yhe name of the table in the external database.
#' @param mode one of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default).
#' @param ... additional JDBC database connection properties.
@@ -3237,7 +3236,7 @@ setMethod("histogram",
#' @note write.jdbc since 2.0.0
setMethod("write.jdbc",
signature(x = "SparkDataFrame", url = "character", tableName = "character"),
-function(x, url, tableName, mode = "error", ...){
+function(x, url, tableName, mode = "error", ...) {
jmode <- convertToJSaveMode(mode)
jprops <- varargsToJProperties(...)
write <- callJMethod(x@sdf, "write")
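As a usage note for the write.jdbc signature documented above, a minimal sketch (it assumes an active SparkSession and a SparkDataFrame `df`; the URL, table name, and credentials are hypothetical):

```r
# Append df to an external table over JDBC; additional named arguments are
# passed through as JDBC connection properties.
write.jdbc(df, url = "jdbc:postgresql://dbserver:5432/sales",
           tableName = "orders_archive", mode = "append",
           user = "spark", password = "secret")
```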
10 changes: 5 additions & 5 deletions R/pkg/R/RDD.R
@@ -887,17 +887,17 @@ setMethod("sampleRDD",

# Discards some random values to ensure each partition has a
# different random seed.
-runif(partIndex)
+stats::runif(partIndex)

for (elem in part) {
if (withReplacement) {
-count <- rpois(1, fraction)
+count <- stats::rpois(1, fraction)
if (count > 0) {
res[ (len + 1) : (len + count) ] <- rep(list(elem), count)
len <- len + count
}
} else {
-if (runif(1) < fraction) {
+if (stats::runif(1) < fraction) {
len <- len + 1
res[[len]] <- elem
}
@@ -965,15 +965,15 @@ setMethod("takeSample", signature(x = "RDD", withReplacement = "logical",

set.seed(seed)
samples <- collectRDD(sampleRDD(x, withReplacement, fraction,
-as.integer(ceiling(runif(1,
+as.integer(ceiling(stats::runif(1,
-MAXINT,
MAXINT)))))
# If the first sample didn't turn out large enough, keep trying to
# take samples; this shouldn't happen often because we use a big
# multiplier for thei initial size
while (length(samples) < total)
samples <- collectRDD(sampleRDD(x, withReplacement, fraction,
-as.integer(ceiling(runif(1,
+as.integer(ceiling(stats::runif(1,
-MAXINT,
MAXINT)))))

30 changes: 15 additions & 15 deletions R/pkg/R/SQLContext.R
@@ -115,7 +115,7 @@ infer_type <- function(x) {
#' Get Runtime Config from the current active SparkSession
#'
#' Get Runtime Config from the current active SparkSession.
-#' To change SparkSession Runtime Config, please see `sparkR.session()`.
+#' To change SparkSession Runtime Config, please see \code{sparkR.session()}.
#'
#' @param key (optional) The key of the config to get, if omitted, all config is returned
#' @param defaultValue (optional) The default value of the config to return if they config is not
@@ -720,11 +720,11 @@ dropTempView <- function(viewName) {
#'
#' Returns the dataset in a data source as a SparkDataFrame
#'
-#' The data source is specified by the `source` and a set of options(...).
-#' If `source` is not specified, the default data source configured by
+#' The data source is specified by the \code{source} and a set of options(...).
+#' If \code{source} is not specified, the default data source configured by
#' "spark.sql.sources.default" will be used. \cr
-#' Similar to R read.csv, when `source` is "csv", by default, a value of "NA" will be interpreted
-#' as NA.
+#' Similar to R read.csv, when \code{source} is "csv", by default, a value of "NA" will be
+#' interpreted as NA.
#'
#' @param path The path of files to load
#' @param source The name of external data source
@@ -791,8 +791,8 @@ loadDF <- function(x, ...) {
#' Creates an external table based on the dataset in a data source,
#' Returns a SparkDataFrame associated with the external table.
#'
-#' The data source is specified by the `source` and a set of options(...).
-#' If `source` is not specified, the default data source configured by
+#' The data source is specified by the \code{source} and a set of options(...).
+#' If \code{source} is not specified, the default data source configured by
#' "spark.sql.sources.default" will be used.
#'
#' @param tableName a name of the table.
@@ -830,22 +830,22 @@ createExternalTable <- function(x, ...) {
#' Additional JDBC database connection properties can be set (...)
#'
#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
-#' retrieved in parallel based on the `numPartitions` or by the predicates.
+#' retrieved in parallel based on the \code{numPartitions} or by the predicates.
#'
#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
#' your external database systems.
#'
-#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
+#' @param url JDBC database url of the form \code{jdbc:subprotocol:subname}
#' @param tableName the name of the table in the external database
#' @param partitionColumn the name of a column of integral type that will be used for partitioning
-#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
-#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
-#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive),
-#' `upperBound` (exclusive), form partition strides for generated WHERE
-#' clause expressions used to split the column `partitionColumn` evenly.
+#' @param lowerBound the minimum value of \code{partitionColumn} used to decide partition stride
+#' @param upperBound the maximum value of \code{partitionColumn} used to decide partition stride
+#' @param numPartitions the number of partitions, This, along with \code{lowerBound} (inclusive),
+#' \code{upperBound} (exclusive), form partition strides for generated WHERE
+#' clause expressions used to split the column \code{partitionColumn} evenly.
#' This defaults to SparkContext.defaultParallelism when unset.
#' @param predicates a list of conditions in the where clause; each one defines one partition
-#' @param ... additional JDBC database connection named propertie(s).
+#' @param ... additional JDBC database connection named properties.
#' @return SparkDataFrame
#' @rdname read.jdbc
#' @name read.jdbc
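For the read.jdbc parameters documented above, a minimal sketch of a partitioned read (the URL, table, column name, bounds, and credentials are hypothetical, and an active SparkSession is assumed):

```r
# Read "orders" in 10 parallel partitions, splitting order_id evenly between the
# bounds; extra named arguments become JDBC connection properties.
df <- read.jdbc("jdbc:postgresql://dbserver:5432/sales", "orders",
                partitionColumn = "order_id", lowerBound = 1, upperBound = 100000,
                numPartitions = 10, user = "spark", password = "secret")
```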
