[SPARK-37474][R][DOCS] Migrate SparkR docs to pkgdown #34728

Closed
wants to merge 14 commits
5 changes: 4 additions & 1 deletion .github/workflows/build_and_test.yml
@@ -479,7 +479,8 @@ jobs:
- name: Install dependencies for documentation generation
run: |
# pandoc is required to generate PySpark APIs as well in nbsphinx.
apt-get install -y libcurl4-openssl-dev pandoc
apt-get install -y libcurl4-openssl-dev pandoc libfontconfig1-dev libharfbuzz-dev \
libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev
# TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
# See also https://github.com/sphinx-doc/sphinx/issues/7551.
# Jinja2 3.0.0+ causes error when building with Sphinx.
@@ -489,6 +490,8 @@ jobs:
apt-get update -y
apt-get install -y ruby ruby-dev
Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
Rscript -e "devtools::install_git('https://github.com/r-lib/pkgdown.git')"
Rscript -e "devtools::install_git('https://github.com/amirmasoudabdol/preferably.git')"
Member:

Quick question: do we need this for the documentation build? We would also have to update https://github.com/apache/spark/blob/master/docs/README.md and https://github.com/apache/spark/blob/master/dev/create-release/spark-rm/Dockerfile#L83-L84. That can be done separately, though.

And can we pin the versions of both libraries instead of using the master branch?

Member Author:

> Quick question: do we need this for the documentation build?

Yes, the documentation build invokes pkgdown.
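For context, the line this PR adds to R/create-docs.sh is what drives that step; a minimal sketch of reproducing it by hand from R/pkg/html, assuming SparkR has already been installed into R/lib, looks roughly like this:

```r
# Load the locally installed SparkR package, then build the pkgdown site.
# libDir points at R/lib (relative to R/pkg/html); ".." resolves to the
# R/pkg package root, so the generated site ends up in R/pkg/docs.
libDir <- "../../lib"
library(SparkR, lib.loc = libDir)
pkgdown::build_site("..")
```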

> And can we pin the versions of both libraries instead of using the master branch?

Of course. We should even be able to pin the CRAN version of pkgdown (it seems that 2.0.x, and a compatible preferably, are already published and visible).
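For illustration, a pinned install could look roughly like the sketch below, replacing the two install_git() calls; the exact version strings are placeholders, not the versions this PR settles on:

```r
# Install specific released versions from CRAN instead of tracking master.
# devtools::install_version() (re-exported from remotes) fetches the requested
# release; the version numbers here are illustrative placeholders only.
devtools::install_version("pkgdown", version = "2.0.1",
                          repos = "https://cloud.r-project.org/")
devtools::install_version("preferably", version = "3.4.0",
                          repos = "https://cloud.r-project.org/")
```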

gem install bundler
cd docs
bundle install
3 changes: 3 additions & 0 deletions R/create-docs.sh
@@ -24,6 +24,8 @@
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
# The pkgdown website can be found in
# $SPARK_HOME/R/pkg/docs

set -o pipefail
set -e
@@ -50,6 +52,7 @@ mkdir -p pkg/html
pushd pkg/html

"$R_SCRIPT_PATH/Rscript" -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); knitr::knit_rd("SparkR", links = tools::findHTMLlinks(file.path(libDir, "SparkR")))'
"$R_SCRIPT_PATH/Rscript" -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); pkgdown::build_site("..")'

popd

3 changes: 3 additions & 0 deletions R/pkg/.Rbuildignore
@@ -7,3 +7,6 @@
^src-native$
^html$
^tests/fulltests/*
^_pkgdown\.yml$
^docs$
^pkgdown$
1 change: 1 addition & 0 deletions R/pkg/.gitignore
@@ -0,0 +1 @@
docs
2 changes: 1 addition & 1 deletion R/pkg/DESCRIPTION
@@ -60,7 +60,7 @@ Collate:
'types.R'
'utils.R'
'window.R'
RoxygenNote: 7.1.1
RoxygenNote: 7.1.2
VignetteBuilder: knitr
NeedsCompilation: no
Encoding: UTF-8
31 changes: 13 additions & 18 deletions R/pkg/R/DataFrame.R
@@ -890,10 +890,9 @@ setMethod("toJSON",
#' save mode (it is 'error' by default)
#' @param ... additional argument(s) passed to the method.
#' You can find the JSON-specific options for writing JSON files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option}{
#' Data Source Option} in the version you use.
#'
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#' @family SparkDataFrame functions
#' @rdname write.json
#' @name write.json
@@ -925,10 +924,9 @@ setMethod("write.json",
#' save mode (it is 'error' by default)
#' @param ... additional argument(s) passed to the method.
#' You can find the ORC-specific options for writing ORC files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option}{
#' Data Source Option} in the version you use.
#'
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#' @family SparkDataFrame functions
#' @aliases write.orc,SparkDataFrame,character-method
#' @rdname write.orc
@@ -960,10 +958,9 @@ setMethod("write.orc",
#' save mode (it is 'error' by default)
#' @param ... additional argument(s) passed to the method.
#' You can find the Parquet-specific options for writing Parquet files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option
#' }{Data Source Option} in the version you use.
#'
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#' @family SparkDataFrame functions
#' @rdname write.parquet
#' @name write.parquet
@@ -996,10 +993,9 @@ setMethod("write.parquet",
#' save mode (it is 'error' by default)
#' @param ... additional argument(s) passed to the method.
#' You can find the text-specific options for writing text files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option}{
#' Data Source Option} in the version you use.
#'
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#' @family SparkDataFrame functions
#' @aliases write.text,SparkDataFrame,character-method
#' @rdname write.text
@@ -3912,8 +3908,7 @@ setMethod("isStreaming",
#' @aliases write.stream,SparkDataFrame-method
#' @rdname write.stream
#' @name write.stream
#' @examples
#'\dontrun{
#' @examples \dontrun{
#' sparkR.session()
#' df <- read.stream("socket", host = "localhost", port = 9999)
#' isStreaming(df)
29 changes: 15 additions & 14 deletions R/pkg/R/SQLContext.R
@@ -382,9 +382,9 @@ setMethod("toDF", signature(x = "RDD"),
#' @param path Path of file to read. A vector of multiple paths is allowed.
#' @param ... additional external data source specific named properties.
#' You can find the JSON-specific options for reading JSON files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option}{
#' Data Source Option} in the version you use.
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#' @return SparkDataFrame
#' @rdname read.json
#' @examples
@@ -414,9 +414,9 @@ read.json <- function(path, ...) {
#' @param path Path of file to read.
#' @param ... additional external data source specific named properties.
#' You can find the ORC-specific options for reading ORC files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option}{
#' Data Source Option} in the version you use.
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#' @return SparkDataFrame
#' @rdname read.orc
#' @name read.orc
@@ -439,9 +439,9 @@ read.orc <- function(path, ...) {
#' @param path path of file to read. A vector of multiple paths is allowed.
#' @param ... additional data source specific named properties.
#' You can find the Parquet-specific options for reading Parquet files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option
#' }{Data Source Option} in the version you use.
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#' @return SparkDataFrame
#' @rdname read.parquet
#' @name read.parquet
@@ -468,9 +468,9 @@ read.parquet <- function(path, ...) {
#' @param path Path of file to read. A vector of multiple paths is allowed.
#' @param ... additional external data source specific named properties.
#' You can find the text-specific options for reading text files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option}{
#' Data Source Option} in the version you use.
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#' @return SparkDataFrame
#' @rdname read.text
#' @examples
@@ -619,8 +619,9 @@ loadDF <- function(path = NULL, source = NULL, schema = NULL, ...) {
#'
#' Additional JDBC database connection properties can be set (...)
#' You can find the JDBC-specific option and parameter documentation for reading tables via JDBC in
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option}{
#' Data Source Option} in the version you use.
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option}{Data Source Option} in the version you use.
# nolint end
#'
#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
#' retrieved in parallel based on the \code{numPartitions} or by the predicates.
17 changes: 11 additions & 6 deletions R/pkg/R/functions.R
@@ -264,18 +264,20 @@ NULL
#' additional named properties to control how it is converted and accepts the
#' same options as the JSON data source.
#' You can find the JSON-specific options for reading/writing JSON files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option}{
#' Data Source Option} in the version you use.
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option}{Data Source Option}
# nolint end
#' in the version you use.
#' \item \code{to_json}: it supports the "pretty" option which enables pretty
#' JSON generation.
#' \item \code{to_csv}, \code{from_csv} and \code{schema_of_csv}: this contains
#' additional named properties to control how it is converted and accepts the
#' same options as the CSV data source.
#' You can find the CSV-specific options for reading/writing CSV files in
#' \url{
#' https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option}{
#' Data Source Option} in the version you use.
# nolint start
#' \url{https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option}{Data Source Option}
# nolint end
#' in the version you use.
#' \item \code{arrays_zip}, this contains additional Columns of arrays to be merged.
#' \item \code{map_concat}, this contains additional Columns of maps to be unioned.
#' }
@@ -3816,6 +3818,7 @@ setMethod("row_number",
#' Column, for example \code{unresolved_named_lambda_var("a", "b", "c")}
#' yields unresolved \code{a.b.c}
#' @return Column object wrapping JVM UnresolvedNamedLambdaVariable
#' @keywords internal
unresolved_named_lambda_var <- function(...) {
jc <- newJObject(
"org.apache.spark.sql.Column",
@@ -3839,6 +3842,7 @@ unresolved_named_lambda_var <- function(...) {
#' @param fun R \code{function} (unary, binary or ternary)
#' that transforms \code{Columns} into a \code{Column}
#' @return JVM \code{LambdaFunction} object
#' @keywords internal
create_lambda <- function(fun) {
as_jexpr <- function(x) callJMethod(x@jc, "expr")

@@ -3887,6 +3891,7 @@ create_lambda <- function(fun) {
#' @param cols list of character or Column objects
#' @param funs list of named list(fun = ..., expected_narg = ...)
#' @return a \code{Column} representing name applied to cols with funs
#' @keywords internal
invoke_higher_order_function <- function(name, cols, funs) {
as_jexpr <- function(x) {
if (class(x) == "character") {
1 change: 1 addition & 0 deletions R/pkg/R/jobj.R
@@ -72,6 +72,7 @@ jobj <- function(objId) {
#' @param x The JVM object reference
#' @param ... further arguments passed to or from other methods
#' @note print.jobj since 1.4.0
#' @keywords internal
print.jobj <- function(x, ...) {
name <- getClassName.jobj(x)
cat("Java ref type", name, "id", x$id, "\n", sep = " ")
2 changes: 2 additions & 0 deletions R/pkg/R/schema.R
@@ -95,6 +95,7 @@ structType.character <- function(x, ...) {
#' @param x A StructType object
#' @param ... further arguments passed to or from other methods
#' @note print.structType since 1.4.0
#' @keywords internal
print.structType <- function(x, ...) {
cat("StructType\n",
sapply(x$fields(),
@@ -234,6 +235,7 @@ structField.character <- function(x, type, nullable = TRUE, ...) {
#' @param x A StructField object
#' @param ... further arguments passed to or from other methods
#' @note print.structField since 1.4.0
#' @keywords internal
print.structField <- function(x, ...) {
cat("StructField(name = \"", x$name(),
"\", type = \"", x$dataType.toString(),
1 change: 1 addition & 0 deletions R/pkg/R/utils.R
@@ -115,6 +115,7 @@ isRDD <- function(name, env) {
#' hashCode("1") # 49
#'}
#' @note hashCode since 1.4.0
#' @keywords internal
hashCode <- function(key) {
if (class(key) == "integer") {
as.integer(key[[1]])
File renamed without changes.