Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19324][SPARKR] Spark VJM stdout output is getting dropped in SparkR #16670

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 8 additions & 3 deletions R/pkg/R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -756,12 +756,17 @@ varargsToJProperties <- function(...) {
props
}

launchScript <- function(script, combinedArgs, capture = FALSE) {
launchScript <- function(script, combinedArgs, wait = FALSE) {
if (.Platform$OS.type == "windows") {
scriptWithArgs <- paste(script, combinedArgs, sep = " ")
shell(scriptWithArgs, translate = TRUE, wait = capture, intern = capture) # nolint
# on Windows, intern = F seems to mean output to the console. (documentation on this is missing)
shell(scriptWithArgs, translate = TRUE, wait = wait, intern = wait) # nolint
} else {
system2(script, combinedArgs, wait = capture, stdout = capture)
# http://stat.ethz.ch/R-manual/R-devel/library/base/html/system2.html
# stdout = F means discard output
# stdout = "" means to its console (default)
# Note that the console of this child process might not be the same as the running R process.
system2(script, combinedArgs, stdout = "", wait = wait)
}
}

Expand Down
2 changes: 1 addition & 1 deletion R/pkg/inst/tests/testthat/test_Windows.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ test_that("sparkJars tag in SparkContext", {
if (.Platform$OS.type != "windows") {
skip("This test is only for Windows, skipped")
}
testOutput <- launchScript("ECHO", "a/b/c", capture = TRUE)
testOutput <- launchScript("ECHO", "a/b/c", wait = TRUE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a similar test with something getting printed on stdout from the JVM ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could, but unfortunately, we don't actually call launchScript with wait/capture = TRUE
we call wait/capture = FALSE and expect to let console/stdout to leak through, and return NULL.

I'll try to add test for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I've tried, I don't think it would work.
When calling system2(.., wait = FALSE, capture = "") the output to stdout is actually from the child process, so I don't think we would be able to see it from the R process.
We could redirect it, but then it would be the same as system2(..., wait = FALSE, capture = TRUE) but again it wouldn't be what we are normally calling.

I think we would need to dig deeper on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - I expected the stdout = "" to be piping it to the R process stdout. We could also explicitly pass a fd to do this pipe ?

Copy link
Member Author

@felixcheung felixcheung Jan 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I observed running SparkR as a package, I'm not sure we should pipe/redirect the stdout always - it could get very noisy running SparkR from an IDE. On the other hand, often times the result (error) is not enough to debug the issue.

I'd propose we don't redirect stdout by default in keeping the IDE experience cleaner, however we should have an API to "turn this on and off" programmatically on demand. Although it is not clear system2 supports that though, stdout is either TRUE (capture to return as a character vector), NULL/FALSE (drop), "" (to the console of that child process), "name" (file name to write into)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of what in a R IDE see:

> head(p, 40)
Error in handleErrors(returnStatus, conn) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 70.0 failed 1 times, most recent failure: Lost task 0.0 in stage 70.0 (TID 115, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$4: (string) => double)
       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
       at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what they see before this change or after it ?

Copy link
Member Author

@felixcheung felixcheung Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is from #16670 (comment) this is "often times the result (error) is not enough to debug the issue."

our options are:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second option is fine - i.e not redirecting it by default but happening to share stdout with the R process. That way if say the R IDE has some way to save or view logs from the stdout of R, then users can use that (Does RStudio have something like this ?)

The API to redirect on demand might be useful (it'll be something like setLogLevel ?) but I'm not sure we can change it for an already running JVM ?

Anyways let me review this PR one more time, I think we can discuss the new API in a separate JIRA

abcPath <- testOutput[1]
expect_equal(abcPath, "a\\b\\c")
})