Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8603][SPARKR] Use shell() instead of system2() for SparkR on Windows #13165

Closed
wants to merge 17 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented May 18, 2016

What changes were proposed in this pull request?

This PR corrects SparkR to use shell() instead of system2() on Windows.

Using system2(...) on Windows does not process windows file separator \. shell(tralsate = TRUE, ...) can treat this problem. So, this was changed to be chosen according to OS.

Existing tests were failed on Windows due to this problem. For example, those were failed.

8. Failure: sparkJars tag in SparkContext (@test_includeJAR.R#34)
9. Failure: sparkJars tag in SparkContext (@test_includeJAR.R#36)

The cases above were due to using of system2.

In addition, this PR also fixes some tests failed on Windows.

5. Failure: sparkJars sparkPackages as comma-separated strings (@test_context.R#128) 
6. Failure: sparkJars sparkPackages as comma-separated strings (@test_context.R#131) 
7. Failure: sparkJars sparkPackages as comma-separated strings (@test_context.R#134) 

The cases above were due to a weird behaviour of normalizePath(). On Linux, if the path does not exist, it just prints out the input but it prints out including the current path on Windows.

# On Linus
path <- normalizePath("aa")
print(path)
[1] "aa"

# On Windows
path <- normalizePath("aa")
print(path)
[1] "C:\\Users\\aa"

How was this patch tested?

Jenkins tests and manually tested in a Window machine as below:

Here is the stdout of testing.

Closes #7025

@HyukjinKwon
Copy link
Member Author

Please let me cc @sun-rui and @JoshRosen

@sun-rui
Copy link
Contributor

sun-rui commented May 18, 2016

Could you add a test case for Windows specific test? @shivaram

@SparkQA
Copy link

SparkQA commented May 18, 2016

Test build #58758 has finished for PR 13165 at commit 454b7f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 18, 2016

Test build #58760 has finished for PR 13165 at commit 3057844.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# of the operating system. So, this should be manaully changed.
determinefileSeperator <- function() {
if (.Platform$OS.type == "windows") {
fileSeperator <- "\\"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: separator not seperator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much.

@SparkQA
Copy link

SparkQA commented May 18, 2016

Test build #58763 has finished for PR 13165 at commit 935d796.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

Does this apply to other cases:

script <- file.path(dirs[[1]], "SparkR", "worker", "worker.R")

.libPaths(c(file.path(home, "R", "lib"), .libPaths()))

path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")

(last one is an example)

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented May 19, 2016

@sun-rui @felixcheung Let me try to build and run all tests for R first in Windows and then will try to correct and add each test one by one.

This will take a bit of time and I might have to ask a lot of questions (I am not familiar with R and Windows as I said in the stale PR above, #7025) but anyway I will try.

Meanwhile, If anybody wants to take over this, please do so.

@sun-rui
Copy link
Contributor

sun-rui commented May 19, 2016

@felixcheung, this issue seems to relate to system2() only. However, let's wait for HyukjinKwon's test result.

@HyukjinKwon, great, go ahead please.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented May 20, 2016

@sun-rui @felixcheung Right. It seems finally I made it. I made gists and uploaded a PDF file of Spark UI to my google drive.

Let me tell you the test results first. The test was proceeded against master branch.

Here is the stdout output for the tests on Windwos 7 32bit, output.msg.
Here is the stderr output for the tests on Windwos 7 32bit, output.err.
Here is the PDF for Spark UI PDF (Sorry, it seems some history are truncated. I will make a proper one if I have to run this again)

Here are the steps I took:

  1. I run tests after building Spark on Windows according to ./R/WINDOWS.md

  2. It seems $HADOOP_HOME should be set. So, I added the variable in my system.

  3. It seems winutils.exe is required which is included in Hadoop official binary although it reads files in the local file system during the tests.

  4. And run the tests by the command below (after setting log levels to ERROR in log4j.properties because it was so verbose):

    cd bin
    spark-submit2.cmd --conf spark.hadoop.fs.defualt.name="file:///" ..\R\pkg\tests\run-all.R > output.msg 2> output.err

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented May 20, 2016

This raises some questions to me.

  1. It seems several tests were failed. Could you please inform me your thoughts?
  2. Now, I think I can add some tests but could you please tell me where I should write the related tests and maybe rough ideas of the tests I should add?
  3. Can I separately update ./R/WINDOWS.md in another PR to describe how I tested this on Winodws as above?

@sun-rui
Copy link
Contributor

sun-rui commented May 20, 2016

  1. A rough scan of the test failures shows most of them are probably related to path handling. You can replay the failed test case in R on Windows. For debug facilities in R, refer to http://www.inside-r.org/r-doc/base/traceback, https://stat.ethz.ch/R-manual/R-devel/library/base/html/debug.html, http://www.inside-r.org/r-doc/base/browser
  2. You can add a new file named test_Windows.R under R/pkg/inst/tests/testthat
  3. Sure.

@sun-rui
Copy link
Contributor

sun-rui commented May 20, 2016

I checked the source code of system2. Since a batch file with .cmd extension is to be launched on Windows, I think it is more formal to use shell() instead of system2(). Refer to http://www.astrostatistics.psu.edu/datasets/R/html/base/html/shell.html. The proposed fix is like follows:

if (.Platform$OS.type == "windows") {
  shell(sparkSubmitBin, translate = TRUE, ...)
} else {
  system2(sparkSubmitBin, ...)
}

@HyukjinKwon
Copy link
Member Author

@sun-rui Thank you so much. I will try to add a test.

@shivaram
Copy link
Contributor

Related to my comment on #13217 -- I will test this out on windows using the new instructions.

@HyukjinKwon
Copy link
Member Author

@shivaram Thank you so much. Let me try to add a test meanwhile.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented May 25, 2016

@shivaram and @sun-rui I addressed the comment, added a test for this PR and fixed some tests not working on Windows due to path separator.

This PR also fixes the test failures below.

  • Fixed case 1

    5. Failure: sparkJars sparkPackages as comma-separated strings (@test_context.R#128) 
    6. Failure: sparkJars sparkPackages as comma-separated strings (@test_context.R#131) 
    7. Failure: sparkJars sparkPackages as comma-separated strings (@test_context.R#134) 
    

    The cases above were due to a weird behaviour of normalizePath(). On Linux, if the path does not exist, it just prints out the input but it prints out including the current path on Windows.

    # On Linus
    path <- normalizePath("aa")
    print(path)
    [1] "aa"
    
    # On Windows
    path <- normalizePath("aa")
    print(path)
    [1] "C:\\Users\\aa"
  • Fixed case 2

    8. Failure: sparkJars tag in SparkContext (@test_includeJAR.R#34)
    9. Failure: sparkJars tag in SparkContext (@test_includeJAR.R#36)
    

    The cases above were due to using of system2.

Here is full stdout of my testing, output.msg on Windows.

@SparkQA
Copy link

SparkQA commented May 25, 2016

Test build #59275 has finished for PR 13165 at commit 64141d2.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 25, 2016

Test build #59276 has finished for PR 13165 at commit 4a53360.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -60,6 +60,15 @@ generateSparkSubmitArgs <- function(args, sparkHome, jars, sparkSubmitOpts, pack
combinedArgs
}

determineLauncher <- function(sparkSubmitBin, combinedArgs, capture = FALSE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be a common utility function and be re-used in test_includeJAR.R?

Copy link
Member Author

@HyukjinKwon HyukjinKwon May 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will.

BTW, could you inform me if I can disable style checking for some blocks in R like Scala? It seems it fails due to the usages of shell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use nolint - See context.R for an example. BTW what is the lint error here ?

Copy link
Member Author

@HyukjinKwon HyukjinKwon May 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivaram For some reasons, it seems there is no mothod shell() on Linux (I tested this on my Mac) but on Windows. I am using this twice within this PR. Here is a thread for this, http://r.789695.n4.nabble.com/Shell-Function-not-on-Linux-td4672569.html.

I could not find the official documentation though. Let me address the comments first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the common function could be like:

  launchScript(script) {
    if (windowsOS) use shell()
    else (system2()
  }

This makes it more general, not binding to launching SparkSubmit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sun-rui Thank you so much. I was trying to come up with a good name.

@HyukjinKwon HyukjinKwon changed the title [SPARK-8603][SPARKR] Incorrect file separator passed to Java and Scripts from R in windows [SPARK-8603][SPARKR] Incorrect paths being used for Spark R on Windows May 26, 2016
@HyukjinKwon HyukjinKwon changed the title [SPARK-8603][SPARKR] Incorrect paths being used for Spark R on Windows [SPARK-8603][SPARKR] Use shell() instead of system2() for SparkR on Windows May 26, 2016
@SparkQA
Copy link

SparkQA commented May 26, 2016

Test build #59339 has finished for PR 13165 at commit 0482ebb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

res <- system2(command = submitPath,
args = c(jarPath, scriptPath),
stdout = TRUE)
if (.Platform$OS.type == "windows") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can call determineSparkSubmitBin() here

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented May 26, 2016

@sun-rui Thanks, I just updated it. Here is the stdout for this time

@SparkQA
Copy link

SparkQA commented May 26, 2016

Test build #59363 has finished for PR 13165 at commit a1ccd5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Copy link
Contributor

sun-rui commented May 26, 2016

LGTM

@sun-rui
Copy link
Contributor

sun-rui commented May 26, 2016

could you rebase it to latest master. It is hopeful other test failures will gone

@HyukjinKwon
Copy link
Member Author

Sure, Thanks!

@SparkQA
Copy link

SparkQA commented May 26, 2016

Test build #59367 has finished for PR 13165 at commit 7eadefd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 26, 2016

Test build #59383 has finished for PR 13165 at commit 521f07b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

@HyukjinKwon Thanks for the update. I verified this today using a Windows VM and followed the instructions you have written up as well. All the tests passed and the stdout I got is at https://gist.github.com/shivaram/bd3c8bf7466d6cd515f4877f2a9611b3

One minor thing - there is a typo in the last line of https://github.com/apache/spark/blob/branch-2.0/R/WINDOWS.md
spark.hadoop.fs.defualt.name should be spark.hadoop.fs.default.name (typo in the spelling of default).

Could you make that change as a part of this PR ?

@HyukjinKwon
Copy link
Member Author

@shivaram Thank you so much! I will keep this machine as it is anyway. Please cc me if there are some PRs dealing with Windows in the future. I can run tests and leave some comments.

@SparkQA
Copy link

SparkQA commented May 27, 2016

Test build #59439 has finished for PR 13165 at commit 752c40a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

Merging this to master and branch-2.0

asfgit pushed a commit that referenced this pull request May 27, 2016
…indows

## What changes were proposed in this pull request?

This PR corrects SparkR to use `shell()` instead of `system2()` on Windows.

Using `system2(...)` on Windows does not process windows file separator `\`. `shell(tralsate = TRUE, ...)` can treat this problem. So, this was changed to be chosen according to OS.

Existing tests were failed on Windows due to this problem. For example, those were failed.

  ```
8. Failure: sparkJars tag in SparkContext (test_includeJAR.R#34)
9. Failure: sparkJars tag in SparkContext (test_includeJAR.R#36)
```

The cases above were due to using of `system2`.

In addition, this PR also fixes some tests failed on Windows.

  ```
5. Failure: sparkJars sparkPackages as comma-separated strings (test_context.R#128)
6. Failure: sparkJars sparkPackages as comma-separated strings (test_context.R#131)
7. Failure: sparkJars sparkPackages as comma-separated strings (test_context.R#134)
```

  The cases above were due to a weird behaviour of `normalizePath()`. On Linux, if the path does not exist, it just prints out the input but it prints out including the current path on Windows.

  ```r
# On Linus
path <- normalizePath("aa")
print(path)
[1] "aa"

# On Windows
path <- normalizePath("aa")
print(path)
[1] "C:\\Users\\aa"
```

## How was this patch tested?

Jenkins tests and manually tested in a Window machine as below:

Here is the [stdout](https://gist.github.com/HyukjinKwon/4bf35184f3a30f3bce987a58ec2bbbab) of testing.

Closes #7025

Author: hyukjinkwon <[email protected]>
Author: Hyukjin Kwon <[email protected]>
Author: Prakash PC <[email protected]>

Closes #13165 from HyukjinKwon/pr/7025.

(cherry picked from commit 1c40373)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@asfgit asfgit closed this in 1c40373 May 27, 2016
ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 8, 2016
…ting on Windows (currently SparkR only)

## What changes were proposed in this pull request?

This PR adds the build automation on Windows with [AppVeyor](https://www.appveyor.com/) CI tool.

Currently, this only runs the tests for SparkR as we have been having some issues with testing Windows-specific PRs (e.g. apache#14743 and apache#13165) and hard time to verify this.

One concern is, this build is dependent on [steveloughran/winutils](https://github.com/steveloughran/winutils) for pre-built Hadoop bin package (who is a Hadoop PMC member).

## How was this patch tested?

Manually, https://ci.appveyor.com/project/HyukjinKwon/spark/build/88-SPARK-17200-build-profile
This takes roughly 40 mins.

Some tests are already being failed and this was found in apache#14743 (comment).

Author: hyukjinkwon <[email protected]>

Closes apache#14859 from HyukjinKwon/SPARK-17200-build.
@HyukjinKwon HyukjinKwon deleted the pr/7025 branch January 2, 2018 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants