Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 #27328

Closed
wants to merge 10 commits into from

Conversation

zero323
Copy link
Member

@zero323 zero323 commented Jan 22, 2020

What changes were proposed in this pull request?

  • Update testthat to >= 2.0.0
  • Replace of testthat:::run_tests with testthat:::test_package_dir
  • Add trivial assertions for tests, without any expectations, to avoid skipping.
  • Update related docs.

Why are the changes needed?

testthat version has been frozen by SPARK-22817 / #20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Note

This change leads to

Error: Test failed: 'cleanClosure on R functions'

  • node stack overflow

in test_utils cleanClosure test, which requires further investigation.

CC @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Jan 22, 2020

Test build #117261 has finished for PR 27328 at commit ac6d524.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-30611][SPARKR][TESTS] Update testthat to >= 2.0.0 [WIP][SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 Jan 23, 2020
@HyukjinKwon
Copy link
Member

cc @felixcheung

@HyukjinKwon
Copy link
Member

If it works, we should install the latest testthat in Jenkins machine. cc @shaneknapp

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zero323, thanks for putting your efforts here. As long as the tests and CRAN check pass, it should be good to go.

@zero323
Copy link
Member Author

zero323 commented Jan 23, 2020

Hmmm... So it looks like it actually yields some additional errors on Windows.

test_sparkSQL.R:298: skip: create DataFrame from RDD
Reason: Hive is not build with SparkSQL, skipped
test_sparkSQL.R:862: failure: collect() support Unicode characters
rdf$name[1] not equal to markUtf8("<U+C548><U+B155><U+D558><U+C138><U+C694>").
1/1 mismatches
x[1]: "<U+C548><U+B155><U+D558><U+C138><U+C694>"
y[1]: "<U+C548><U+B155><U+D558><U+C138><U+C694>"
test_sparkSQL.R:863: failure: collect() support Unicode characters
rdf$name[2] not equal to markUtf8("<U+60A8><U+597D>").
1/1 mismatches
x[1]: "<U+60A8><U+597D>"
y[1]: "<U+60A8><U+597D>"
test_sparkSQL.R:864: failure: collect() support Unicode characters
rdf$name[3] not equal to markUtf8("<U+3053><U+3093><U+306B><U+3061><U+306F>").
1/1 mismatches
x[1]: "<U+3053><U+3093><U+306B><U+3061><U+306F>"
y[1]: "<U+3053><U+3093><U+306B><U+3061><U+306F>"
test_sparkSQL.R:865: failure: collect() support Unicode characters
rdf$name[4] not equal to markUtf8("Xin ch�o").
1/1 mismatches
x[1]: NA
y[1]: "Xin ch�o"
test_sparkSQL.R:868: failure: collect() support Unicode characters
collect(where(df1, df1$name == markUtf8("<U+60A8><U+597D>")))$name not equal to markUtf8("<U+60A8><U+597D>").
Modes: list, character
Length mismatch: comparison on first 0 components
test_sparkSQL.R:1325: skip: test HiveContext
Reason: Hive is not build with SparkSQL, skipped

Especially the last two are worrying. I'll investigate once I put my hands on some Windows environment.

@SparkQA
Copy link

SparkQA commented Jan 23, 2020

Test build #117325 has finished for PR 27328 at commit 8215126.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117328 has finished for PR 27328 at commit c2a5157.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117330 has finished for PR 27328 at commit 2e785d3.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Member Author

zero323 commented Jan 24, 2020

I've checked some of the failures, and it is pretty clear that these tests must have been silenced somehow.

  • node stack overflow in cleanClosure is an obvious bug and it is trivial to reproduce outside tests - I've opened SPARK-30629 to track this one.

  • Seems like mismatches in collect() support Unicode characters are caused by Windows / R encoding quirks. As is right now lines don't even seem to be recognized as a valid JSON input (setting UTF-8 directly on lines helps with that, but doesn't resolve the whole problem).

    library(SparkR)
    SparkR::sparkR.session()
    Sys.info()
    #           sysname           release           version 
    #         "Windows"      "Server x64"     "build 17763" 
    #          nodename           machine             login 
    # "WIN-5BLT6Q610KH"          "x86-64"   "Administrator" 
    #              user    effective_user 
    #   "Administrator"   "Administrator" 
    
    Sys.getlocale()
    
    # [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
    
    lines <- c("{\"name\":\"안녕하세요\"}",
               "{\"name\":\"您好\", \"age\":30}",
               "{\"name\":\"こんにちは\", \"age\":19}",
               "{\"name\":\"Xin chào\"}")
    
    system(paste0("cat ", jsonPath))
    # {"name":"<U+C548><U+B155><U+D558><U+C138><U+C694>"}
    # {"name":"<U+60A8><U+597D>", "age":30}
    # {"name":"<U+3053><U+3093><U+306B><U+3061><U+306F>", "age":19}
    # {"name":"Xin chào"}
    # [1] 0
    
    
    jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
    writeLines(lines, jsonPath)
    
    df <- read.df(jsonPath, "json")
    
    
    printSchema(df)
    # root
    #  |-- _corrupt_record: string (nullable = true)
    #  |-- age: long (nullable = true)
    #  |-- name: string (nullable = true)
    
    head(df)
    #              _corrupt_record age                                     name
    # 1                       <NA>  NA <U+C548><U+B155><U+D558><U+C138><U+C694>
    # 2                       <NA>  30                         <U+60A8><U+597D>
    # 3                       <NA>  19 <U+3053><U+3093><U+306B><U+3061><U+306F>
    # 4 {"name":"Xin ch<U+FFFD>o"}  NA                                     <NA>

Let's see what else surfaces downstream.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117332 has finished for PR 27328 at commit fd9ec4e.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

It seems like the lines input is not recognized as a valid JSON
when used on Windows CP1252.

It is not clear for me why I didn't fail before, as I can reproduce
this issue outside tests.
@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117355 has finished for PR 27328 at commit 29abbd8.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117358 has finished for PR 27328 at commit 3bb15aa.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117359 has finished for PR 27328 at commit 5468b4b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117367 has finished for PR 27328 at commit e02d85e.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor

shaneknapp commented Jan 24, 2020

NOOOOOOOOOOOO!!!!!!!!11111one

i really don't want to touch the extremely fragile R "ecosystem" unless i have to. but, this looks like i probably should... :(

i'll experiment w/upgrading testthat to 2.0.0 on my staging server and cross my fingers that it doesn't open dependency hell.

we also might need to backport this change for the 2.4 branch if it causes those builds to fail.

created a sub-task in jira for this: https://issues.apache.org/jira/browse/SPARK-30637

@shaneknapp
Copy link
Contributor

shaneknapp commented Jan 24, 2020

welp, uninstalling 1.0.2 and reinstalling 2.0.0 on my staging worker went smoothly... which makes me nervous lol. :)

* installing *source* package ‘testthat’ ...
** package ‘testthat’ successfully unpacked and MD5 sums checked
** libs
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I../inst/include -DCOMPILING_TESTTHAT     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c init.c -o init.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I../inst/include -DCOMPILING_TESTTHAT     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c reassign.c -o reassign.o
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include -DCOMPILING_TESTTHAT     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c test-catch.cpp -o test-catch.o
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include -DCOMPILING_TESTTHAT     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c test-example.cpp -o test-example.o
g++  -I/usr/share/R/include -DNDEBUG -I../inst/include -DCOMPILING_TESTTHAT     -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c test-runner.cpp -o test-runner.o
g++ -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o testthat.so init.o reassign.o test-catch.o test-example.o test-runner.o -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/testthat/libs
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (testthat)

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117371 has finished for PR 27328 at commit 0265727.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117374 has finished for PR 27328 at commit abcaf3d.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117376 has finished for PR 27328 at commit 684831b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Copy link
Member Author

zero323 commented Jan 24, 2020

@shaneknapp

NOOOOOOOOOOOO!!!!!!!!11111one

i really don't want to touch the extremely fragile R "ecosystem" unless i have to. but, this looks like i probably should... :(

If you don't mind me asking ‒ have you thought about moving R environments to ananconda. If I recall correctly some discussions on dev, you we're pretty enthusiastic about using it with Python.

@zero323 zero323 closed this Jan 25, 2020
@zero323 zero323 deleted the SPARK-30611 branch January 25, 2020 00:00
@zero323 zero323 restored the SPARK-30611 branch January 25, 2020 00:01
@HyukjinKwon
Copy link
Member

@shaneknapp, I know what's like (#27328 (comment)) .. lol .. this has been the pain in the ass in the dev env because nowadays many R libraries have a higher version of testthat as a dependency

HyukjinKwon pushed a commit that referenced this pull request Jan 29, 2020
### What changes were proposed in this pull request?

- Update `testthat` to >= 2.0.0
- Replace of `testthat:::run_tests` with `testthat:::test_package_dir`
- Add trivial assertions for tests, without any expectations, to avoid skipping.
- Update related docs.

### Why are the changes needed?

`testthat` version has been frozen by [SPARK-22817](https://issues.apache.org/jira/browse/SPARK-22817) / #20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

- Existing CI pipeline:
     - Windows build on AppVeyor, R 3.6.2, testthtat 2.3.1
     - Linux build on Jenkins, R 3.1.x, testthat 1.0.2

- Additional builds with thesthat 2.3.1  using [sparkr-build-sandbox](https://github.com/zero323/sparkr-build-sandbox) on c7ed64a

   R 3.4.4  (image digest ec9032f8cf98)
   ```
   docker pull zero323/sparkr-build-sandbox:3.4.4
   docker run zero323/sparkr-build-sandbox:3.4.4 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
    ```
    3.5.3 (image digest 0b1759ee4d1d)

    ```
    docker pull zero323/sparkr-build-sandbox:3.5.3
    docker run zero323/sparkr-build-sandbox:3.5.3 zero323 --branch SPARK-23435 --commit
    c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
    ```

   and 3.6.2 (image digest 6594c8ceb72f)
    ```
   docker pull zero323/sparkr-build-sandbox:3.6.2
   docker run zero323/sparkr-build-sandbox:3.6.2 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
   ````

   Corresponding [asciicast](https://asciinema.org/) are available as 10.5281/zenodo.3629431

     [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3629431.svg)](https://doi.org/10.5281/zenodo.3629431)

   (a bit to large to burden asciinema.org, but can run locally via `asciinema play`).

----------------------------

Continued from #27328

Closes #27359 from zero323/SPARK-23435.

Authored-by: zero323 <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon pushed a commit to HyukjinKwon/spark that referenced this pull request Jan 29, 2020
- Update `testthat` to >= 2.0.0
- Replace of `testthat:::run_tests` with `testthat:::test_package_dir`
- Add trivial assertions for tests, without any expectations, to avoid skipping.
- Update related docs.

`testthat` version has been frozen by [SPARK-22817](https://issues.apache.org/jira/browse/SPARK-22817) / apache#20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

No.

- Existing CI pipeline:
     - Windows build on AppVeyor, R 3.6.2, testthtat 2.3.1
     - Linux build on Jenkins, R 3.1.x, testthat 1.0.2

- Additional builds with thesthat 2.3.1  using [sparkr-build-sandbox](https://github.com/zero323/sparkr-build-sandbox) on c7ed64a

   R 3.4.4  (image digest ec9032f8cf98)
   ```
   docker pull zero323/sparkr-build-sandbox:3.4.4
   docker run zero323/sparkr-build-sandbox:3.4.4 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
    ```
    3.5.3 (image digest 0b1759ee4d1d)

    ```
    docker pull zero323/sparkr-build-sandbox:3.5.3
    docker run zero323/sparkr-build-sandbox:3.5.3 zero323 --branch SPARK-23435 --commit
    c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
    ```

   and 3.6.2 (image digest 6594c8ceb72f)
    ```
   docker pull zero323/sparkr-build-sandbox:3.6.2
   docker run zero323/sparkr-build-sandbox:3.6.2 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
   ````

   Corresponding [asciicast](https://asciinema.org/) are available as 10.5281/zenodo.3629431

     [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3629431.svg)](https://doi.org/10.5281/zenodo.3629431)

   (a bit to large to burden asciinema.org, but can run locally via `asciinema play`).

----------------------------

Continued from apache#27328

Closes apache#27359 from zero323/SPARK-23435.

Authored-by: zero323 <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Jan 29, 2020
### What changes were proposed in this pull request?

This PR backports #27359:

- Update `testthat` to >= 2.0.0
- Replace of `testthat:::run_tests` with `testthat:::test_package_dir`
- Add trivial assertions for tests, without any expectations, to avoid skipping.
- Update related docs.

### Why are the changes needed?

`testthat` version has been frozen by [SPARK-22817](https://issues.apache.org/jira/browse/SPARK-22817) / #20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

- Existing CI pipeline:
     - Windows build on AppVeyor, R 3.6.2, testthtat 2.3.1
     - Linux build on Jenkins, R 3.1.x, testthat 1.0.2

- Additional builds with thesthat 2.3.1  using [sparkr-build-sandbox](https://github.com/zero323/sparkr-build-sandbox) on c7ed64a

   R 3.4.4  (image digest ec9032f8cf98)
   ```
   docker pull zero323/sparkr-build-sandbox:3.4.4
   docker run zero323/sparkr-build-sandbox:3.4.4 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
    ```
    3.5.3 (image digest 0b1759ee4d1d)

    ```
    docker pull zero323/sparkr-build-sandbox:3.5.3
    docker run zero323/sparkr-build-sandbox:3.5.3 zero323 --branch SPARK-23435 --commit
    c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
    ```

   and 3.6.2 (image digest 6594c8ceb72f)
    ```
   docker pull zero323/sparkr-build-sandbox:3.6.2
   docker run zero323/sparkr-build-sandbox:3.6.2 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
   ````

   Corresponding [asciicast](https://asciinema.org/) are available as 10.5281/zenodo.3629431

     [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3629431.svg)](https://doi.org/10.5281/zenodo.3629431)

   (a bit to large to burden asciinema.org, but can run locally via `asciinema play`).

----------------------------

Continued from #27328

Closes #27379 from HyukjinKwon/testthat-2.0.

Authored-by: zero323 <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@shaneknapp
Copy link
Contributor

@shaneknapp

NOOOOOOOOOOOO!!!!!!!!11111one

i really don't want to touch the extremely fragile R "ecosystem" unless i have to. but, this looks like i probably should... :(

If you don't mind me asking ‒ have you thought about moving R environments to ananconda. If I recall correctly some discussions on dev, you we're pretty enthusiastic about using it with Python.

i just tried moving to anaconda and it doesn't seem to support versions high enough of many of the packages that we require (for example, testthat is 1.0.7 or something IIRC).

@zero323
Copy link
Member Author

zero323 commented Feb 28, 2020

@shaneknapp

i just tried moving to anaconda and it doesn't seem to support versions high enough of many of the packages that we require (for example, testthat is 1.0.7 or something IIRC).

2.3.1 is available from forge, and if nothing changed, the remaining ones are as well. I should have working environment files stashed somewhere, if that helps, but there probably in some place I won't be able to access for a week or two.

@shaneknapp
Copy link
Contributor

shaneknapp commented Mar 3, 2020 via email

@shaneknapp
Copy link
Contributor

actually, it seems that i've gotten the R suite of drama packages installed to the latest/greatest 3.6.3 via apt, and all the required libs (including testthat @ 2.3.2) installed via install.packages, and IT WORKS.

these are on ubuntu 18.04LTS dev workers, but i'm about to release them to the general ubuntu worker pool and reimage the old ubuntu 16 workers w/18. YAY!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants