[SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 #27359
Conversation
Test build #117385 has finished for PR 27359 at commit
Test build #117389 has finished for PR 27359 at commit
Test build #117392 has finished for PR 27359 at commit
Test build #117395 has finished for PR 27359 at commit
Test build #117397 has finished for PR 27359 at commit
Test build #117399 has finished for PR 27359 at commit
Test build #117402 has finished for PR 27359 at commit
Assuming that the Windows test passes (it did previously), this is ready for review. Unfortunately, I couldn't figure out how to correctly install testthat 2.0 on Jenkins from inside the test runner ‒ some configurations that worked locally (on the official R-base Docker image with 3.1.3) still failed here, and I don't want to throttle Jenkins any more with my lousy attempts (sorry for that). So I added a fallback case, so that for now it runs on both testthat 1.x and 2.x. That means AppVeyor runs with 2.x and Jenkins with 1.x ‒ I guess it might be considered a good thing for now (@shaneknapp?). The only caveat is that we'll have to pay more attention to AppVeyor builds until Jenkins is upgraded, as it has become clear that the current setup can silence some errors.
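A minimal sketch of what such a fallback in `R/pkg/tests/run-all.R` can look like. Hedged: `testthat:::run_tests` and `testthat:::test_package_dir` are internal, non-stable APIs, and the arguments shown here are illustrative, not the exact change in this PR.

```
# Illustrative fallback: dispatch on the installed testthat version.
# testthat:::run_tests was removed in 2.x; test_package_dir is its
# closest counterpart there.
if (packageVersion("testthat") >= "2.0.0") {
  test_runner <- testthat:::test_package_dir
  reporter <- testthat::default_reporter()
} else {
  test_runner <- testthat:::run_tests
  reporter <- "summary"
}

test_dir <- file.path(Sys.getenv("SPARK_HOME"), "R", "pkg", "tests", "fulltests")
test_runner("SparkR", test_dir, NULL, reporter)
```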
I am all good as long as we can use the latest `testthat`.
Following @dongjoon-hyun's advice, I moved the commits fixing test issues discovered here out of this PR. Could you take a look at #27362 and see if it makes sense to you? As for the other one, I'll try to debug it over the next few days and see if I can find a fix; if not, the question remains whether it should be a blocker for the testthat upgrade. It feels a bit like sweeping things under the rug.
@zero323, so basically you couldn't get the CRAN check to work with R 3.1.3? If that's the case, does it work with the latest R 3.4.x (at least locally or on any Linux-based OS)? Spark will deprecate R versions lower than 3.4 in Spark 3.0, and we will drop them right after the Spark 3.0 release. So, the R version in Jenkins will ideally increase to the latest 3.4.x.
As long as the fix for SPARK-30645 is applied and the test for cleanClosure on recursive calls (SPARK-30629) is disabled, everything looks green, i.e.
About "I couldn't figure out" ‒ I tried to
Test build #117406 has finished for PR 27359 at commit
…nal file

### What changes were proposed in this pull request?

Reference data for "collect() support Unicode characters" has been moved to an external file, to make the test OS- and locale-independent.

### Why are the changes needed?

As-is, the embedded data is not properly encoded on Windows:

```
library(SparkR)
SparkR::sparkR.session()

Sys.info()
#          sysname          release          version
#        "Windows"     "Server x64"    "build 17763"
#         nodename          machine            login
# "WIN-5BLT6Q610KH"         "x86-64"  "Administrator"
#             user   effective_user
#  "Administrator"  "Administrator"

Sys.getlocale()
# [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

lines <- c("{\"name\":\"안녕하세요\"}",
           "{\"name\":\"您好\", \"age\":30}",
           "{\"name\":\"こんにちは\", \"age\":19}",
           "{\"name\":\"Xin chào\"}")

jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
writeLines(lines, jsonPath)

system(paste0("cat ", jsonPath))
# {"name":"<U+C548><U+B155><U+D558><U+C138><U+C694>"}
# {"name":"<U+60A8><U+597D>", "age":30}
# {"name":"<U+3053><U+3093><U+306B><U+3061><U+306F>", "age":19}
# {"name":"Xin chào"}
# [1] 0

df <- read.df(jsonPath, "json")

printSchema(df)
# root
#  |-- _corrupt_record: string (nullable = true)
#  |-- age: long (nullable = true)
#  |-- name: string (nullable = true)

head(df)
#             _corrupt_record age                                      name
# 1                      <NA>  NA <U+C548><U+B155><U+D558><U+C138><U+C694>
# 2                      <NA>  30                          <U+60A8><U+597D>
# 3                      <NA>  19  <U+3053><U+3093><U+306B><U+3061><U+306F>
# 4 {"name":"Xin ch<U+FFFD>o"}  NA                                      <NA>
```

This can be reproduced outside tests (Windows Server 2019, English locale), and causes failures when `testthat` is updated to 2.x (#27359). Somehow the problem is not picked up when the test is executed on `testthat` 1.0.2.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Running the modified test; manual testing.

### Note

An alternative seems to be to use bytes, but it hasn't been properly tested.

```
test_that("collect() support Unicode characters", {
  lines <- markUtf8(c(
    '{"name": "안녕하세요"}',
    '{"name": "您好", "age": 30}',
    '{"name": "こんにちは", "age": 19}',
    '{"name": "Xin ch\xc3\xa0o"}'
  ))

  jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
  writeLines(lines, jsonPath, useBytes = TRUE)

  expected <- regmatches(lines, regexec('(?<="name": ").*?(?=")', lines, perl = TRUE))

  df <- read.df(jsonPath, "json")
  rdf <- collect(df)
  expect_true(is.data.frame(rdf))
  rdf$name <- markUtf8(rdf$name)

  expect_equal(rdf$name[1], expected[[1]])
  expect_equal(rdf$name[2], expected[[2]])
  expect_equal(rdf$name[3], expected[[3]])
  expect_equal(rdf$name[4], expected[[4]])

  df1 <- createDataFrame(rdf)
  expect_equal(
    collect(
      where(df1, df1$name == expected[[2]])
    )$name,
    expected[[2]]
  )
})
```

Closes #27362 from zero323/SPARK-30645.

Authored-by: zero323 <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
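To illustrate the direction of the fix, a minimal sketch, run within an active SparkR session (the file path is hypothetical): keeping the reference strings in an external UTF-8 file and marking the encoding on read sidesteps the locale-dependent re-encoding that mangles embedded literals on Windows.

```
# Hypothetical layout: reference data lives in an external UTF-8 file
# instead of being embedded in the test source.
jsonPath <- file.path("data", "test_utils_utf.json")

# An explicit encoding marks the strings as UTF-8 rather than letting R
# reinterpret the bytes in the native locale (e.g. CP1252 on Windows).
expected <- readLines(jsonPath, encoding = "UTF-8")
Encoding(expected)  # "UTF-8" for the non-ASCII entries

# The file is fed to read.df() directly, with no writeLines() round trip
# through the native encoding.
df <- read.df(jsonPath, "json")
```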
@zero323, do you mind if I ask you to check the latest R 3.4.x and
Not at all, unless you ask for a Windows build :) Seriously though, I've had enough problems in the "works for me" category with this thing to be careful. Once #27363 is merged, I'll rebase and make another local build in Docker to be sure.
…rsive calls

### What changes were proposed in this pull request?

Disabling the test for cleaning the closure of a recursive function.

### Why are the changes needed?

As of 9514b82 this test is no longer valid, and recursive calls, even simple ones:

```
f <- function(x) {
  if (x > 0) {
    f(x - 1)
  } else {
    x
  }
}
```

lead to

```
Error: node stack overflow
```

This issue is silenced when tested with `testthat` 1.x (reason unknown), but causes failures when using `testthat` 2.x (the issue can be reproduced outside the test context). The problem is known and tracked by [SPARK-30629](https://issues.apache.org/jira/browse/SPARK-30629).

Therefore, keeping this test active doesn't make sense, as it will lead to continuous test failures when `testthat` is updated (#27359 / SPARK-23435).

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing tests.

CC falaki

Closes #27363 from zero323/SPARK-29777-FOLLOWUP.

Authored-by: zero323 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Unfortunately, R Arrow is not a standalone package (like the Python one), and it requires system packages with C++ bindings (installing
I don't think that really affects the results though, as the primary concern was CRAN tests and the overall process, and Arrow-related code hasn't been touched.
We cannot say
Especially, for the following.
Thank you, I appreciate that. In general, full reproducibility is defined by the Dockerfile shown at the beginning, but to put it here for reference:
It can be re-run to confirm that it reflects the current state of things. As shown in the cast, builds are done directly from the head of this branch (the signature is verified), with no changes to the codebase beyond what is proposed in this PR (and we don't touch any Arrow-related components here at all). As for skipping Arrow tests ‒ that's the default behavior defined in the respective tests, for example here
and the following lines, so it is neither a failure nor the result of any source modification (a sketch of this pattern follows below). Can we make Arrow tests run? Possibly, but:
At this point, Spark has no coverage for any intermediate R version (Jenkins runs 3.1, and then we have a gap of almost eight years' worth of releases to 3.6 on AppVeyor), not to mention version-OS combinations. That's troubling and, as the work related to this PR has shown, can miss obvious errors, but it is not something that can really be addressed by running ad-hoc tests outside the project infrastructure. Anyway... If you have specific concerns about the process used here, and you suspect that the proposed changes can lead to problems in the future, I'll do my best to address them.
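For reference, the default skip behavior described above follows the standard testthat pattern; a minimal sketch (the test name is illustrative, not the actual SparkR test):

```
library(testthat)

test_that("Arrow optimization in collect()", {
  # When the arrow package is not installed, the body is skipped rather
  # than failed, so no source modification is needed to pass without Arrow.
  skip_if_not_installed("arrow")

  # ... Arrow-dependent assertions would go here ...
  expect_true(TRUE)
})
```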
@zero323, it seems that you missed my point. I advised the following.
Let me rephrase my words: "In the PR description, write that you didn't run the full test. Especially, Arrow tests are skipped." That was my request.
LGTM.
Arrow is a valid point, although AppVeyor (with the latest R and CRAN checks disabled) tests it. It would be best just to test it out if that's feasible, considering the importance of this PR and that changing the env in Jenkins is difficult to revert if anything goes wrong.
If there are some technical difficulties, I am okay without testing it out, because the required code change (`expect_true(TRUE)` in the tests) seems not to be required for Arrow tests, logically at least.
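For context, a minimal sketch of why those trivial assertions exist: under testthat 2.x a `test_that()` block that finishes without registering a single expectation is reported as skipped, so smoke tests whose only contract is "this code does not throw" gain a no-op expectation (the test name and body below are illustrative):

```
library(testthat)

test_that("session helper runs without error", {
  # The real tests call SparkR helpers whose only contract is not throwing;
  # a stand-in computation is used here.
  result <- tryCatch(sqrt(4), error = function(e) e)

  # Without at least one expectation, testthat 2.x reports this test as
  # skipped; the trivial assertion marks it as run and passing.
  expect_true(TRUE)
})
```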
Okay, given #27359 (comment), I am okay with not trying the Arrow installation out.
@shaneknapp managed to install Arrow on R 3.1 (surprisingly..). Although it was a while ago and Arrow R was at an early stage, I remember I badly failed to install Arrow on some low R versions.
I think we can verify it later when we actually upgrade the R version. It seems it should be easy to fix if there's anything wrong with the Arrow tests.
Also, as a reference, I found a test case that has a similar structure to the Arrow test case:
spark/R/pkg/tests/fulltests/test_sparkSQL.R
Lines 3327 to 3376 in d69ed9a
test_that("repartition by columns on DataFrame", { | |
# The tasks here launch R workers with shuffles. So, we decrease the number of shuffle | |
# partitions to reduce the number of the tasks to speed up the test. This is particularly | |
# slow on Windows because the R workers are unable to be forked. See also SPARK-21693. | |
conf <- callJMethod(sparkSession, "conf") | |
shufflepartitionsvalue <- callJMethod(conf, "get", "spark.sql.shuffle.partitions") | |
callJMethod(conf, "set", "spark.sql.shuffle.partitions", "5") | |
tryCatch({ | |
df <- createDataFrame( | |
list(list(1L, 1, "1", 0.1), list(1L, 2, "2", 0.2), list(3L, 3, "3", 0.3)), | |
c("a", "b", "c", "d")) | |
# no column and number of partitions specified | |
retError <- tryCatch(repartition(df), error = function(e) e) | |
expect_equal(grepl | |
("Please, specify the number of partitions and/or a column\\(s\\)", retError), TRUE) | |
# repartition by column and number of partitions | |
actual <- repartition(df, 3, col = df$"a") | |
# Checking that at least the dimensions are identical | |
expect_identical(dim(df), dim(actual)) | |
expect_equal(getNumPartitions(actual), 3L) | |
# repartition by number of partitions | |
actual <- repartition(df, 13L) | |
expect_identical(dim(df), dim(actual)) | |
expect_equal(getNumPartitions(actual), 13L) | |
expect_equal(getNumPartitions(coalesce(actual, 1L)), 1L) | |
# a test case with a column and dapply | |
schema <- structType(structField("a", "integer"), structField("avg", "double")) | |
df <- repartition(df, col = df$"a") | |
df1 <- dapply( | |
df, | |
function(x) { | |
y <- (data.frame(x$a[1], mean(x$b))) | |
}, | |
schema) | |
# Number of partitions is equal to 2 | |
expect_equal(nrow(df1), 2) | |
}, | |
finally = { | |
# Resetting the conf back to default value | |
callJMethod(conf, "set", "spark.sql.shuffle.partitions", shufflepartitionsvalue) | |
}) | |
}) |
So I suspect it's good enough to go (except one minor comment #27359 (comment))
Signed-off-by: zero323 <[email protected]>
Seems like it. I guess there is something lost in translation here.
Especially what we consider "wrong" in this context, hence my elaborate response, which I think is still valid ‒ the overall test matrix is tiny compared to what is officially supported. Sadly, Arrow makes things worse, as
It would be great if we could either isolate that (I played a bit with Anaconda for that, and the results are quite promising, but I am not there yet, not to mention it won't help with all the tech debt caused by supporting long-outdated R and Python versions), or find a way to bundle it within Maven builds. For now, I have published zero323/sparkr-build-sandbox (docker.com/r/zero323/sparkr-build-sandbox) - not exactly what is needed, but it does the trick for now. Please check the updated description and let me know if that addresses your concerns.
Test build #117484 has finished for PR 27359 at commit
Sorry, but it doesn't address my concerns.
I asked you the following, and in the above I cannot see that content. You even completely hid the word
@HyukjinKwon, sorry, but you know that I'm not against @zero323 skipping some tests due to the difficulty. I fully understand that it's sometimes unfair to ask that of contributors. However, the author of a PR should correctly mention what they've done in the PR description.
I'm not sure why @zero323 adds many comments about this difficulty, but he didn't mention any of those difficulties in the PR description, which will become a permanent commit log. Could you advise him about that? I have already added the same comments repeatedly, so it's too much for me.
Or, @HyukjinKwon, you can merge this PR as is. It's up to you. I'll not be against your decision.
At this point, all tests did run across multiple additional R versions. This is documented in a persistent and verifiable way, and it is reproducible.
@zero323, mind adding a couple of words to describe
It's fine to describe what happened during the manual tests. No one can be flawless during such tests; it's better to describe them thoroughly for people who follow up later.
I was actually able to put all the dependencies together, including Arrow.
Indeed. I guess I should have skipped the explanations. Anyway, I am happy to close this and leave it for someone more articulate to continue. Everything should be in good shape here from a technical perspective, and the linked images should provide a good starting point for anyone willing to continue. Let's just see if the AppVeyor failure is a false positive ‒ it seems the last run failed without a clear indication why. Thanks for your work @HyukjinKwon @dongjoon-hyun, and sorry for any misunderstandings on my side.
Alright, then I am going to merge this. LGTM.
Test build #117498 has finished for PR 27359 at commit
Merged to master.
### What changes were proposed in this pull request?

- Update `testthat` to >= 2.0.0
- Replace `testthat:::run_tests` with `testthat:::test_package_dir`
- Add trivial assertions for tests without any expectations, to avoid skipping.
- Update related docs.

### Why are the changes needed?

`testthat` version has been frozen by [SPARK-22817](https://issues.apache.org/jira/browse/SPARK-22817) / apache#20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

- Existing CI pipeline:
  - Windows build on AppVeyor, R 3.6.2, testthat 2.3.1
  - Linux build on Jenkins, R 3.1.x, testthat 1.0.2
- Additional builds with testthat 2.3.1 using [sparkr-build-sandbox](https://github.com/zero323/sparkr-build-sandbox) on c7ed64a:

R 3.4.4 (image digest ec9032f8cf98)

```
docker pull zero323/sparkr-build-sandbox:3.4.4
docker run zero323/sparkr-build-sandbox:3.4.4 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
```

3.5.3 (image digest 0b1759ee4d1d)

```
docker pull zero323/sparkr-build-sandbox:3.5.3
docker run zero323/sparkr-build-sandbox:3.5.3 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
```

and 3.6.2 (image digest 6594c8ceb72f)

```
docker pull zero323/sparkr-build-sandbox:3.6.2
docker run zero323/sparkr-build-sandbox:3.6.2 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
```

Corresponding [asciicasts](https://asciinema.org/) are available as [10.5281/zenodo.3629431](https://doi.org/10.5281/zenodo.3629431) (a bit too large to burden asciinema.org, but they can be run locally via `asciinema play`).

----------------------------

Continued from apache#27328

Closes apache#27359 from zero323/SPARK-23435.

Authored-by: zero323 <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
@shaneknapp TL;DR: SparkR now supports both testthat 1.0.2 and 2.0.0. I will backport this to branch-2.4 at #27379 as well. So, as a reminder:
@HyukjinKwon will do. The upgrade won't happen until next week.
### What changes were proposed in this pull request?

This PR backports #27359:

- Update `testthat` to >= 2.0.0
- Replace `testthat:::run_tests` with `testthat:::test_package_dir`
- Add trivial assertions for tests without any expectations, to avoid skipping.
- Update related docs.

### Why are the changes needed?

`testthat` version has been frozen by [SPARK-22817](https://issues.apache.org/jira/browse/SPARK-22817) / #20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

- Existing CI pipeline:
  - Windows build on AppVeyor, R 3.6.2, testthat 2.3.1
  - Linux build on Jenkins, R 3.1.x, testthat 1.0.2
- Additional builds with testthat 2.3.1 using [sparkr-build-sandbox](https://github.com/zero323/sparkr-build-sandbox) on c7ed64a:

R 3.4.4 (image digest ec9032f8cf98)

```
docker pull zero323/sparkr-build-sandbox:3.4.4
docker run zero323/sparkr-build-sandbox:3.4.4 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
```

3.5.3 (image digest 0b1759ee4d1d)

```
docker pull zero323/sparkr-build-sandbox:3.5.3
docker run zero323/sparkr-build-sandbox:3.5.3 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
```

and 3.6.2 (image digest 6594c8ceb72f)

```
docker pull zero323/sparkr-build-sandbox:3.6.2
docker run zero323/sparkr-build-sandbox:3.6.2 zero323 --branch SPARK-23435 --commit c7ed64a --public-key https://keybase.io/zero323/pgp_keys.asc
```

Corresponding [asciicasts](https://asciinema.org/) are available as [10.5281/zenodo.3629431](https://doi.org/10.5281/zenodo.3629431) (a bit too large to burden asciinema.org, but they can be run locally via `asciinema play`).

----------------------------

Continued from #27328

Closes #27379 from HyukjinKwon/testthat-2.0.

Authored-by: zero323 <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
should be easy.
won't be easy. will be disruptive.
What changes were proposed in this pull request?

- Update `testthat` to >= 2.0.0
- Replace `testthat:::run_tests` with `testthat:::test_package_dir`
- Add trivial assertions for tests without any expectations, to avoid skipping.
- Update related docs.

Why are the changes needed?

`testthat` version has been frozen by SPARK-22817 / #20003, but 1.0.2 is pretty old, and we shouldn't keep things in this state forever.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

- Existing CI pipeline:
  - Windows build on AppVeyor, R 3.6.2, testthat 2.3.1
  - Linux build on Jenkins, R 3.1.x, testthat 1.0.2
- Additional builds with testthat 2.3.1 using sparkr-build-sandbox on c7ed64a:
  - R 3.4.4 (image digest ec9032f8cf98)
  - 3.5.3 (image digest 0b1759ee4d1d)
  - 3.6.2 (image digest 6594c8ceb72f)

Corresponding asciicasts are available as 10.5281/zenodo.3629431 (a bit too large to burden asciinema.org, but they can be run locally via `asciinema play`).

Continued from #27328