-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r #19290
Conversation
@@ -1,2 +1,2 @@ | |||
linters: with_defaults(line_length_linter(100), multiple_dots_linter = NULL, camel_case_linter = NULL, open_curly_linter(allow_single_line = TRUE), closed_curly_linter(allow_single_line = TRUE)) | |||
linters: with_defaults(line_length_linter(100), multiple_dots_linter = NULL, object_name_linter = NULL, camel_case_linter = NULL, open_curly_linter(allow_single_line = TRUE), closed_curly_linter(allow_single_line = TRUE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
object_name_linter = NULL
looks required. Otherwise, it complains about inconsistent naming styles.
R/pkg/R/DataFrame.R
Outdated
@@ -2649,15 +2651,15 @@ setMethod("merge", | |||
#' @return list of columns | |||
#' | |||
#' @note generateAliasesForIntersectedCols since 1.6.0 | |||
generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) { | |||
generateAliasesForIntersectedCols <- function(x, intersectedColNames, suffix) { # nolint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks this hits 30 length identifier limit but we can't change as exposed in the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait... this isn't public though; it's not in https://github.com/apache/spark/blob/master/R/pkg/NAMESPACE
looks like this shouldn't be in the doc (@noRd
or #
instead of #'
)
R/pkg/R/column.R
Outdated
#' \href{https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark}{ | ||
#' Spark Data Types} for available data types. | ||
#' \href{https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between- | ||
#' r-and-spark}{Spark Data Types} for available data types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked this link is not broken for sure.
@@ -329,7 +329,7 @@ spark.addFile <- function(path, recursive = FALSE) { | |||
#' spark.getSparkFilesRootDirectory() | |||
#'} | |||
#' @note spark.getSparkFilesRootDirectory since 2.1.0 | |||
spark.getSparkFilesRootDirectory <- function() { | |||
spark.getSparkFilesRootDirectory <- function() { # nolint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto: 30 length identifier limit but exposed in the doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is an API...
R/pkg/R/mllib_tree.R
Outdated
#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest- | ||
#' regression}{Random Forest Regression} and | ||
#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest- | ||
#' classifier}{Random Forest Classification} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
links were checked
R/pkg/R/mllib_tree.R
Outdated
#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted- | ||
#' tree-regression}{GBT Regression} and | ||
#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted- | ||
#' tree-classifier}{GBT Classification} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
links were checked
R/pkg/R/mllib_tree.R
Outdated
#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree- | ||
#' regression}{Decision Tree Regression} and | ||
#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree- | ||
#' classifier}{Decision Tree Classification} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
links were checked
dev/lint-r.R
Outdated
@@ -28,6 +28,7 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) { | |||
# NOTE: The CRAN's version is too old to adapt to our rules. | |||
if ("lintr" %in% row.names(installed.packages()) == FALSE) { | |||
devtools::install_github("jimhester/lintr@a769c0b") | |||
# devtools::install_github("jimhester/lintr@5431140") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I didn't yet change this. Up to my knowledge, this doesn't actually try to install this package if already installed though.. just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because of if ("lintr" %in% row.names(installed.packages()) == FALSE) {
?
it might be the case the Jenkins boxes already have a version installed and this won't update it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes .. I think I should add few more changes here if we are going to update this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the commented is an old code. Maybe a comment to explain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, just to be clear, I didn't change this yet but just commented this for now ... Actually, devtools::install_github("jimhester/lintr@5431140")
is new one but I didn't change this yet just in case.
Will add some comments around this when I add few more changes and clean up soon in any event.
@felixcheung and @shivaram, Should we upgrade this to the latest commit? it increases the time by 5ish mins. If there is any worry about upgrading, I am willing to leave the script as was. The other fixes look reasonable and I have wondered why these are not being caught. Does this make sense to both of you too? |
Test build #81988 has finished for PR 19290 at commit
|
Test build #82000 has finished for PR 19290 at commit
|
@HyukjinKwon Thanks for looking at this. The 5 min addition seems unfortunate though -- does that also happen with lintr-1.0.1 ? I wonder if we are seeing some specific performance slowdown because of not using a release (it might be good to follow up with lint-r authors if that is the case ?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should use the git tag of the 1.0.1 release
https://github.com/jimhester/lintr/releases
R/pkg/R/DataFrame.R
Outdated
colsX <- generateAliasesForIntersectedCols(x, by, suffixes[1]) | ||
colsY <- generateAliasesForIntersectedCols(y, by, suffixes[2]) | ||
colsX <- generateAliasesForIntersectedCols(x, by, suffixes[1]) # nolint | ||
colsY <- generateAliasesForIntersectedCols(y, by, suffixes[2]) # nolint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the problem here? name too long I think?
R/pkg/R/DataFrame.R
Outdated
@@ -2649,15 +2651,15 @@ setMethod("merge", | |||
#' @return list of columns | |||
#' | |||
#' @note generateAliasesForIntersectedCols since 1.6.0 | |||
generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) { | |||
generateAliasesForIntersectedCols <- function(x, intersectedColNames, suffix) { # nolint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait... this isn't public though; it's not in https://github.com/apache/spark/blob/master/R/pkg/NAMESPACE
looks like this shouldn't be in the doc (@noRd
or #
instead of #'
)
@@ -329,7 +329,7 @@ spark.addFile <- function(path, recursive = FALSE) { | |||
#' spark.getSparkFilesRootDirectory() | |||
#'} | |||
#' @note spark.getSparkFilesRootDirectory since 2.1.0 | |||
spark.getSparkFilesRootDirectory <- function() { | |||
spark.getSparkFilesRootDirectory <- function() { # nolint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is an API...
dev/lint-r.R
Outdated
@@ -28,6 +28,7 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) { | |||
# NOTE: The CRAN's version is too old to adapt to our rules. | |||
if ("lintr" %in% row.names(installed.packages()) == FALSE) { | |||
devtools::install_github("jimhester/lintr@a769c0b") | |||
# devtools::install_github("jimhester/lintr@5431140") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because of if ("lintr" %in% row.names(installed.packages()) == FALSE) {
?
it might be the case the Jenkins boxes already have a version installed and this won't update it
I think this is great to have, thanks for solving the mystery.
|
To @shivaram:
I just ran it with lintr-1.0.1. This took 2ish mins in my local, which is almost identical with the current, r-lib/lintr@a769c0b. I opened an issue here - To @felixcheung:
I tried to disable all the linters documented in https://github.com/jimhester/lintr/blob/master/README.md#available-linters: diff --git a/R/pkg/.lintr b/R/pkg/.lintr
index ae50b28ec61..ca1efcfacc7 100644
--- a/R/pkg/.lintr
+++ b/R/pkg/.lintr
@@ -1,2 +1,2 @@
-linters: with_defaults(line_length_linter(100), multiple_dots_linter = NULL, camel_case_linter = NULL, open_curly_linter(allow_single_line = TRUE), closed_curly_linter(allow_single_line = TRUE))
+linters: with_defaults(commas_linter = NULL, object_usage_linter = NULL, absolute_path_linter = NULL, nonportable_path_linter = NULL, pipe_continuation_linter = NULL, assignment_linter = NULL, closed_curly_linter = NULL, extraction_operator_linter = NULL, implicit_integer_linter = NULL, infix_spaces_linter = NULL, line_length_linter = NULL, no_tab_linter = NULL, object_length_linter = NULL, object_name_linter = NULL, open_curly_linter = NULL, semicolon_terminator_linter = NULL, single_quotes_linter = NULL, spaces_inside_linter = NULL, spaces_left_parentheses_linter = NULL, todo_comment_linter = NULL, trailing_blank_lines_linter = NULL, trailing_whitespace_linter = NULL, T_and_F_symbol_linter = NULL, undesirable_function_linter = NULL, undesirable_operator_linter = NULL, unneeded_concatenation_linter = NULL)
exclusions: list("inst/profile/general.R" = 1, "inst/profile/shell.R") and this took 1.5 mins. Looks we could identify some linters that we might not need.
Diff looks 122 commits - r-lib/lintr@v1.0.1...master . Will take a look though the commit logs (pending ..) |
Test build #82034 has finished for PR 19290 at commit
|
isn't it the other way around? anyway I mean if we use the git tag 9951084 it should match v1.0.1 release exactly. |
btw, could you check if haven't already, if |
Yes, I checked already. |
Test build #82081 has started for PR 19290 at commit |
Will update the comment tomorrow. |
Ah, yes. It looks some changes are intendedly not into v1.0.1 (vs the master), and given the results from the linter in my local, v1.0.1 missed some important changes. Although I am not used to the policy there, I see the large diff in the commit logs (master vs tag v1.0.1). |
Assuming from the doc, 1.0.1 looks missing the linters below (vs master):
I manually disabled the linters above but the elapsed time looks almost the same. So, I think some existing linters were slowed down in the master. I tried to disable everything except for each linter, https://gist.github.com/HyukjinKwon/1f852bcdcc2a13f396dfd2d6a88bea16 |
I also tried v1.0.1 against the master; however, it looks not detecting many instances this PR fixes. It only detected 3 extra instances:
So, I would like to propose to use r-lib/lintr@5431140 if possible. I opened an issue to ask a question about this slowdown - |
retest this please |
Test build #82120 has finished for PR 19290 at commit
|
R/pkg/R/DataFrame.R
Outdated
@@ -2650,8 +2650,9 @@ setMethod("merge", | |||
#' @param suffix a suffix for the column name | |||
#' @return list of columns | |||
#' | |||
#' @note generateAliasesForIntersectedCols since 1.6.0 | |||
generateAliasesForIntersectedCols <- function(x, intersectedColNames, suffix) { # nolint | |||
#' @note genAliasesForIntersectedCols since 1.6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd remove this @note
too
oh, remember to remove WIP and update this line https://github.com/apache/spark/pull/19290/files#diff-74ca2b618d236bbd6faa23e13bff1403R30 before merging in? |
7e6c2c5
to
387228d
Compare
dev/lint-r.R
Outdated
# The current latest is jimhester/lintr@5431140 (see SPARK-22063), the dev version, | ||
# '1.0.1.9000'. | ||
if ("lintr" %in% row.names(installed.packages()) == FALSE || | ||
packageVersion("lintr") != "1.0.1.9000") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked this - when downgrading, upgrading and not installed in my local.
@@ -238,8 +238,10 @@ setMethod("between", signature(x = "Column"), | |||
#' @param x a Column. | |||
#' @param dataType a character object describing the target data type. | |||
#' See | |||
# nolint start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just double checked the links.
Ugh.. it looks failed to install due to permission issue ...
|
Hi @shaneknapp. I am sorry but it's me again ... Here, this PR tries to upgrade an R package, lintr for static code analysis for R, which is ran via This upgrade catches many meaningful instances that should have been caught before in SparkR codes and if I understood correctly, we decided to upgrade this if it does not cause a problem. I tried to fix the I believe the bash command below Rscript -e "devtools::install_github('jimhester/lintr@5431140')" Maybe, I should have cc'ed you here first before trying this one. I apologise and I will cc you first in such cases in the future. |
Test build #82128 has finished for PR 19290 at commit
|
Test build #82129 has finished for PR 19290 at commit
|
@HyukjinKwon -- you will absolutely not have builds install packages on the build system. this is a really bad idea. is this absolutely required, or just to fix a warning in the build output? |
ee7eb9d
to
550750e
Compare
I just double checked the newer lintr, lintr@5431140, passes on the top of the master with the current change. |
Test build #82308 has finished for PR 19290 at commit
|
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yap, let's update the PR title too
Test build #82312 has finished for PR 19290 at commit
|
hey all, i'm here. was out sick for the past few days and trying to get
caught up. sorry about that!
so... what version of lintr do we need to put on the workers?
…On Fri, Sep 29, 2017 at 3:32 AM, UCB AMPLab ***@***.***> wrote:
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82312/
Test PASSed.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19290 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABiDrC3TwFkXTT_wInCvG0W450khEmugks5snMdJgaJpZM4PdlZS>
.
|
Ah, thank you @shaneknapp. The command below:
should upgrades Actually, looks we should merge this one first before upgrading I updated the change here back to not try to upgrade Thanks again and I am sorry for this, @shaneknapp. |
btw, lintr once upgraded will run on all builds on all branches right? wouldn't the upgrade break other branches? |
@felixcheung -- yes, this is the system default lintr, meaning all calls to lintr will be against this version. as for other branches, i think it could possibly break them. @shivaram thoughts? |
Let me merge this one first. This shouldn't cause any problem to built system for now. |
Merged to master. |
So, @felixcheung, @shaneknapp and @shivaram, looks we have comments, #19290 (comment) and #19290 (comment), to discuss further. The problem looks upgrading Should I rather try to install this for each build, for example, via checking the environment variables as a workaround? I checked those before - #17669 (comment) |
i filed a bug about this on the spark jira: as we're about to move all the spark builds to new ubuntu machines, w/much more up2date package installs (especially R packages), we need to fix the lint problems and help make builds green again. |
Shane is this going to affect one particular branch (eg. 2.3.0), or is it going to be all branches and all test runs?
The changes are fairly substantial - if we need to back port then it could be quite a bit of work
|
the driving force behind this is that the version of R (and all associated
packages) is really quite old on the centos workers (3.1.1), and i'd really
like us to get as up to date as possible on the new ubuntu machines.
so far, lintr and testthat (opening a bug on that one soon) are the only
packages that haven't played nice after the upgrade.
if lintr 1.0.0.9001 is "good enough", i'd be ok w/locking the install to
this version.
…On Mon, Jan 8, 2018 at 7:08 PM, Felix Cheung ***@***.***> wrote:
Shane is this going to affect one particular branch (eg. 2.3.0), or is it
going to be all branches and all test runs?
The changes are fairly substantial - if we need to back port then it could
be quite a bit of work
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19290 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABiDrKH2y12reG2Ojum5cF5j4FOJU7lyks5tIthFgaJpZM4PdlZS>
.
|
Given that we are forking 2.3 and locking down the branch any time now, it might make sense to stay with the "current version running on old centos workers", even though they are old. |
BTW, I believe we are testing it with R 3.4.1 via AppVeyor too. I have been thinking it's good to test both old and new versions ... I think we have a weak promise for |
ok sounds good -- we'll keep things 'old' for now. |
The minimum R version supported is something that we can revisit though. I think we do this for Python, Java versions as well in the project |
Right we could bump the supported R version for the next release. It should have minimal impact (since we are testing the close to the latest on appveyor... somewhat internally)
lintr is quite a bit more complicated.
Shane what’s the issue with testthat?
|
I believe the issue with testthat he meatn is, it fails to run tests with testthat 2.0.0 is released. Just for a reminder, I rushed to update |
argh, thanks for the reminder and the fix. |
What changes were proposed in this pull request?
Currently, we set lintr to r-lib/lintr@a769c0b (see this and SPARK-14074).
I first tested and checked lintr-1.0.1 but it looks many important fixes are missing (for example, checking 100 length). So, I instead tried the latest commit, r-lib/lintr@5431140, in my local and fixed the check failures.
It looks it has fixed many bugs and now finds many instances that I have observed and thought should be caught time to time, here I filed the results.
The downside looks it now takes about 7ish mins, (it was 2ish mins before) in my local.
How was this patch tested?
Manually,
./dev/lint-r
after manually updating the lintr package.