Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New fixed_regex_linter #1021

Merged
merged 34 commits into from
May 19, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
f67e9a0
New fixed_regex_linter
MichaelChirico Mar 28, 2022
67c9e1d
fix package name in so
MichaelChirico Mar 28, 2022
f7954ba
back-compatible read-only macro
MichaelChirico Mar 28, 2022
4663843
Merge branch 'master' into fixed_regex
MichaelChirico Mar 28, 2022
033de02
Merge branch 'master' into fixed_regex
MichaelChirico Apr 1, 2022
8b47bf8
Merge branch 'master' into fixed_regex
MichaelChirico Apr 3, 2022
3a0739b
Merge branch 'master' into fixed_regex
MichaelChirico Apr 4, 2022
b7957e7
Merge branch 'master' into fixed_regex
AshesITR Apr 4, 2022
f24acf1
Merge branch 'master' into fixed_regex
MichaelChirico Apr 8, 2022
e3b309f
Merge branch 'fixed_regex' of github.com:jimhester/lintr into fixed_r…
MichaelChirico Apr 8, 2022
1df86f3
add a test of str_replace_all
MichaelChirico Apr 8, 2022
26e83d1
apply EQ_SUB skip on this branch too for parity
MichaelChirico Apr 8, 2022
de4973e
use one-string regex to eliminate false negatives
MichaelChirico Apr 9, 2022
0943393
copy tests over from R branch
MichaelChirico Apr 9, 2022
7486983
T is SYMBOL not NUM_CONST
MichaelChirico Apr 9, 2022
06f8930
extend [$CHAR] handling to handle many more cases
MichaelChirico Apr 10, 2022
f33e182
test for []]
MichaelChirico Apr 10, 2022
d470a95
catch unicode escapes outside char classes too
MichaelChirico Apr 10, 2022
8a9c9f0
more cases that are false positives in the R-only branch
MichaelChirico Apr 10, 2022
1e94aa0
Merge branch 'master' into fixed_regex
MichaelChirico Apr 10, 2022
ac2237e
try different UTF-16 codepoints
MichaelChirico Apr 10, 2022
cdca617
try using the ?Quotes example code point
MichaelChirico Apr 10, 2022
968c2d3
Merge branch 'master' into fixed_regex
MichaelChirico Apr 27, 2022
c085a2e
more tests
MichaelChirico Apr 28, 2022
9fe1861
Merge branch 'master' into fixed_regex
MichaelChirico Apr 28, 2022
ec2cf87
merge tests from R branch
AshesITR Apr 28, 2022
4da8d25
Merge branch 'master' into fixed_regex
MichaelChirico Apr 29, 2022
5b1dc59
source_expression
MichaelChirico Apr 29, 2022
b0e62b7
typo
MichaelChirico Apr 29, 2022
438ba37
restore tests with single escaping; add comment about perl=TRUE TODO
MichaelChirico Apr 29, 2022
00519de
Merge branch 'master' into fixed_regex
MichaelChirico May 16, 2022
03b7111
corresponding test
MichaelChirico May 17, 2022
ca0be28
Merge branch 'master' into fixed_regex
MichaelChirico May 17, 2022
d9d287c
test parsing
MichaelChirico May 17, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ script.R
*~
\#*\#

*.o
*.so

*.Rcheck
lintr_*.tar.gz
testthat-problems.rds
Expand Down
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ Collate:
'expect_type_linter.R'
'extract.R'
'extraction_operator_linter.R'
'fixed_regex_linter.R'
'function_left_parentheses.R'
'get_source_expressions.R'
'ids_with_token.R'
Expand Down
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ export(expect_s4_class_linter)
export(expect_true_false_linter)
export(expect_type_linter)
export(extraction_operator_linter)
export(fixed_regex_linter)
export(function_left_parentheses_linter)
export(get_source_expressions)
export(ids_with_token)
Expand Down Expand Up @@ -123,3 +124,4 @@ importFrom(utils,relist)
importFrom(utils,tail)
importFrom(xml2,as_list)
importFrom(xml2,xml_find_all)
useDynLib(lintr, .registration = TRUE, .fixes = "lintr_")
108 changes: 108 additions & 0 deletions R/fixed_regex_linter.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
#' Require usage of `fixed=TRUE` in regular expressions where appropriate
#'
#' Invoking a regular expression engine is overkill for cases when the search
#' pattern only involves static patterns.
#'
#' NB: for `stringr` functions, that means wrapping the pattern in `stringr::fixed()`.
#'
#' NB: This linter is likely not able to distinguish every possible case when
#' a fixed regular expression is preferable, rather it seeks to identify
#' likely cases. It should _never_ report false positives, however; please
#' report false positives as an error.
#'
#' @evalRd rd_tags("fixed_regex_linter")
#' @seealso [linters] for a complete list of linters available in lintr.
#' @export
fixed_regex_linter <- function() {
Linter(function(source_expression) {
if (length(source_expression$xml_parsed_content) == 0L) {
return(list())
}

xml <- source_expression$xml_parsed_content

# regular expression pattern is the first argument
pos_1_regex_funs <- xp_text_in_table(c(
"grep", "gsub", "sub", "regexec", "grepl", "regexpr", "gregexpr"
))

# regular expression pattern is the second argument
pos_2_regex_funs <- xp_text_in_table(c(
"strsplit", "tstrsplit",
# stringr functions. even though the user action is different
# (setting fixed=TRUE vs. wrapping stringr::fixed()),
# detection of the lint is the same
"str_count", "str_detect", "str_ends", "str_extract", "str_extract_all",
"str_locate", "str_locate_all", "str_match", "str_match_all",
"str_remove", "str_remove_all", "str_replace", "str_replace_all",
"str_split", "str_starts", "str_subset",
"str_view", "str_view_all", "str_which"
))

# NB: strsplit doesn't have an ignore.case argument
# NB: we intentionally exclude cases like gsub(x, c("a" = "b")), where "b" is fixed
xpath <- glue::glue("//expr[
SYMBOL_FUNCTION_CALL[ {pos_1_regex_funs} ]
and not(following-sibling::SYMBOL_SUB[
(text() = 'fixed' or text() = 'ignore.case')
and following-sibling::expr[1][NUM_CONST[text() = 'TRUE'] or SYMBOL[text() = 'T']]
])
]
/following-sibling::expr[1][STR_CONST and not(EQ_SUB)]
|
//expr[
SYMBOL_FUNCTION_CALL[ {pos_2_regex_funs} ]
and not(following-sibling::SYMBOL_SUB[
text() = 'fixed'
and following-sibling::expr[1][NUM_CONST[text() = 'TRUE'] or SYMBOL[text() = 'T']]
])
]
/following-sibling::expr[2][STR_CONST and not(EQ_SUB)]
")

patterns <- xml2::xml_find_all(xml, xpath)

return(lapply(
patterns[is_not_regex(xml2::xml_text(patterns))],
xml_nodes_to_lint,
source_expression = source_expression,
lint_message = paste(
"For static regular expression patterns, set `fixed = TRUE`.",
"Note that this includes regular expressions that can be expressed as",
"fixed patterns, e.g. [.] is really just . and \\$ is really just $",
"if there are no other regular expression specials. For functions from",
"the 'stringr' package, the way to declare a static string is to",
"wrap the pattern in stringr::fixed().",
"If this is being used in a dbplyr context (i.e., translated to sql),",
"replace the regular expression with the `LIKE` operator using the",
"`%LIKE%` infix function.",
"Lastly, take care to remember that the `replacement` argument of",
"`gsub()` is affected by the `fixed` argument as well."
),
type = "warning"
))
})
}

#' Determine whether a regex pattern actually uses regex patterns
#'
#' Note that is applies to the strings that are found on the XML parse tree,
#' _not_ plain strings. This is important for backslash escaping, which
#' happens at different layers of escaping than one might expect. So testing
#' this function is best done through testing the expected results of a lint
#' on a given file, rather than passing strings to this function, which can
#' be confusing.
#'
#' NB: Tried implementing this at the R level, but the backsplash escaping was
#' becoming nightmarish -- after changing to a character-based approach in R,
#' the code loooked 95% similar to what it would look like in C++, so moved
#' the logic there to get the efficiency boost as well.
#'
#' @param str A character vector.
#' @return A logical vector, `TRUE` wherever `str` could be replaced by a
#' string with `fixed = TRUE`.
#' @noRd
#' @useDynLib lintr, .registration = TRUE, .fixes = "lintr_"
is_not_regex <- function(str, skip_start = FALSE, skip_end = FALSE) {
.Call(lintr_is_not_regex, str, skip_start, skip_end)
}
1 change: 1 addition & 0 deletions inst/lintr/linters.csv
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ expect_s4_class_linter,package_development best_practices
expect_true_false_linter,package_development best_practices readability
expect_type_linter,package_development best_practices
extraction_operator_linter,style best_practices
fixed_regex_linter,best_practices readability efficiency
function_left_parentheses_linter,style readability default
ifelse_censor_linter,best_practices efficiency
implicit_integer_linter,style consistency best_practices
Expand Down
1 change: 1 addition & 0 deletions man/best_practices_linters.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/efficiency_linters.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

26 changes: 26 additions & 0 deletions man/fixed_regex_linter.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 4 additions & 3 deletions man/linters.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/readability_linters.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading