-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add weighting to fct_lump #70
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
Overall the strategy looks good, but there are some code style fixes needed.
R/lump.R
Outdated
ties.method = c("min", "average", "first", "last", "random", "max")) { | ||
f <- check_factor(f) | ||
ties.method <- match.arg(ties.method) | ||
|
||
if (! missing(weights)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the spaces after !
, and use !=
not ! x == y
R/lump.R
Outdated
if (! is.numeric(weights)) { | ||
stop("`weights` must be a numeric vector") | ||
} else if (! length(f) == length(weights)) { | ||
stop("different lengths of `f` and `weights`") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error messages should start with a capital letter and always use call. = FALSE
R/lump.R
Outdated
if (missing(weights)) { | ||
count <- table(f) | ||
} else { | ||
count <- tapply(weights, f, FUN=sum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need spaces around =
tests/testthat/test-fct_lump.R
Outdated
@@ -2,44 +2,76 @@ context("fct_lump") | |||
|
|||
test_that("positive values keeps most commmon", { | |||
f <- c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g") | |||
w <- c( 1, 1, 1, 1, 1, 4, 2, 1, 1, 1) # Sum = 14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tests would be clearer if you tested weighting in its own block. I think you only need a couple of tests to verify that weighted and the equivalent unweighted return the same values. The code is quite simple so it doesn't need extensive tests.
You might want to add one other test that checks that bad weights generate error messages.
Also in one of the commit message please include |
Tidied up the style issues, simplified the tests and added tests for errors, and rebased. |
Thanks @instantkaffee – fixed the error and rebased. |
Is this going to merge into the master? This is extremely useful functionality. |
I'd also like to see this feature merged into the master - thanks! |
R/lump.R
Outdated
#' # Use ties.method to control how tied factors are collapsed | ||
#' fct_lump(x, n = 6) | ||
#' fct_lump(x, n = 6, ties.method = "max") | ||
#' | ||
fct_lump <- function(f, n, prop, other_level = "Other", | ||
fct_lump <- function(f, n, prop, weights, other_level = "Other", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise now that weights
should have a default value since it's option. @wilkox are you still interested in this PR? If not, I'm happy to make the change, but since you've put so much time into it, I thought you might like to carry it home.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still interested in seeing this merged 😃 Added a default weights = NULL
, which seems less confusing than weights = 1
.
tests/testthat/test-fct_lump.R
Outdated
f2 <- c("a", rep("b", 4), rep("c", 6), rep("d", 4), rep("e", 2), | ||
rep("f", 2), "g") | ||
|
||
expect_equal(levels(fct_lump(f, weights = w)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please use the style described at http://style.tidyverse.org/syntax.html#long-lines ?
One last change before I’ll merge - can you please add a bullet to news following http://style.tidyverse.org/news.html#during-development |
* Add tests for weighted fct_lump * Update fct_lump documentation
Added the bullet to NEWS.md, rebased. |
Thanks for bearing with me through this long process! |
Adds a 'weights' argument to fct_lump that allows frequencies to be weighted. Issue #68.