Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add weighting to fct_lump #70

Merged
merged 1 commit into from
Feb 10, 2018
Merged

Add weighting to fct_lump #70

merged 1 commit into from
Feb 10, 2018

Conversation

wilkox
Copy link
Contributor

@wilkox wilkox commented Dec 29, 2016

Adds a 'weights' argument to fct_lump that allows frequencies to be weighted. Issue #68.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

Overall the strategy looks good, but there are some code style fixes needed.

R/lump.R Outdated
ties.method = c("min", "average", "first", "last", "random", "max")) {
f <- check_factor(f)
ties.method <- match.arg(ties.method)

if (! missing(weights)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the spaces after !, and use != not ! x == y

R/lump.R Outdated
if (! is.numeric(weights)) {
stop("`weights` must be a numeric vector")
} else if (! length(f) == length(weights)) {
stop("different lengths of `f` and `weights`")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages should start with a capital letter and always use call. = FALSE

R/lump.R Outdated
if (missing(weights)) {
count <- table(f)
} else {
count <- tapply(weights, f, FUN=sum)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need spaces around =

@@ -2,44 +2,76 @@ context("fct_lump")

test_that("positive values keeps most commmon", {
f <- c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g")
w <- c( 1, 1, 1, 1, 1, 4, 2, 1, 1, 1) # Sum = 14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tests would be clearer if you tested weighting in its own block. I think you only need a couple of tests to verify that weighted and the equivalent unweighted return the same values. The code is quite simple so it doesn't need extensive tests.

You might want to add one other test that checks that bad weights generate error messages.

@hadley
Copy link
Member

hadley commented Dec 30, 2016

Also in one of the commit message please include Fixes #68 that way the issue will be automatically closed when the pull is merged.

@wilkox
Copy link
Contributor Author

wilkox commented Dec 30, 2016

Tidied up the style issues, simplified the tests and added tests for errors, and rebased.

@wilkox
Copy link
Contributor Author

wilkox commented Jul 12, 2017

Thanks @instantkaffee – fixed the error and rebased.

@andwilson
Copy link

Is this going to merge into the master? This is extremely useful functionality.

@tiernanmartin
Copy link

I'd also like to see this feature merged into the master - thanks!

R/lump.R Outdated
#' # Use ties.method to control how tied factors are collapsed
#' fct_lump(x, n = 6)
#' fct_lump(x, n = 6, ties.method = "max")
#'
fct_lump <- function(f, n, prop, other_level = "Other",
fct_lump <- function(f, n, prop, weights, other_level = "Other",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise now that weights should have a default value since it's option. @wilkox are you still interested in this PR? If not, I'm happy to make the change, but since you've put so much time into it, I thought you might like to carry it home.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still interested in seeing this merged 😃 Added a default weights = NULL, which seems less confusing than weights = 1.

f2 <- c("a", rep("b", 4), rep("c", 6), rep("d", 4), rep("e", 2),
rep("f", 2), "g")

expect_equal(levels(fct_lump(f, weights = w)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please use the style described at http://style.tidyverse.org/syntax.html#long-lines ?

@hadley
Copy link
Member

hadley commented Feb 10, 2018

One last change before I’ll merge - can you please add a bullet to news following http://style.tidyverse.org/news.html#during-development

* Add tests for weighted fct_lump
* Update fct_lump documentation
@wilkox
Copy link
Contributor Author

wilkox commented Feb 10, 2018

Added the bullet to NEWS.md, rebased.

@hadley hadley merged commit 81f01c0 into tidyverse:master Feb 10, 2018
@hadley
Copy link
Member

hadley commented Feb 10, 2018

Thanks for bearing with me through this long process!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants