fct_lump prop to consider NA #41
Comments
Can you please provide a reproducible example illustrating where this is a problem?
f <- factor(c(rep("a", 5), "b", rep(NA, 94)))
The problem is that in your example
f <- factor(c(rep("a", 5), "b", rep(NA, 94)), exclude = NULL)
fct_lump(f, prop = 0.02)
#> [1] a a a a a Other <NA> <NA> <NA> <NA> <NA>
#> [12] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [23] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [34] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [45] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [56] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [67] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [78] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [89] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> [100] <NA>
#> Levels: a Other
Yes, I have realized that there are bugs with NA as a level. The problem goes away when NA is replaced by an explicit string. My point is that prop should be treated as a proportion of the length of the vector, not just of the values that have an associated level.
Currently, prop_n is calculated as the count divided by the sum of counts, which excludes NA.
If prop_n were calculated as the count divided by length(f), the proportion would take NA values into account and reflect the true proportion of the level in the vector, not the proportion of the level among non-NA values.
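To make the difference concrete, here is a minimal base R sketch of the two ways of computing the proportion on the example factor from this thread. It is only an illustration of the arithmetic, not the actual forcats code; the names counts, prop_non_na and prop_all are made up for this example.

```r
# Example factor from this thread: 5 "a", 1 "b", 94 missing values.
f <- factor(c(rep("a", 5), "b", rep(NA, 94)))

# table() drops NA by default, so the counts are a = 5, b = 1.
counts <- table(f)

# Count divided by the sum of counts: proportion among non-NA values.
prop_non_na <- as.vector(counts) / sum(counts)

# Count divided by length(f): proportion of the whole vector, NA included.
prop_all <- as.vector(counts) / length(f)

# Which levels fall below prop = 0.02 under each definition?
lump_non_na <- levels(f)[prop_non_na < 0.02]
lump_all <- levels(f)[prop_all < 0.02]
```

With the first definition, "b" has a proportion of 1/6 and stays above 0.02, so nothing is lumped. With the second, "b" has a proportion of 1/100 and falls below 0.02, which is the behaviour requested here.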
Thanks for the great work.