fct_collapse produces wrong counts when group_other = TRUE #202

Closed
hongcui opened this issue Aug 8, 2019 · 4 comments

hongcui commented Aug 8, 2019

Using the fct_collapse() example given in the package:

Remove the grouping of "Other party" and add group_other = TRUE. I expect Other to have the count of "Other party" (393), but the result shows 3490.

library(tidyverse)
fct_count(gss_cat$partyid)
#> # A tibble: 10 x 2
#>    f                      n
#>    <fct>              <int>
#>  1 No answer            154
#>  2 Don't know             1
#>  3 Other party          393
#>  4 Strong republican   2314
#>  5 Not str republican  3032
#>  6 Ind,near rep        1791
#>  7 Independent         4119
#>  8 Ind,near dem        2499
#>  9 Not str democrat    3690
#> 10 Strong democrat     3490

partyid2 <- fct_collapse(gss_cat$partyid,
                         missing = c("No answer", "Don't know"),
                         rep = c("Strong republican", "Not str republican"),
                         ind = c("Ind,near rep", "Independent", "Ind,near dem"),
                         dem = c("Not str democrat", "Strong democrat"),
                         group_other = TRUE
)
fct_count(partyid2)
#> # A tibble: 5 x 2
#>   f           n
#>   <fct>   <int>
#> 1 missing   155
#> 2 rep      2707
#> 3 ind      8942
#> 4 dem      6189
#> 5 Other    3490

Created on 2019-08-08 by the reprex package (v0.3.0)

hadley (Member) commented Aug 8, 2019

Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run it in a local session.

hongcui (Author) commented Aug 8, 2019

Of course. Just did. Thank you!

batpigandme (Contributor) commented Aug 14, 2019

OK, I'm digging into this one. Basically, we're currently assigning the new names over the old factor levels as though the old levels were in the same order as the new ones. They're not; they just happened to be in close enough order that some of the results came out right. In effect, the code assumes that whatever the last level of your original factor is, that's what you want to put into "Other".

This happens here:

names(levs) <- names(new)[rep(seq_along(new), vapply(new, length, integer(1)))]
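To make the mismatch concrete, here is a small base-R sketch that replays the labelling step using only the counts from the fct_count() output in the original report (no forcats needed). It assumes, as described above, that the buggy code paired the repeated new names with the factor's levels in their original order. The names counts, new_names, buggy_map, correct_map and collapse_counts() are hypothetical, for illustration only; this is not the forcats source or the fix merged in #176.

```r
# Counts of gss_cat$partyid, copied from the fct_count() output in the issue
counts <- c(
  "No answer" = 154, "Don't know" = 1, "Other party" = 393,
  "Strong republican" = 2314, "Not str republican" = 3032,
  "Ind,near rep" = 1791, "Independent" = 4119, "Ind,near dem" = 2499,
  "Not str democrat" = 3690, "Strong democrat" = 3490
)

# Grouping passed to fct_collapse() in the issue
new <- list(
  missing = c("No answer", "Don't know"),
  rep     = c("Strong republican", "Not str republican"),
  ind     = c("Ind,near rep", "Independent", "Ind,near dem"),
  dem     = c("Not str democrat", "Strong democrat")
)

# group_other = TRUE: every level not listed above goes to "Other"
new[["Other"]] <- setdiff(names(counts), unlist(new, use.names = FALSE))

# One new name per old level, repeated in the order the groups appear in `new`
# (same idea as the rep(seq_along(new), ...) line quoted above)
new_names <- names(new)[rep(seq_along(new), lengths(new))]

# Assumed buggy pairing: new names laid over the levels in their ORIGINAL order
buggy_map <- setNames(new_names, names(counts))

# Correct pairing: each new name stays with the level it was listed with
correct_map <- setNames(new_names, unlist(new, use.names = FALSE))

# Hypothetical helper: sum the counts per new name, groups kept in `new` order
collapse_counts <- function(map) {
  groups <- factor(map[names(counts)], levels = names(new))
  tapply(counts, groups, sum)
}

print(collapse_counts(buggy_map))    # missing 155, rep 2707, ind 8942, dem 6189, Other 3490
print(collapse_counts(correct_map))  # missing 155, rep 5346, ind 8409, dem 7180, Other 393
```

The buggy pairing reproduces every count in the report (Other receives "Strong democrat", 3490), while keeping each name with the level it was listed under gives the expected 393 for Other.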

batpigandme (Contributor) commented

I'm closing this as a duplicate of #172, since both are fixed by #176.

hadley pushed a commit that referenced this issue Sep 3, 2019
gtm19 added a commit to gtm19/forcats that referenced this issue Jan 21, 2020
Moving the bullet point referencing tidyverse#172 and tidyverse#202, as it was put under 0.4.0 heading by mistake -- it was not part of that release. Raised in issue tidyverse#219.