Add sort argument to fct_lump_ functions? #375

Open
DesiQuintans opened this issue Dec 16, 2024 · 0 comments

The fct_lump_ functions are all about deciding which levels to keep or lump together based on their frequency, so I think that they should also have an option (e.g. sort = c("no", "asc", "desc")) to return those levels in an order that is based on their frequency.

Edit: Or perhaps, add an example to the man page for these functions showing that you can use fct_infreq() and its friends to do the ordering. However, doing this requires a fct_relevel() at the end to ensure "Other" is the last value.

Thank you for a great package! <3

library(forcats)

set.seed(12345)

repeated_states <- rep.int(x = state.name, times = runif(n = length(state.name), min = 1, max = 300))

sort(table(repeated_states), decreasing = TRUE)
#> repeated_states
#>        Georgia      Minnesota       Maryland          Texas   Pennsylvania 
#>            296            289            285            278            271 
#>       Arkansas         Alaska         Oregon     New Mexico   South Dakota 
#>            265            262            260            238            234 
#>           Utah        Arizona       Illinois        Florida        Alabama 
#>            232            228            220            218            216 
#>    Mississippi       Nebraska   North Dakota       Missouri        Wyoming 
#>            212            209            204            193            188 
#>   Rhode Island         Nevada       Delaware     New Jersey         Kansas 
#>            185            163            153            145            139 
#>     California  Massachusetts      Tennessee      Louisiana           Iowa 
#>            137            136            129            121            117 
#>       Kentucky        Montana           Ohio       Oklahoma    Connecticut 
#>            117            117            111            109             98 
#>       Michigan       Virginia        Vermont  New Hampshire North Carolina 
#>             98             97             78             68             57 
#>          Maine       Colorado          Idaho South Carolina     Washington 
#>             54             50             46             41             18 
#>      Wisconsin  West Virginia         Hawaii       New York        Indiana 
#>             17             13             11              2              1

as_fct <- fct_lump_n(repeated_states, 10)

levels(as_fct)
#>  [1] "Alaska"       "Arkansas"     "Georgia"      "Maryland"     "Minnesota"   
#>  [6] "New Mexico"   "Oregon"       "Pennsylvania" "South Dakota" "Texas"       
#> [11] "Other"

as_ordered_fct <- fct_lump_n(repeated_states, 10) |>
  fct_infreq() |>
  fct_relevel("Other", after = Inf)

levels(as_ordered_fct)
#>  [1] "Georgia"      "Minnesota"    "Maryland"     "Texas"        "Pennsylvania"
#>  [6] "Arkansas"     "Alaska"       "Oregon"       "New Mexico"   "South Dakota"
#> [11] "Other"

Created on 2024-12-17 with reprex v2.1.1
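
For illustration, the requested behaviour can be approximated today with a small wrapper around the workaround above. This is only a sketch: `fct_lump_n_sorted()` and its `sort` argument are hypothetical names that do not exist in forcats, and the choice to always keep the lumped level last is an assumption about what the option should do.

```r
library(forcats)

# Hypothetical helper, NOT part of forcats: lump with fct_lump_n(), then
# optionally reorder the kept levels by frequency, keeping "Other" last.
fct_lump_n_sorted <- function(f, n, sort = c("no", "asc", "desc"),
                              other_level = "Other") {
  sort <- match.arg(sort)
  out <- fct_lump_n(f, n, other_level = other_level)

  if (sort == "no") {
    return(out)
  }

  out <- fct_infreq(out)   # most frequent level first
  if (sort == "asc") {
    out <- fct_rev(out)    # least frequent level first
  }

  # Move the lumped level to the end, if lumping actually produced one.
  if (other_level %in% levels(out)) {
    out <- fct_relevel(out, other_level, after = Inf)
  }
  out
}

x <- c(rep("a", 5), rep("b", 3), rep("c", 8), "d", "e")

levels(fct_lump_n_sorted(x, 3, sort = "desc"))
#> [1] "c"     "a"     "b"     "Other"

levels(fct_lump_n_sorted(x, 3, sort = "asc"))
#> [1] "b"     "a"     "c"     "Other"
```

With `sort = "no"` the helper returns exactly what `fct_lump_n()` returns, so existing behaviour would be unchanged by default.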
