Unclear warnings and errors generated when setting levels for a factor generated from a character vector #314

Closed
wtimmerman-fitp opened this issue Aug 9, 2022 · 3 comments

wtimmerman-fitp commented Aug 9, 2022

When I use fct_relevel() with a levels argument, I receive a warning that does not clearly indicate what is going wrong. Similarly, when I pass a levels argument to forcats::as_factor() (on the assumption that arguments in the ellipsis will be passed on to methods), I receive the error "Arguments in ... must be used". Both results are unexpected given my understanding of the function help text and of base::factor().

For background, my intention is to convert a character column into a factor column using a pre-specified list of levels (the pre-specified list is somewhat important as a check and for consistency, for reasons that I won't get into here). I have reviewed the forcats issues and don't see an exact match for this problem:

  • Using base::factor(), I can pass the vector of levels to the levels argument; this works, but it is not noisy enough when the levels provided do not match the character column I am converting into a factor.
  • Using forcats::as_factor(), when I pass the levels argument I receive the error "Arguments in ... must be used." I am not clear if I am misusing the function.
  • Using forcats::fct_relevel(), I receive the warning "Outer names are only allowed for unnamed scalar atomic inputs". This comes from vctrs, and I also see it referenced in the fct_relevel() help, but it doesn't seem to apply in the reprex I've generated below.
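
To make the "not noisy enough" point concrete, here is a minimal base R sketch of the kind of check described above. The helper name strict_factor() is hypothetical (it is not part of forcats or base R), and the choice to error on unmatched values but only warn on unused levels is an assumption about the desired behaviour, not something the packages prescribe.

```r
# Hypothetical helper (not part of forcats or base R): wraps base::factor()
# and reports mismatches between the data and the supplied levels.
strict_factor <- function(x, levels) {
  stopifnot(is.character(x), is.character(levels))

  # Values in the data with no matching level would silently become NA
  # in base::factor(), so treat them as an error here.
  missing_from_levels <- setdiff(x[!is.na(x)], levels)
  if (length(missing_from_levels) > 0) {
    stop(
      "Values not found in `levels`: ",
      paste(missing_from_levels, collapse = ", "),
      call. = FALSE
    )
  }

  # Levels that never occur in the data are allowed, but flagged.
  unused_levels <- setdiff(levels, x)
  if (length(unused_levels) > 0) {
    warning(
      "Levels not present in the data: ",
      paste(unused_levels, collapse = ", "),
      call. = FALSE
    )
  }

  factor(x, levels = levels)
}

cars <- c("Mazda RX4", "Datsun 710")
strict_factor(cars, levels = cars)                  # no message
strict_factor(cars, levels = c(cars, "Other Car"))  # warns about "Other Car"
# strict_factor(cars, levels = "Mazda RX4")         # would error on "Datsun 710"
```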

My questions are:

  • Should these forcats functions be generating different/more-specific warnings?
  • Should these forcats functions behave differently when passed the levels argument?
  • Should I be using these functions differently (or a different function altogether) given my use case?

Reprex

library(tidyverse)

mtcars2 <-
  mtcars %>% 
  tibble::rownames_to_column(var = "make_model") %>% 
  dplyr::filter(
    dplyr::row_number() <= 5
  )

use_levels <-
  mtcars2 %>% 
  dplyr::pull(make_model) 

# this works as expected, since the provided levels will by definition match the values in the make_model column.
mtcars2_factor <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = base::factor(
      make_model,
      levels = use_levels
    )
  )

# I don't understand why this is an error based on the as_factor() help.
mtcars2_as_factor <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = forcats::as_factor(
      make_model,
      levels = use_levels
    )
  )
#> Error in `dplyr::mutate()`:
#> ! Problem while computing `make_model = forcats::as_factor(make_model,
#>   levels = use_levels)`.
#> Caused by error:
#> ! Arguments in `...` must be used.
#> x Problematic argument:
#> * levels = use_levels

# I don't understand why this generates this warning since use_levels does not have names
mtcars2_fct_relevel <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = forcats::fct_relevel(
      make_model,
      levels = use_levels
    )
  )
#> Warning: Outer names are only allowed for unnamed scalar atomic inputs

# When I modify use_levels to include a value not present in the column, more challenges arise.
use_levels_mod <-
  c(use_levels, "Other Car")

# base::factor() gives no indication that some factor levels are not present in the data.
mtcars2_mod_factor <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = base::factor(
      make_model,
      levels = use_levels_mod
    )
  )

# as_factor() continues to error
mtcars2_mod_as_factor <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = forcats::as_factor(
      make_model,
      levels = use_levels_mod
    )
  )
#> Error in `dplyr::mutate()`:
#> ! Problem while computing `make_model = forcats::as_factor(make_model,
#>   levels = use_levels_mod)`.
#> Caused by error:
#> ! Arguments in `...` must be used.
#> x Problematic argument:
#> * levels = use_levels_mod

# fct_relevel generates an expected warning, but still has the 
# original warning that makes little sense in this case.

mtcars2_mod_fct_relevel <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = forcats::fct_relevel(
      make_model,
      levels = use_levels_mod
    )
  )
#> Warning: Outer names are only allowed for unnamed scalar atomic inputs
#> Warning: Unknown levels in `f`: Other Car

Created on 2022-08-09 by the reprex package (v2.0.1)

Session info
sessionInfo()
#> R version 4.0.5 (2021-03-31)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.9     purrr_0.3.4    
#> [5] readr_2.1.2     tidyr_1.2.0     tibble_3.1.8    ggplot2_3.3.6  
#> [9] tidyverse_1.3.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.2    xfun_0.31           haven_2.5.0        
#>  [4] gargle_1.2.0        colorspace_2.0-3    vctrs_0.4.1        
#>  [7] generics_0.1.3      htmltools_0.5.3     yaml_2.3.5         
#> [10] utf8_1.2.2          rlang_1.0.4         pillar_1.8.0       
#> [13] glue_1.6.2          withr_2.5.0         DBI_1.1.3          
#> [16] dbplyr_2.2.1        readxl_1.4.0        modelr_0.1.8       
#> [19] lifecycle_1.0.1     munsell_0.5.0       gtable_0.3.0       
#> [22] cellranger_1.1.0    rvest_1.0.2         evaluate_0.15      
#> [25] knitr_1.39          tzdb_0.3.0          fastmap_1.1.0      
#> [28] fansi_1.0.3         highr_0.9           broom_1.0.0        
#> [31] backports_1.4.1     scales_1.2.0        googlesheets4_1.0.0
#> [34] jsonlite_1.8.0      fs_1.5.2            hms_1.1.1          
#> [37] digest_0.6.29       stringi_1.7.8       grid_4.0.5         
#> [40] cli_3.3.0           tools_4.0.5         magrittr_2.0.3     
#> [43] crayon_1.5.1        pkgconfig_2.0.3     ellipsis_0.3.2     
#> [46] xml2_1.3.3          reprex_2.0.1        googledrive_2.0.0  
#> [49] lubridate_1.8.0     assertthat_0.2.1    rmarkdown_2.14     
#> [52] httr_1.4.3          rstudioapi_0.13     R6_2.5.1           
#> [55] compiler_4.0.5

jennybc (Member) commented Aug 9, 2022

Based on a quick read, I think you might be interested in fct()? More in #299.

wtimmerman-fitp (Author) commented

Oh, this is perfect! Thank you for the pointer! I think this will solve my issue. The named levels argument is there, there are no errors or warnings if an additional level is listed but not in the data, and it errors (unlike base::factor()) if a value in the data is not among the supplied levels.

I'll close the issue and look forward to fct() getting into a future release.

(Example below, if anyone is curious.)

#setup ----
library(tidyverse)

fct <- function(x = character(), levels = NULL, na = character()) {
  if (!is.character(x)) {
    cli::cli_abort("{.arg x} must be a character vector")
  }
  if (!is.character(na)) {
    cli::cli_abort("{.arg na} must be a character vector")
  }
  
  x[x %in% na] <- NA
  
  if (is.null(levels)) {
    levels <- unique(x)
  } else if (!is.character(levels)) {
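    # cli::cli_abort() for consistency with the other checks; a bare abort()
    # is not available after library(tidyverse) and would not format {.arg}
    cli::cli_abort("{.arg levels} must be a character vector")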
  }
  
  invalid <- setdiff(x, c(levels, NA))
  
  if (length(invalid) > 0 ) {
    cli::cli_abort(c(
      "Values of {.arg x} must be members of {.arg levels}", 
      i = "Invalid value{?s}: {.str {invalid}}"
    ))
  }
  factor(x, levels = levels, exclude = NULL)
}

mtcars2 <-
  mtcars %>% 
  tibble::rownames_to_column(var = "make_model") %>% 
  dplyr::filter(
    dplyr::row_number() <= 5
  )

# Match levels----
match_levels <-
  mtcars2 %>% 
  dplyr::pull(make_model) 

mtcars2_factor <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = base::factor(
      make_model,
      levels = match_levels
    )
  )

mtcars2_fct <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = fct(
      make_model,
      levels = match_levels
    )
  )

# Add Levels ----
add_levels <-
  c(match_levels, "Other Car")

mtcars2_add_factor <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = base::factor(
      make_model,
      levels = add_levels
    )
  )

mtcars2_add_fct <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = fct(
      make_model,
      levels = add_levels
    )
  )

levels(mtcars2_add_fct$make_model)
#> [1] "Mazda RX4"         "Mazda RX4 Wag"     "Datsun 710"       
#> [4] "Hornet 4 Drive"    "Hornet Sportabout" "Other Car"

# Miss Levels ----
miss_levels <-
  match_levels[-1]

mtcars2_miss_factor <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = base::factor(
      make_model,
      levels = miss_levels
    )
  )

mtcars2_miss_fct <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = fct(
      make_model,
      levels = miss_levels
    )
  )
#> Error in `dplyr::mutate()`:
#> ! Problem while computing `make_model = fct(make_model, levels =
#>   miss_levels)`.
#> Caused by error in `fct()`:
#> ! Values of `x` must be members of `levels`
#> i Invalid value: "Mazda RX4"

Created on 2022-08-09 by the reprex package (v2.0.1)
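
For readers on a forcats version that exports fct() (1.0.0 or later, as far as I know), the local copy above should not be needed. The sketch below assumes such a version is installed and that its fct() behaves like the definition above; the two-element vector is made up for illustration.

```r
library(forcats)

x <- c("Mazda RX4", "Datsun 710")

# An extra level that is absent from the data is accepted.
f <- fct(x, levels = c(x, "Other Car"))
levels(f)

# A value in the data that is absent from `levels` is expected to error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(
  fct(x, levels = "Mazda RX4"),
  error = function(e) conditionMessage(e)
)
res
```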

wtimmerman-fitp (Author) commented

Also, if anyone runs into the same warning I got with fct_relevel() (Warning: Outer names are only allowed for unnamed scalar atomic inputs): fct_relevel() has no levels argument, so levels = use_levels is captured by the ellipsis as a named argument, which appears to be what triggers the warning. Pass the vector of level names (in this case, use_levels) into the ellipsis unnamed instead:

mtcars2_fct_relevel <-
  mtcars2 %>% 
  dplyr::mutate(
    make_model = forcats::fct_relevel(
      make_model,
      use_levels
    )
  )
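
A small base R sketch of why the name matters. show_dots() is a hypothetical stand-in that only mimics the argument layout of fct_relevel() (a first argument, then the ellipsis, then after); it does not reproduce what forcats does with the captured values.

```r
# Hypothetical stand-in with the same argument layout as fct_relevel():
# there is no `levels` parameter, so `levels = ...` lands in `...`.
show_dots <- function(.f, ..., after = 0L) {
  dots <- list(...)
  names(dots)
}

use_levels <- c("Mazda RX4", "Datsun 710")

show_dots("x", levels = use_levels)  # "levels": the name travels into `...`
show_dots("x", use_levels)           # NULL: no outer name on the vector
```

In the first call the vector arrives with the outer name levels attached, which matches the situation the warning describes; in the second call it arrives unnamed.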
