How lenient should labelled be when combining different labels that map to the same value? #667
Comments
This isn't a pivot_longer() issue. @hadley, it seems like this is expected (sort of like what we do with time zones), but I could also see where haven could be more strict and not allow you to combine vectors that have different labels that map to the same value, i.e.:
library(haven)
age <- labelled(
1:6,
labels = c("18-24"=1, "25-34"=2,"35-44"=3, "45-54"=4,"55-64"=5,"65+"=6),
label = "Age"
)
gender <- labelled(
1:2,
labels = c("Male" = 1, "Female"=2),
label = "Gender"
)
age
#> <labelled<integer>[6]>: Age
#> [1] 1 2 3 4 5 6
#>
#> Labels:
#> value label
#> 1 18-24
#> 2 25-34
#> 3 35-44
#> 4 45-54
#> 5 55-64
#> 6 65+
gender
#> <labelled<integer>[2]>: Gender
#> [1] 1 2
#>
#> Labels:
#> value label
#> 1 Male
#> 2 Female
# Uses labels of LHS then RHS
c(age, gender)
#> <labelled<integer>[8]>: Age
#> [1] 1 2 3 4 5 6 1 2
#>
#> Labels:
#> value label
#> 1 18-24
#> 2 25-34
#> 3 35-44
#> 4 45-54
#> 5 55-64
#> 6 65+
c(gender, age)
#> <labelled<integer>[8]>: Gender
#> [1] 1 2 1 2 3 4 5 6
#>
#> Labels:
#> value label
#> 1 Male
#> 2 Female
#> 3 35-44
#> 4 45-54
#> 5 55-64
#> 6 65+
Thanks @DavisVaughan! Although not ideal, this is by design, since there's no easy way to reconcile mismatched labels and this is the path of least resistance (i.e. least likely to throw errors when combining vectors while still supporting common operations). See #543 for a brief discussion and a bit of context around the current, more permissive stance.
@manhnguyen48, as noted in the conversion semantics vignette and a few other spots, the labelled class is mostly intended as an intermediate class between stats packages and R, so the correct approach is to convert to factors as you've done above, or to remove the labels before calling pivot_longer().
Agreed that a warning would be good when two labelled vectors with conflicting labels are combined.
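For reference, here is a minimal sketch of the two workarounds mentioned above: converting to factors, or dropping the labels, before pivoting. The data frame `df` and its columns are made up for illustration; `as_factor()` and `zap_labels()` are haven functions, and the rest assumes dplyr and tidyr are installed.

```r
library(haven)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical data: two labelled columns whose labels clash on values 1 and 2
df <- tibble(
  id = 1:4,
  gender = labelled(c(1L, 2L, 1L, 2L), c("Male" = 1L, "Female" = 2L)),
  q1 = labelled(c(1L, 1L, 2L, 2L), c("Yes" = 1L, "No" = 2L))
)

# Option 1: replace the codes with their labels before pivoting,
# so each row keeps the label that belonged to its own column
long_labels <- df %>%
  mutate(across(c(gender, q1), ~ as.character(as_factor(.x)))) %>%
  pivot_longer(c(gender, q1))

# Option 2: drop the labels and keep only the bare integer codes
long_codes <- df %>%
  zap_labels() %>%
  pivot_longer(c(gender, q1))
```

Option 1 keeps the meaning of each value at the cost of a character column; option 2 keeps the codes but loses the labels, so it probably only makes sense when the codes are interpreted per `name` afterwards.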
I'm thinking something like this:
library(haven)
library(labelled)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
example_data <- tibble(
serial = 1:100,
age = labelled_spss(
sample(1:6, size = 100, replace = TRUE),
c(
"18-24" = 1,
"25-34" = 2,
"35-44" = 3,
"45-54" = 4,
"55-64" = 5,
"65+" = 6,
"Unknown" = 99
),
na_values = 99
),
gender = labelled(
sample(1:3, size = 100, replace = TRUE),
c("Male" = 1, "Female" = 2, "Other" = 3)
),
q1 = labelled(
sample(1:2, size = 100, replace = TRUE),
c("Yes" = 1, "No" = 2)
)
)
pivot_longer(example_data, c(gender, age, q1)) %>%
count(name, value)
#> Warning: `gender` and `age` have conflicting value labels.
#> ℹ Labels for these values will be taken from `gender`
#> x Values: 1, 2, 3
#> Warning: `gender` and `q1` have conflicting value labels.
#> ℹ Labels for these values will be taken from `gender`
#> x Values: 1, 2
#> # A tibble: 11 × 3
#> name value n
#> <chr> <int+lbl> <int>
#> 1 age 1 [Male] 16
#> 2 age 2 [Female] 12
#> 3 age 3 [Other] 17
#> 4 age 4 [45-54] 17
#> 5 age 5 [55-64] 21
#> 6 age 6 [65+] 17
#> 7 gender 1 [Male] 31
#> 8 gender 2 [Female] 39
#> 9 gender 3 [Other] 30
#> 10 q1 1 [Male] 46
#> 11 q1 2 [Female] 54
Created on 2022-03-24 by the reprex package (v2.0.1)
@hadley, any thoughts on the warning message? Too verbose?
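To make the proposal concrete, a check along these lines could be sketched in base R. This is only an illustration, not haven's implementation: `conflicting_values()` and `warn_label_conflict()` are hypothetical helpers, and they operate on plain named vectors of the kind a labelled vector stores in its "labels" attribute.

```r
# Return the values that carry different labels in two named label vectors
# (hypothetical helper, not part of haven)
conflicting_values <- function(x_labels, y_labels) {
  shared <- intersect(unname(x_labels), unname(y_labels))
  x_names <- names(x_labels)[match(shared, x_labels)]
  y_names <- names(y_labels)[match(shared, y_labels)]
  shared[x_names != y_names]
}

# Warn in roughly the format proposed above (hypothetical helper);
# returns the conflicting values invisibly
warn_label_conflict <- function(x_labels, y_labels, x_name, y_name) {
  bad <- conflicting_values(x_labels, y_labels)
  if (length(bad) > 0) {
    warning(
      sprintf("`%s` and `%s` have conflicting value labels.\n", x_name, y_name),
      sprintf("Labels for these values will be taken from `%s`.\n", x_name),
      sprintf("Values: %s", paste(bad, collapse = ", ")),
      call. = FALSE
    )
  }
  invisible(bad)
}

gender_labels <- c("Male" = 1, "Female" = 2, "Other" = 3)
q1_labels <- c("Yes" = 1, "No" = 2)

# Capture the warning text instead of emitting it, then print it
msg <- tryCatch(
  warn_label_conflict(gender_labels, q1_labels, "gender", "q1"),
  warning = function(w) conditionMessage(w)
)
cat(msg, "\n")
```

The sketch treats a value as conflicting only when both vectors label it and the labels differ, which matches the "Values: 1, 2" line in the `gender`/`q1` warning above.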
@gorcha that warning looks great to me!
It seems pivot_longer would lose the value labels (produced by labelled). This can be avoided if transformed to factors first. Is this expected behaviour? It'd be nice to at least have some warnings.
Created on 2022-03-19 by the reprex package (v2.0.1)