add fuzzy matching #37
Comments
One issue with this is that it would create a pipeline that is not reproducible and can't be run inside an rmarkdown document (without author input). Given that we already allow use of a custom dictionary, could we instead have a function like
I imagine a pipeline like
Also, the selection options should include "6. No, replace with inputted value". This would be useful for novel responses like apogender that should be added to the dictionary without requiring the user to retype them.
Except that also doesn't run in an RMD. The other hang-up is that this is going to have scalability problems. Taking inputs for 12 fuzzy matches is fine. Taking it for 120 is going to be a PITA.
Why wouldn't it run in an RMD? In that workflow you would use the message text from Step 2 to recreate the custom dictionary programmatically.
Not in one pass, I mean. Yes, once the dictionary is created, it's created, but there's still an interactive step built into that pipeline.
Yes, I can't see much of a way around that without simply skipping validation of fuzzy matches, which seems dangerous.
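To make the non-interactive variant discussed above concrete, here is a minimal sketch, not anything that exists in gendercoder. The helper name `suggest_fuzzy()`, the `max_dist` argument, and the toy dictionary are all assumptions for illustration. It assumes a dictionary is a named character vector (response to recoded value) and uses base R's `utils::adist()` for edit distance. The idea is that the first pass only reports candidate matches, and the author reviews them and writes the accepted ones into a custom dictionary in code, so the final document knits without prompts.

```r
# Hypothetical helper: suggest_fuzzy() is NOT part of gendercoder.
# Assumes `dictionary` is a named character vector: names are raw
# responses, values are the recoded categories.
suggest_fuzzy <- function(responses, dictionary, max_dist = 2) {
  cleaned <- tolower(trimws(responses))
  unmatched <- unique(cleaned[!is.na(cleaned) & !cleaned %in% names(dictionary)])
  if (length(unmatched) == 0) {
    return(data.frame(response = character(0), suggestion = character(0),
                      distance = numeric(0), stringsAsFactors = FALSE))
  }
  # adist() returns a matrix of edit distances:
  # one row per unmatched response, one column per dictionary key.
  d <- utils::adist(unmatched, names(dictionary))
  best <- apply(d, 1, which.min)                    # closest key per response
  best_dist <- d[cbind(seq_along(unmatched), best)]
  keep <- best_dist <= max_dist                     # drop distant candidates
  data.frame(
    response = unmatched[keep],
    suggestion = unname(dictionary[best[keep]]),
    distance = best_dist[keep],
    stringsAsFactors = FALSE
  )
}

# Toy dictionary for illustration only.
toy_dict <- c(male = "male", female = "female", man = "male", woman = "female")
suggestions <- suggest_fuzzy(c("Male", "femail", "womn", "apogender"), toy_dict)

# After reviewing `suggestions`, the accepted rows become a custom
# dictionary that can be hard-coded in the RMD.
custom_dict <- setNames(suggestions$suggestion, suggestions$response)
```

Responses with no close candidate (here "apogender") are left out of the suggestions, so they would still need a manual dictionary entry. This does not remove the review step, it only moves it outside the knitted document.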
There's a persistent issue where people provide expansive and idiosyncratic responses (e.g. "I'm sexually female") that can be reasonably classified by a human user, but are difficult to accommodate in the dictionaries method as it stands.
There are a number of suggestions for how we might resolve this (e.g. grep), but these of course have potential issues with unknown future inputs. Emily also likes how the current process gives you a transparent log of how recoding happens, which becomes trickier with fuzzy matching.
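As a rough illustration of the grep idea and of keeping a transparent log, here is a sketch in base R. The function `recode_with_patterns()` and the patterns are hypothetical, not part of gendercoder. Each response is recoded by the first regular expression that matches, and the returned data frame records which rule fired, so the recoding stays auditable.

```r
# Hypothetical helper, NOT part of gendercoder.
# `patterns` is a named character vector: names are the recoded
# categories, values are regular expressions tried in order.
recode_with_patterns <- function(responses, patterns) {
  recoded <- rep(NA_character_, length(responses))
  rule <- rep(NA_character_, length(responses))
  for (i in seq_along(patterns)) {
    # Only fill responses that no earlier pattern has claimed.
    hit <- is.na(recoded) &
      grepl(patterns[[i]], responses, ignore.case = TRUE, perl = TRUE)
    recoded[hit] <- names(patterns)[i]
    rule[hit] <- patterns[[i]]
  }
  # The data frame doubles as a log of how each value was recoded.
  data.frame(original = responses, recoded = recoded, rule = rule,
             stringsAsFactors = FALSE)
}

# Word boundaries (\\b) stop "male" from matching inside "female".
patterns <- c(female = "\\b(female|woman)\\b", male = "\\b(male|man)\\b")
recode_log <- recode_with_patterns(
  c("I'm sexually female", "Male", "apogender"), patterns
)
```

The weakness raised above still applies: a pattern that looks safe on today's data can misclassify a future response (for example "not a man"), which is why the log of matched rules matters.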
This is a summary of the implementation of fuzzy matching proposed by Emily and me.
Fuzzy matching should:
fuzzy = TRUE
),
The core function arguments would default to:
gender_recode <- function(gender = gender, dictionary = gendercoder::broad, fill = FALSE, match = "exact")
And implementation would be:
Keen to get input on alternatives and implementations.
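For discussion, here is one possible shape for a `match` argument. It is a sketch under stated assumptions and not the proposed implementation, which is not shown above. The name `recode_sketch()`, the `"fuzzy"` option value, and `max_dist` are assumptions. The dictionary is again assumed to be a named character vector, and `gender` is given no default here to keep the sketch simple.

```r
# Hypothetical sketch, NOT the gendercoder implementation.
recode_sketch <- function(gender, dictionary, fill = FALSE,
                          match = c("exact", "fuzzy"), max_dist = 1) {
  match <- match.arg(match)                 # defaults to "exact"
  cleaned <- tolower(trimws(gender))
  recoded <- unname(dictionary[cleaned])    # exact lookup; NA if no key
  if (match == "fuzzy") {
    # Only responses that failed the exact lookup get a fuzzy attempt.
    todo <- which(is.na(recoded) & !is.na(cleaned))
    for (i in todo) {
      d <- drop(utils::adist(cleaned[i], names(dictionary)))
      if (min(d) <= max_dist) {
        recoded[i] <- unname(dictionary[which.min(d)])
      }
    }
  }
  if (fill) {
    # fill = TRUE keeps the original response where nothing matched.
    missing <- is.na(recoded)
    recoded[missing] <- gender[missing]
  }
  recoded
}

toy_dict <- c(male = "male", female = "female")
x <- c("Male", "femal", "apogender")
exact_result <- recode_sketch(x, toy_dict)
fuzzy_result <- recode_sketch(x, toy_dict, match = "fuzzy")
filled_result <- recode_sketch(x, toy_dict, fill = TRUE, match = "fuzzy")
```

Note that this version accepts fuzzy matches without any validation, which is exactly the risk raised in the comments, so a small `max_dist` or a separate review step would probably be needed in practice.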