implement typographic noise function #19
"""Abie's implementation of typographical noising"""
err = ""
i = 0
while i < len(truth):
This is where a followup PR will try and implement numba or otherwise refactor.
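For readers without the diff open, here is a minimal sketch of the kind of per-character loop being discussed. It is not the PR's actual code: the `NEIGHBORS` map, the `noise_string` name, and the `p_error` / `p_keep_original` parameters are all assumptions made for illustration, and the real mapping lives in qwerty_errors.yaml.

```python
import random

# Hypothetical neighbor map for illustration only; the project loads its
# own mapping from qwerty_errors.yaml.
NEIGHBORS = {
    "a": ["q", "w", "s", "z"],
    "s": ["a", "w", "e", "d", "x", "z"],
}


def noise_string(truth, rng, neighbors=NEIGHBORS, p_error=0.1, p_keep_original=0.1):
    """Walk the string one character at a time and sometimes inject a typo."""
    err = ""
    i = 0
    while i < len(truth):
        char = truth[i]
        options = neighbors.get(char)
        if options and rng.random() < p_error:
            # Replace the character with a nearby key...
            err += rng.choice(options)
            # ...and occasionally keep the intended character as well,
            # which mimics hitting two keys at once.
            if rng.random() < p_keep_original:
                err += char
        else:
            err += char
        i += 1
    return err
```

The pure-Python loop touches every character of every string, which is why it is slow on large columns and why a compiled or vectorized rewrite is a natural followup.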
)

assert noised_data.equals(noised_data_same_seed)
assert not noised_data.equals(noised_data_different_seed)
assert not data.equals(noised_data)
# TODO: Confirm correct columns exist once the interface functions
I copied this deletion from Jim's open PR since we decided this is all too complicated given the number of columns and functions to be implemented.
# TODO: refactor this into its own test parameterized by noise functions
def _validate_seed_and_noise_data(func, column, config):
A followup PR will move this method out of the existing tests and do these checks as their own tests.
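One possible shape for that followup, sketched with plain lists so it runs without the project's fixtures. The `swap_case_noise` function and the `check_seed_and_noise` helper are hypothetical stand-ins, not names from this repository; the three assertions mirror the seed checks already in the existing test.

```python
import random


def swap_case_noise(values, seed, p=0.5):
    """Hypothetical stand-in for a real noise function: flips case at random."""
    rng = random.Random(seed)
    return [v.swapcase() if rng.random() < p else v for v in values]


def check_seed_and_noise(noise_func, data, seed=0, other_seed=1):
    """Seed checks that could be shared by every noise function under test."""
    noised = noise_func(data, seed)
    noised_same_seed = noise_func(data, seed)
    noised_different_seed = noise_func(data, other_seed)

    # Same seed must reproduce the same output.
    assert noised == noised_same_seed
    # A different seed should produce different output.
    assert noised != noised_different_seed
    # The noise function must actually change something.
    assert noised != data
```

With pytest, a helper like this could be driven by `pytest.mark.parametrize` over the list of noise functions, so each function gets its own test case and its own failure report.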
Looks good! The only concern I have with this is that the code that transformed the keyboard layout CSV -> qwerty_errors.yaml isn't here.
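For context, a conversion like that could look roughly like the sketch below. This is a guess at the approach, not the script that was actually used: the inline `LAYOUT_CSV` grid, the `neighbors_from_layout` name, and the rule that neighbors are the eight surrounding cells (ignoring the physical stagger between rows) are all assumptions.

```python
import csv
import io

# Hypothetical layout: one CSV row per keyboard row, empty cells as padding.
LAYOUT_CSV = (
    "q,w,e,r,t,y,u,i,o,p\n"
    "a,s,d,f,g,h,j,k,l,\n"
    "z,x,c,v,b,n,m,,,\n"
)


def neighbors_from_layout(csv_text):
    """Map each key to the keys in the cells surrounding it in the grid."""
    grid = list(csv.reader(io.StringIO(csv_text)))
    neighbors = {}
    for r, row in enumerate(grid):
        for c, key in enumerate(row):
            if not key:
                continue  # padding cell, not a real key
            adjacent = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    in_bounds = 0 <= rr < len(grid) and 0 <= cc < len(grid[rr])
                    if in_bounds and grid[rr][cc]:
                        adjacent.append(grid[rr][cc])
            neighbors[key] = adjacent
    return neighbors
```

The resulting dictionary could then be written out with a YAML library to produce a file like qwerty_errors.yaml. Checking a script of this kind into the repository would make the mapping reproducible if the layout ever changes.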
Title: Implement typographic noising
Description
This is one of probably a few PRs to implement typographic noising.
This one includes the function as well as some unit tests.
Followup PRs may include:
loop over every character in every string of potentially-noised rows)
Testing
This PR includes a (very slow) unit test, which is passing.
To discuss: how to add an integration test?