-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add get_default_configuration and defaults yaml #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we not create a unit-test from the "testing" that is done?
@ramittal there's a separate ticket for building out the testing framework https://jira.ihme.washington.edu/browse/MIC-3862 |
|
||
|
||
def get_default_configuration() -> ConfigTree: | ||
import pseudopeople |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why import in this scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I just need it for the location of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a paths.py that we could add the yaml location to at some point.
noising_configuration = ConfigTree(layers=default_config_layers) | ||
BASE_DIR = Path(pseudopeople.__file__).resolve().parent | ||
yaml_path = BASE_DIR / "default_configuration.yaml" | ||
noising_configuration.update(yaml_path, layer="base") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this update
method adds the new keys from the yaml to the base layer? Does it behave just like a dict.update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I believe that is what it is doing according to the docs. I think the important thing to note is you can update at different levels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; more-or-less. You have the layers, which matter in deciding the "true", desired configuration.
zipcode_miswriting: | ||
zipcode: | ||
row_noise_level: 0.01 | ||
zipcode_miswriting: [0.04, 0.04, 0.2, 0.36, 0.36] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these lists working? Is this the chance of a miswrite per digit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is defined in the research docs. First two and last two digits have the same probabilities, respectively.
@@ -0,0 +1,200 @@ | |||
# Default noising configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very, very long. Do we want a default config per form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RT discussion needed.
While the testing framework can come in later, just moving the ad-hoc test code to a unit-test should be trivial and get us code-coverage from get-go and ensure any un-intended changes are caught. |
@@ -44,6 +47,21 @@ class __NoiseTypes(NamedTuple): | |||
PHONETIC: ColumnNoiseType = ColumnNoiseType( | |||
"phonetic", noise_functions.generate_phonetic_errors | |||
) | |||
MISSING_DATA: ColumnNoiseType = ColumnNoiseType( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did this get added in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I need to have agreement on strings that are in the YAML.
Add get_default_configuration and defaults yaml
Description
Changes
get_default_configuration
utility function that returns a ConfigTree objectTesting
Imported the utility, executed, and got an expected ConfigTree:
gave output: