Add sample data and include in tests #37
Merged: mattkappel merged 4 commits into develop from feature/mic-3884-incorporate-sample-dataset on Apr 7, 2023
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -1,60 +1,46 @@
 from pathlib import Path
-from typing import Union
+from typing import Callable, Union

 import pandas as pd
 import pytest

-from pseudopeople.interface import generate_decennial_census
-
-
-# TODO: possibly parametrize Forms?
-def test_generate_decennial_census(
-    decennial_census_data_path: Union[Path, str], user_config_path: Union[Path, str]
-):
-    data = pd.read_hdf(decennial_census_data_path)
-
-    # TODO: Refactor this check into a separate test
-    noised_data = generate_decennial_census(
-        source=decennial_census_data_path, seed=0, configuration=user_config_path
-    )
-    noised_data_same_seed = generate_decennial_census(
-        source=decennial_census_data_path, seed=0, configuration=user_config_path
-    )
-    noised_data_different_seed = generate_decennial_census(
-        source=decennial_census_data_path, seed=1, configuration=user_config_path
-    )
+from pseudopeople.constants.paths import (
+    SAMPLE_AMERICAN_COMMUNITIES_SURVEY,
+    SAMPLE_CURRENT_POPULATION_SURVEY,
+    SAMPLE_DECENNIAL_CENSUS,
+    SAMPLE_SOCIAL_SECURITY,
+    SAMPLE_TAXES_W2_AND_1099,
+    SAMPLE_WOMEN_INFANTS_AND_CHILDREN,
+)
+from pseudopeople.interface import (
+    generate_american_communities_survey,
+    generate_current_population_survey,
+    generate_decennial_census,
+    generate_social_security,
+    generate_taxes_w2_and_1099,
+    generate_women_infants_and_children,
+)
+
+
+@pytest.mark.parametrize(
+    "data_path, noising_function",
+    [
+        (SAMPLE_DECENNIAL_CENSUS, generate_decennial_census),
+        (SAMPLE_AMERICAN_COMMUNITIES_SURVEY, generate_american_communities_survey),
+        (SAMPLE_CURRENT_POPULATION_SURVEY, generate_current_population_survey),
+        (SAMPLE_SOCIAL_SECURITY, generate_social_security),
+        (SAMPLE_TAXES_W2_AND_1099, generate_taxes_w2_and_1099),
+        (SAMPLE_WOMEN_INFANTS_AND_CHILDREN, generate_women_infants_and_children),
+    ],
+)
+def test_generate_form(data_path: Union[Path, str], noising_function: Callable):
+    data = pd.DataFrame(pd.read_hdf(data_path))
+
+    noised_data = noising_function(source=data.copy(), seed=0)
+    noised_data_same_seed = noising_function(source=data.copy(), seed=0)
+    noised_data_different_seed = noising_function(source=data.copy(), seed=1)

-    assert not data.equals(noised_data)
     assert noised_data.equals(noised_data_same_seed)
     assert not noised_data.equals(noised_data_different_seed)
+    assert not data.equals(noised_data)
     assert set(noised_data.columns) == set(data.columns)
-
-
-@pytest.mark.skip(reason="TODO")
-def test_generate_acs():
-    pass
-
-
-@pytest.mark.skip(reason="TODO")
-def test_generate_cps():
-    pass
-
-
-@pytest.mark.skip(reason="TODO")
-def test_generate_wic():
-    pass
-
-
-@pytest.mark.skip(reason="TODO")
-def test_generate_ssa():
-    pass
-
-
-@pytest.mark.skip(reason="TODO")
-def test_generate_tax_w2_1099():
-    pass
-
-
-@pytest.mark.skip(reason="TODO")
-def test_generate_tax_1040():
-    pass
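The assertions in `test_generate_form` check three properties: noising is reproducible for a fixed seed, differs across seeds, and changes values without changing the set of columns. The following is a minimal, library-free sketch of that pattern. `toy_noise` and the sample rows are hypothetical stand-ins invented for illustration; they are not pseudopeople's noising code or its sample data.

```python
import random
from typing import Dict, List

Row = Dict[str, str]


def toy_noise(source: List[Row], seed: int) -> List[Row]:
    """Hypothetical noising function: alters one digit in every value.

    A local random.Random(seed) is used so the output depends only on
    the input rows and the seed, not on global RNG state.
    """
    rng = random.Random(seed)
    noised = []
    for row in source:
        new_row = {}
        for column, value in row.items():
            position = rng.randrange(len(value))
            # A shift of 1..9 guarantees the new digit differs from the old one.
            shift = rng.randrange(1, 10)
            digit = str((int(value[position]) + shift) % 10)
            new_row[column] = value[:position] + digit + value[position + 1:]
        noised.append(new_row)
    return noised


# Made-up rows whose values are all-digit strings.
sample = [{"zipcode": f"{90000 + i:05d}", "age": f"{20 + i:02d}"} for i in range(50)]

noised = toy_noise(sample, seed=0)
noised_same_seed = toy_noise(sample, seed=0)
noised_different_seed = toy_noise(sample, seed=1)

assert noised == noised_same_seed          # same seed, same output
assert noised != noised_different_seed     # different seed, different output
assert noised != sample                    # noising changed something
assert set(noised[0]) == set(sample[0])    # columns are preserved
```

The sketch builds new rows instead of editing `source` in place, which plays the same role as passing `data.copy()` in the test above: the original data stays available for comparison.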
@@ -8,7 +8,14 @@
 from vivarium.config_tree import ConfigTree

 from pseudopeople.entity_types import ColumnNoiseType
-from pseudopeople.interface import generate_decennial_census
+from pseudopeople.interface import (
+    generate_american_communities_survey,
+    generate_current_population_survey,
+    generate_decennial_census,
+    generate_social_security,
+    generate_taxes_w2_and_1099,
+    generate_women_infants_and_children,
+)
 from pseudopeople.noise import noise_form
 from pseudopeople.noise_entities import NOISE_TYPES
 from pseudopeople.schema_entities import Form

@@ -156,11 +163,11 @@ def test_columns_noised(dummy_data):
     "func, form",
     [
         (generate_decennial_census, Form.CENSUS),
-        ("todo", Form.ACS),
-        ("todo", Form.CPS),
-        ("todo", Form.WIC),
-        ("todo", Form.SSA),
-        ("todo", Form.TAX_W2_1099),
+        (generate_american_communities_survey, Form.ACS),
+        (generate_current_population_survey, Form.CPS),
+        (generate_women_infants_and_children, Form.WIC),
+        (generate_social_security, Form.SSA),
+        (generate_taxes_w2_and_1099, Form.TAX_W2_1099),
         ("todo", Form.TAX_1040),
     ],
 )

Review comment on the removed "todo" entries: You knocked out so many TODOs with one fell swoop!
What are these coming in as now if not strings? I'm concerned that if they're coming in as ints again, we will be back to losing preceding 0s (though I guess the check below would catch that).
The sample dataset has all the zipcodes as 90210, which was being interpreted as an int.
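The concern in this thread can be reproduced without any library: once a ZIP code is parsed as an integer its leading zeros are gone, while a value such as 90210 round-trips unchanged and so hides the problem. A minimal sketch follows; `parse_zipcode_as_int` and `normalize_zipcode` are hypothetical helpers written for illustration and are not part of pseudopeople.

```python
def parse_zipcode_as_int(raw: str) -> int:
    """Mimics a reader that infers an integer type for an all-digit column."""
    return int(raw)


def normalize_zipcode(value) -> str:
    """Return a 5-character string, restoring zeros dropped by integer parsing.

    Assumes every ZIP code is exactly five digits long.
    """
    return str(value).zfill(5)


raw_zipcodes = ["90210", "02139"]

as_ints = [parse_zipcode_as_int(z) for z in raw_zipcodes]
as_text = [str(v) for v in as_ints]                 # leading zero is lost here
restored = [normalize_zipcode(v) for v in as_ints]  # zero-padded back to 5 chars

assert as_text[0] == raw_zipcodes[0]   # 90210 survives the round trip
assert as_text[1] != raw_zipcodes[1]   # 02139 does not
assert restored == raw_zipcodes
```

Zero-padding is only a repair step and relies on the five-digit assumption; keeping the column as strings from the start avoids the loss altogether.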