Implement targeted omission do_not_respond for census and surveys #158

Merged
mattkappel merged 7 commits into develop from feature/mic-3934-redux-develop
Apr 27, 2023

Conversation

mattkappel
Contributor

@mattkappel mattkappel commented Apr 27, 2023

Implement targeted omission do_not_respond for census and surveys

Note: a previous version of this PR aimed at a deleted develop branch is here

Description

  • Category: feature
  • JIRA issue: MIC-3934

Changes

  • Adds do_not_respond noising for census, ACS, and CPS
  • Updates omission tests for the do_not_respond change
  • Adds tests for do_not_respond proportionality and ports the oversampling tests that previously lived in the omission/omit_rows tests
  • Adds test for incorrect dataset application of do_not_respond
  • Adds test for omit_rows/do_not_respond mutual exclusivity
  • Adds test for correctness of census race/ethnicity, age, sex noise level adjustments
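For readers unfamiliar with targeted omission, the idea behind the changes above can be sketched as follows. This is a minimal illustration, not the pseudopeople implementation: the base probability, the adjustment values, the age breaks, and the function names `do_not_respond_probability` and `apply_do_not_respond` are all hypothetical, and ages are assumed to fall inside the listed bins.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only; not pseudopeople's constants.
BASE_PROBABILITY = 0.05
ADDITIVE_BY_SEX = pd.Series({"Female": -0.01, "Male": 0.01})
AGE_BREAKS = [0, 18, 50, 125]  # left-closed bins: [0, 18), [18, 50), [50, 125)
ADDITIVE_BY_AGE_BIN = np.array([0.02, 0.00, -0.02])


def do_not_respond_probability(data: pd.DataFrame) -> pd.Series:
    """Per-row non-response probability: base rate plus additive adjustments."""
    sex_adjustment = data["sex"].map(ADDITIVE_BY_SEX)
    # labels=False yields the integer position of each age's bin.
    age_code = pd.cut(data["age"], bins=AGE_BREAKS, right=False, labels=False)
    age_adjustment = ADDITIVE_BY_AGE_BIN[age_code.to_numpy(dtype=int)]
    probability = BASE_PROBABILITY + sex_adjustment + age_adjustment
    # Adjustments could push a value outside [0, 1], so clip.
    return probability.clip(0.0, 1.0)


def apply_do_not_respond(data: pd.DataFrame, rng: np.random.Generator) -> pd.DataFrame:
    """Drop each row independently with its own non-response probability."""
    probability = do_not_respond_probability(data).to_numpy()
    responds = rng.random(len(data)) >= probability
    return data.loc[responds]
```

A call such as `apply_do_not_respond(people, np.random.default_rng(0))` returns the subset of rows that "responded". A proportionality test would then compare the observed drop rate within each group against the probability computed for that group.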

Testing

All tests pass.

@mattkappel mattkappel marked this pull request as ready for review April 27, 2023 01:26
@mattkappel
Contributor Author

CI is failing at the moment due to linting. I'll fix it in the morning.

Collaborator

@rmudambi rmudambi left a comment

Looks good. Feel free to merge once you've addressed the comment about default row noise and the float equality question in the test.
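On the float equality point, a minimal sketch of the usual approach, assuming the test compares an observed omission proportion against an expected probability. The helper name `proportion_matches`, the tolerance, and the sample size are illustrative assumptions, not taken from the PR.

```python
import math

import numpy as np


def proportion_matches(observed: float, expected: float, rel_tol: float = 0.02) -> bool:
    """Compare two proportions with a relative tolerance instead of ==."""
    return math.isclose(observed, expected, rel_tol=rel_tol)


# Exact equality is fragile even for simple arithmetic.
exact_equal = (0.1 + 0.2) == 0.3            # False due to binary rounding
close_enough = math.isclose(0.1 + 0.2, 0.3)  # True with the default tolerance

# A simulated omission rate will not equal the target exactly,
# but it should land within a small tolerance for a large sample.
rng = np.random.default_rng(0)
expected = 0.1
omitted = rng.random(1_000_000) < expected
observed = float(omitted.mean())
```

In a pytest suite, `pytest.approx` or `numpy.isclose` would serve the same purpose.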

    pd.Interval(50, 125),
]

DO_NOT_RESPOND_ADDITIVE_PROBABILITY_BY_SEX_AGE: Dict[str, pd.Series] = {
Collaborator

Love it.

src/pseudopeople/constants/data_values.py (outdated, resolved)
src/pseudopeople/noise_functions.py (outdated, resolved)
src/pseudopeople/noise_functions.py (resolved)
src/pseudopeople/noise_functions.py (resolved)
src/pseudopeople/schema_entities.py (resolved)
tests/unit/test_row_noise.py (resolved)
tests/unit/test_row_noise.py (outdated, resolved)
@mattkappel mattkappel force-pushed the feature/mic-3934-redux-develop branch from 9ca9549 to fefccbb Compare April 27, 2023 18:55
@@ -129,7 +129,8 @@ def test_noise_order(mocker, dummy_data, dummy_config_noise_numbers):

    call_order = [x[0] for x in mock.mock_calls if not x[0].startswith("__")]
    expected_call_order = [
        "omission",
        # "omit_rows",  # Census doesn't use omit_rows
Contributor

Can you add a FIXME to update this to another dataset that does have omit_rows?

Contributor Author

You already have a FIXME to do something other than census. Do you have a ticket for that? Otherwise, we ought to create one.

Contributor Author

It didn't look like one existed, so I created MIC-4037.
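For context on the call-order check discussed in this thread, here is a minimal sketch of how `mock_calls` records the order in which a parent mock's children are invoked. It uses the standard library's `unittest.mock` rather than the `mocker` fixture, and `run_noise_pipeline` and the `column_noise` name are hypothetical stand-ins, not the project's code.

```python
from unittest.mock import MagicMock


def run_noise_pipeline(noise_functions) -> None:
    """Hypothetical pipeline: row noise first, then column noise."""
    noise_functions.omission()
    noise_functions.do_not_respond()
    noise_functions.column_noise()


mock = MagicMock()
run_noise_pipeline(mock)

# Each entry in mock_calls is a (name, args, kwargs) tuple; index 0 is the
# dotted name of the child that was called. Dunder calls are filtered out.
call_order = [call[0] for call in mock.mock_calls if not call[0].startswith("__")]
expected_call_order = ["omission", "do_not_respond", "column_noise"]
```

Comparing `call_order` with `expected_call_order` then asserts both which functions ran and the sequence they ran in.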

@mattkappel mattkappel merged commit 29dab49 into develop Apr 27, 2023
@mattkappel mattkappel deleted the feature/mic-3934-redux-develop branch April 27, 2023 21:25
4 participants