Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/sbachmei/test column specific noising #131

Merged

Conversation

stevebachmeier
Copy link
Contributor

Title: Test expected columns are noised

Description

Currently for integration testing we are only really ensuring that the noised
dataset is at all different than the raw dataset. This PR checks
column-by-column that the expected cols are different and those that
are output but not noised are the same.

Testing

tests pass

src/pseudopeople/interface.py Outdated Show resolved Hide resolved
tests/integration/test_interface.py Outdated Show resolved Hide resolved
tests/integration/test_interface.py Outdated Show resolved Hide resolved
}

# Set row-level noise to zero since we're just testing columns here.
config[dataset.name][Keys.ROW_NOISE] = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If row noise is not present, don't we have guarantees that the indexes will match in the case where we're using the sample data?

Why aren't we testing row noise in this test? Do we have another test that is testing that row noise is applied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - probably? I added this because the targeted omission stuff was failing. Haven't thought through yet a better way to do it with that.

As for row noise - that just seems like a different type of test and out of scope for this ticket. Maybe I'm wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing w/ Rajan, we will go ahead with this approach and then immediately follow it up w/ a bugfix task

Copy link
Collaborator

@rmudambi rmudambi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but create a ticket to figure out why the row noise needs to be set to 0, and fix that bug next.

for col in noised_data.columns:
expected_dtype = [c.dtype_name for c in COLUMNS if c.name == col][0]
# Check each column. We set the index for each dataset to be unique
# identifiers b/c the original index gets reset after noising. Note that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the index get reset? At the dataset level? Not in the noise function for columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@albrja albrja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the helpful comments in the code. Helped make things a lot clearer.

@stevebachmeier stevebachmeier merged commit f736186 into develop May 1, 2023
@stevebachmeier stevebachmeier deleted the feature/sbachmei/test-column-specific-noising branch May 1, 2023 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants