-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/sbachmei/test column specific noising #131
Feature/sbachmei/test column specific noising #131
Conversation
…/sbachmei/test-column-specific-noising
…/sbachmei/test-column-specific-noising
} | ||
|
||
# Set row-level noise to zero since we're just testing columns here. | ||
config[dataset.name][Keys.ROW_NOISE] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If row noise is not present, don't we have guarantees that the indexes will match in the case where we're using the sample data?
Why aren't we testing row noise in this test? Do we have another test that is testing that row noise is applied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - probably? I added this because the targeted omission stuff was failing. Haven't thought through yet a better way to do it with that.
As for row noise - that just seems like a different type of test and out of scope for this ticket. Maybe I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing w/ Rajan, we will go ahead with this approach and then immediately follow it up w/ a bugfix task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, but create a ticket to figure out why the row noise needs to be set to 0, and fix that bug next.
for col in noised_data.columns: | ||
expected_dtype = [c.dtype_name for c in COLUMNS if c.name == col][0] | ||
# Check each column. We set the index for each dataset to be unique | ||
# identifiers b/c the original index gets reset after noising. Note that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the index get reset? At the dataset level? Not in the noise function for columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the helpful comments in the code. Helped make things a lot clearer.
Title: Test expected columns are noised
Description
Currently for integration testing we are only really ensuring that the noised
dataset is at all different than the raw dataset. This PR checks
column-by-column that the expected cols are different and those that
are output but not noised are the same.
Testing
tests pass