Document omit_row noise type #143

Merged Apr 25, 2023 (6 commits)
25 changes: 25 additions & 0 deletions docs/source/noise/row_noise.rst
@@ -4,5 +4,30 @@
Row-based Noise
===============

Row-based noise operates on one row of data at a time, for example by omitting
or duplicating entire rows.

Omit a row
----------

An entire record is sometimes missing from a dataset where one would normally
expect to find it. For example, a WIC record could be missing because of a
clerical error, or someone's tax record could be missing because they didn't
file their taxes on time.

This noise type is called :code:`omit_row` in the configuration. It takes one
parameter:

.. list-table:: Parameters to the omit_row noise type
  :widths: 1 5 1
  :header-rows: 1

  * - Parameter
    - Description
    - Default
  * - :code:`row_probability`
    - The probability that a row is missing from the dataset.
    - 0.01 (1%)
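
As a rough illustration of how this parameter might be supplied, the sketch
below stores :code:`row_probability` in a nested Python dictionary and falls
back to the documented default when it is absent. The nesting, the
:code:`some_dataset` key, and the :code:`get_row_probability` helper are all
assumptions made for this example, not the project's confirmed configuration
format or API.

```python
DEFAULT_ROW_PROBABILITY = 0.01  # default value listed in the parameter table

# Hypothetical layout: the nesting and the dataset name are assumptions
# made for illustration only.
example_config = {
    "some_dataset": {
        "row_noise": {
            "omit_row": {"row_probability": 0.05},
        },
    },
}


def get_row_probability(config, dataset):
    """Return row_probability for omit_row, or the default if it is not set."""
    omit_row = config.get(dataset, {}).get("row_noise", {}).get("omit_row", {})
    return omit_row.get("row_probability", DEFAULT_ROW_PROBABILITY)


print(get_row_probability(example_config, "some_dataset"))
print(get_row_probability({}, "some_dataset"))
```

The second call shows the fallback: when no value is configured, the helper
returns the default of 0.01.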

When applying :code:`omit_row` noise, each row of data is selected for omission
independently with probability :code:`row_probability`.
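
The independent selection described above can be sketched in a few lines of
Python. This is a minimal illustration using only the standard library, not
the project's implementation; the :code:`omit_rows` helper and its
:code:`seed` argument are hypothetical names introduced for this example.

```python
import random


def omit_rows(rows, row_probability=0.01, seed=None):
    """Return the rows that survive omission.

    Hypothetical helper for illustration; not the library's API.
    """
    if not 0.0 <= row_probability <= 1.0:
        raise ValueError("row_probability must be between 0 and 1")
    rng = random.Random(seed)
    # Each row gets its own independent draw from [0, 1). The row is
    # dropped when its draw falls below row_probability.
    return [row for row in rows if rng.random() >= row_probability]


records = [{"id": i} for i in range(10_000)]
kept = omit_rows(records, row_probability=0.01, seed=0)
print(f"omitted {len(records) - len(kept)} of {len(records)} rows")
```

Because each row is drawn independently, the number of omitted rows varies
from run to run; with the default of 0.01, roughly 1 in 100 rows is omitted on
average.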