From bbf663cb46bb1a5237c274100ec957b196730ae2 Mon Sep 17 00:00:00 2001
From: Nathaniel Blair-Stahn
Date: Mon, 24 Apr 2023 23:10:14 -0700
Subject: [PATCH 1/5] draft description of omit_row noise

---
 docs/source/noise/row_noise.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/docs/source/noise/row_noise.rst b/docs/source/noise/row_noise.rst
index d39e883a..788880db 100644
--- a/docs/source/noise/row_noise.rst
+++ b/docs/source/noise/row_noise.rst
@@ -4,5 +4,27 @@
 Row-based Noise
 ===============
 
+Row-based noise operates on one row of data at a time, for example by
+introducing errors to certain cells within a row, or by omitting or duplicating
+entire rows.
+
 Omit a row
 ----------
+
+Sometimes an entire record may be missing from a dataset where one would
+normally expect to find it. For example, if someone didn't file their taxes on
+time, then their tax record for that year would missing. Or perhaps a record is
+missing by mistake because of a clerical error.
+
+This noise type is called :code:`omit_row` in the configuration. It takes one parameter:
+
+.. list-table:: Parameters to the omit_row noise type
+  :widths: 1 5 1
+  :header-rows: 1
+
+  * - Parameter
+    - Description
+    - Default
+  * - :code:`row_probability`
+    - The probability that a row is missing from the dataset.
+    - 0.01 (1%)

From 7b0e802b58e7627e502c556d8a47a68942106aae Mon Sep 17 00:00:00 2001
From: Nathaniel Blair-Stahn
Date: Mon, 24 Apr 2023 23:20:06 -0700
Subject: [PATCH 2/5] revise description of omit_row noise

---
 docs/source/noise/row_noise.rst | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/docs/source/noise/row_noise.rst b/docs/source/noise/row_noise.rst
index 788880db..a3d0c6ca 100644
--- a/docs/source/noise/row_noise.rst
+++ b/docs/source/noise/row_noise.rst
@@ -5,16 +5,16 @@ Row-based Noise
 ===============
 
 Row-based noise operates on one row of data at a time, for example by
-introducing errors to certain cells within a row, or by omitting or duplicating
+introducing errors in cells within certain rows, or by omitting or duplicating
 entire rows.
 
 Omit a row
 ----------
 
 Sometimes an entire record may be missing from a dataset where one would
-normally expect to find it. For example, if someone didn't file their taxes on
-time, then their tax record for that year would missing. Or perhaps a record is
-missing by mistake because of a clerical error.
+normally expect to find it. For example, a WIC record could be missing by
+mistake because of a clerical error, or someone's tax record could be missing
+because they didn't file their taxes on time.
 
 This noise type is called :code:`omit_row` in the configuration. It takes one parameter:
 
@@ -28,3 +28,6 @@ This noise type is called :code:`omit_row` in the configuration. It takes one pa
   * - :code:`row_probability`
     - The probability that a row is missing from the dataset.
     - 0.01 (1%)
+
+When applying :code:`omit_row` noise, each row of data is seleceted for omission
+independently with probability :code:`row_probability`.
From af72e6dccadde8eae65b78bd309b136416f7c820 Mon Sep 17 00:00:00 2001
From: Nathaniel Blair-Stahn
Date: Tue, 25 Apr 2023 13:22:09 -0700
Subject: [PATCH 3/5] fix typo

Co-authored-by: Zeb Burke-Conte
---
 docs/source/noise/row_noise.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/noise/row_noise.rst b/docs/source/noise/row_noise.rst
index a3d0c6ca..11671e54 100644
--- a/docs/source/noise/row_noise.rst
+++ b/docs/source/noise/row_noise.rst
@@ -29,5 +29,5 @@ This noise type is called :code:`omit_row` in the configuration. It takes one pa
     - The probability that a row is missing from the dataset.
     - 0.01 (1%)
 
-When applying :code:`omit_row` noise, each row of data is seleceted for omission
+When applying :code:`omit_row` noise, each row of data is selected for omission
 independently with probability :code:`row_probability`.

From 942f85579dea5a67053e266e473f3f98f52744cf Mon Sep 17 00:00:00 2001
From: Nathaniel Blair-Stahn
Date: Tue, 25 Apr 2023 13:22:43 -0700
Subject: [PATCH 4/5] simplify row noise description

Co-authored-by: Zeb Burke-Conte
---
 docs/source/noise/row_noise.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/source/noise/row_noise.rst b/docs/source/noise/row_noise.rst
index 11671e54..f1a3aecf 100644
--- a/docs/source/noise/row_noise.rst
+++ b/docs/source/noise/row_noise.rst
@@ -4,8 +4,7 @@
 Row-based Noise
 ===============
 
-Row-based noise operates on one row of data at a time, for example by
-introducing errors in cells within certain rows, or by omitting or duplicating
+Row-based noise operates on an entire row of data at a time, for example by omitting or duplicating
 entire rows.
 
 Omit a row

From 98459585369230e9707a8e197c5269ef8a2e6cf3 Mon Sep 17 00:00:00 2001
From: Nathaniel Blair-Stahn
Date: Tue, 25 Apr 2023 13:29:56 -0700
Subject: [PATCH 5/5] re-word and autoflow

---
 docs/source/noise/row_noise.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/source/noise/row_noise.rst b/docs/source/noise/row_noise.rst
index f1a3aecf..71cd81d7 100644
--- a/docs/source/noise/row_noise.rst
+++ b/docs/source/noise/row_noise.rst
@@ -4,8 +4,8 @@
 Row-based Noise
 ===============
 
-Row-based noise operates on an entire row of data at a time, for example by omitting or duplicating
-entire rows.
+Row-based noise operates on one row of data at a time, for example by omitting
+or duplicating entire rows.
 
 Omit a row
 ----------
@@ -15,7 +15,8 @@ normally expect to find it. For example, a WIC record could be missing by
 mistake because of a clerical error, or someone's tax record could be missing
 because they didn't file their taxes on time.
 
-This noise type is called :code:`omit_row` in the configuration. It takes one
+This noise type is called :code:`omit_row` in the configuration. It takes one
+parameter:
 
 .. list-table:: Parameters to the omit_row noise type
   :widths: 1 5 1
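
Note on the behaviour these patches document: the series describes :code:`omit_row` as selecting each row for omission independently with probability :code:`row_probability` (default 0.01). A minimal sketch of that rule follows, using only the Python standard library. The function name ``omit_rows``, its ``seed`` argument, and the list-based row representation are hypothetical illustrations, not the package's actual API or implementation.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def omit_rows(
    rows: Sequence[T],
    row_probability: float = 0.01,
    seed: Optional[int] = None,
) -> List[T]:
    """Return the rows that survive omit_row-style noise.

    Hypothetical helper: each row is dropped independently with
    probability ``row_probability`` (one Bernoulli draw per row).
    """
    if not 0.0 <= row_probability <= 1.0:
        raise ValueError("row_probability must be between 0 and 1")

    # A private generator keeps results reproducible for a given seed.
    rng = random.Random(seed)

    # rng.random() is uniform on [0, 1), so a row is kept with
    # probability 1 - row_probability and dropped otherwise.
    return [row for row in rows if rng.random() >= row_probability]


if __name__ == "__main__":
    records = [{"id": i} for i in range(10_000)]
    noised = omit_rows(records, row_probability=0.01, seed=0)
    print(f"kept {len(noised)} of {len(records)} rows")
```

Because the draws are independent, the number of omitted rows varies from run to run around ``row_probability * len(rows)`` rather than being fixed at exactly that count.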