diff --git a/docs/source/concepts/configuration.rst b/docs/source/concepts/configuration.rst
deleted file mode 100644
index 0a9002dc..00000000
--- a/docs/source/concepts/configuration.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-.. _configuration_concept:
-
-========================
-The Configuration System
-========================
-
-.. contents::
-   :depth: 2
-   :local:
-   :backlinks: none
-
-Outline
--------
- - Configuration is hierarchical
- - Configuration sources and priorities
- - Specifying defaults.
\ No newline at end of file
diff --git a/docs/source/concepts/datasets.rst b/docs/source/concepts/datasets.rst
deleted file mode 100644
index 64bc6107..00000000
--- a/docs/source/concepts/datasets.rst
+++ /dev/null
@@ -1,33 +0,0 @@
-.. _datasets_concept:
-
-==============
- Datasets
-==============
-
-.. contents::
-   :depth: 2
-   :local:
-   :backlinks: none
-
-
-
-
-What is a dataset?
-------------------
-
-A dataset in the Pseudopeople framework contains un-noised simulated data
-representing specific real-life data, eg a census survey or tax document.
-The types of datasets that are compatible with the Pseudopeople framework include:
-
-.. list-table:: **Types of Datasets**
-   :header-rows: 1
-   :widths: 20
-
-   * - Name
-   * - | Decennial census
-   * - | American communities survey
-   * - | Current population survey
-   * - | Women, infrants, and children survey
-   * - | Social security
-   * - | Tax W2 and 1099 forms
-   * - | Tax 1040 form
diff --git a/docs/source/concepts/index.rst b/docs/source/concepts/index.rst
deleted file mode 100644
index f527da18..00000000
--- a/docs/source/concepts/index.rst
+++ /dev/null
@@ -1,13 +0,0 @@
-.. _concepts_main:
-
-========
-Concepts
-========
-Here we cover several core conceptual topics related to working with
-the Pseudopeople framework.
-
-.. toctree::
-   :glob:
-   :maxdepth: 1
-
-   *
diff --git a/docs/source/concepts/noise_functions.rst b/docs/source/concepts/noise_functions.rst
deleted file mode 100644
index a5eda2f4..00000000
--- a/docs/source/concepts/noise_functions.rst
+++ /dev/null
@@ -1,42 +0,0 @@
-.. _noise_functions_concept:
-
-=================
- Noise Functions
-=================
-
-.. contents::
-   :depth: 2
-   :local:
-   :backlinks: none
-
-
-
-
-What is a noise function?
--------------------------
-
-A noise function is ultimately where the configuration (add link) provided is
-applied to the raw data which is then noised or altered and returned to the user
-in a state where real world data error have been added to each dataset (add link).
-Noise functions will be applied to datasets by column or by row. There are
-several noise functions that are applied to the raw data which include:
-
-.. list-table:: **Noise Functions**
-   :header-rows: 1
-   :widths: 20
-
-   * - Name
-   * - | Omission
-   * - | Duplications
-   * - | Missing data
-   * - | Incorrect selection
-   * - | Copy from within household
-   * - | Month and day swaps (applies to dates only)
-   * - | Zip Code Miswriting (applies to Zip Code only)
-   * - | Age Miswriting (applies to age only)
-   * - | Numeric miswriting
-   * - | Nicknames
-   * - | Fake names
-   * - | Phonetic errors
-   * - | OCR (optical character recognition)
-   * - | Typographic
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 5beaf57e..98181ad8 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -26,7 +26,7 @@ Introduction
 The University of Washington IHME Simulation Science Team is excited to introduce pseudopeople, the Python package that simplifies Entity Resolution (ER) research and development.
 This package generates large-scale, simulated population data according to specifications by the user, to replicate a range of complexities of real applications of probabilistic record linkage software.
 With sensitive data often required for ER, accessing and testing new methods and software has been a challenge - until now.
-Our innovative approach creates realistic, simulated data including name, address, and date of birth, without compromising privacy.
+Our innovative approach creates realistic, simulated data including name, address, and date of birth, without compromising privacy.
 
 Our work builds on the success of previous data synthesis projects,
 such as `FEBRL `_,
@@ -35,7 +35,7 @@ and `SOG `_
 to incorporate real, publicly-accessible data about the US population.
 This allows us to model realistic household and family structures at scale, with relevant geographies.
 We have created a simulation of the US population, including names and addresses, with defined types of data collection (e.g., simulating decennial censuses, surveys, taxes, and other administrative data).
-By creating realistic, but simulated, data which includes these attributes, we can make ER research and development easier for ourselves and others.
+By creating realistic, but simulated, data which includes these attributes, we can make ER research and development easier for ourselves and others.
 
 Quickstart
 ----------
@@ -120,6 +120,5 @@ Now, see how many your record linkage method can find -- without access to the t
    noise/index
    configuration/index
    tutorials/index
-   concepts/index
    api_reference/index
    glossary