DOC: Add columns to decennial census and ACS #229

Merged
merged 12 commits into from
Jul 21, 2023

Conversation

NathanielBlairStahn
Contributor

@NathanielBlairStahn NathanielBlairStahn commented Jul 19, 2023

DOC: Add columns to decennial census and ACS

Description

  • Document adding "housing type" column to decennial census and ACS datasets
  • Document adding "relationship to reference person" column to ACS dataset
  • Edit noise table to include the new columns
  • Specify desired category names for "housing type" and "relationship to reference person"

This is intended to be the official documentation for the engineers to implement these changes, as well as being the user-facing documentation.

The diff includes a bunch of whitespace deletions from my editor, so you may want to hide those.

Testing

Built docs locally

@NathanielBlairStahn NathanielBlairStahn added the documentation label Jul 19, 2023
@NathanielBlairStahn NathanielBlairStahn requested review from a team, zmbc and pletale as code owners July 19, 2023 22:24
* - Relationship to reference person
- :code:`relationship_to_reference_person`
- :code:`relationship_to_reference_person`
- Possible values for this indicator include:
Contributor Author

I noticed that the "relationship to reference person" column for the decennial census is missing some possible values, namely: "Opposite-sex spouse"; "Opposite-sex unmarried partner"; "Same-sex spouse"; "Same-sex unmarried partner"

I did not update these values in this pull request in case we want to make this correction as a hotfix instead. So if this PR gets merged to develop, the documentation for census and ACS will not match until we implement the hotfix and merge it to develop. Alternatively, I could just make the change here, and then we would not need a hotfix at all. @aflaxman any preference?

Member

Keep it simple, just make the changes here.

institutional". The types of noninstitutional group quarters are
"College", "Military", and "Other noninstitutional".
* - Relationship to reference person
- :code:`relationship_to_reference_person`
Contributor Author

As discussed on Slack, our data column in the decennial census is called relation_to_reference_person instead. Should I change the docs to match, or should we change the data to match what's in the docs?

Member

I'm leaning towards relationship_to_reference_person at this moment.
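
If the data ends up being renamed to match the docs, the change on the data side could look roughly like the sketch below. This is only an illustration on plain dictionaries, not the actual pseudopeople or simulation code; the helper name `rename_column` and the sample rows are hypothetical.

```python
# Hypothetical helper: rename one column in every row, leaving other columns untouched.
OLD_NAME = "relation_to_reference_person"      # current name in the decennial census data
NEW_NAME = "relationship_to_reference_person"  # name used in the docs


def rename_column(rows, old=OLD_NAME, new=NEW_NAME):
    """Return new rows with column `old` renamed to `new`, preserving column order."""
    renamed = []
    for row in rows:
        if old in row and new in row:
            raise ValueError(f"row already has both {old!r} and {new!r}")
        renamed.append({(new if key == old else key): value for key, value in row.items()})
    return renamed


# Made-up sample rows, for illustration only.
sample = [
    {"first_name": "Alex", "relation_to_reference_person": "Reference person"},
    {"first_name": "Sam", "relation_to_reference_person": "Roommate"},
]
result = rename_column(sample)
print(list(result[0].keys()))
```

With a pandas DataFrame, the equivalent would be `df.rename(columns={OLD_NAME: NEW_NAME})`.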

@NathanielBlairStahn NathanielBlairStahn changed the title from "Census acs columns" to "DOC: Add columns to decennial census and ACS" Jul 19, 2023
Comment on lines 179 to 184
"Reference person"; "Opposite-sex spouse"; "Opposite-sex unmarried
partner"; "Same-sex spouse"; "Same-sex unmarried partner"; "Biological
child"; "Adopted child"; "Stepchild"; "Sibling"; "Parent"; "Grandchild";
"Parent-in-law"; "Child-in-law"; "Other relative"; "Roommate"; "Foster
child"; "Other nonrelative"; "Institutionalized group quarters
population"; and "Noninstitutionalized group quarters population".
Contributor Author

I used the full category names such as "Opposite-sex unmarried partner" and "Noninstitutionalized group quarters population" instead of our abbreviations such as "Opp-sex partner" and "Noninstitutionalized GQ pop". I think that's better -- do other people have a preference?

Member

Agree. Since the strings are long anyway, we might as well have them unabbreviated.

Collaborator

If we're changing the "GQ pop" ones anyway, did we decide whether we like the word "population"? I feel like "group quarters person" is more natural for this column but don't feel strongly.
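
For the engineering side, moving from abbreviated to full names could be sketched as a lookup plus a check against the documented list. This is a rough sketch, not the project's implementation: the names `RELATIONSHIP_CATEGORIES`, `ABBREVIATION_TO_FULL`, and `expand_category` are hypothetical, and the mapping only covers the two abbreviations quoted in this thread.

```python
# Full category names as listed in the documentation lines under review.
RELATIONSHIP_CATEGORIES = (
    "Reference person",
    "Opposite-sex spouse",
    "Opposite-sex unmarried partner",
    "Same-sex spouse",
    "Same-sex unmarried partner",
    "Biological child",
    "Adopted child",
    "Stepchild",
    "Sibling",
    "Parent",
    "Grandchild",
    "Parent-in-law",
    "Child-in-law",
    "Other relative",
    "Roommate",
    "Foster child",
    "Other nonrelative",
    "Institutionalized group quarters population",
    "Noninstitutionalized group quarters population",
)

# Only the two abbreviations mentioned in this thread; any others would need to be added.
ABBREVIATION_TO_FULL = {
    "Opp-sex partner": "Opposite-sex unmarried partner",
    "Noninstitutionalized GQ pop": "Noninstitutionalized group quarters population",
}


def expand_category(value):
    """Map a known abbreviation to its full name and reject undocumented values."""
    full = ABBREVIATION_TO_FULL.get(value, value)
    if full not in RELATIONSHIP_CATEGORIES:
        raise ValueError(f"unknown relationship category: {value!r}")
    return full


expanded = [
    expand_category(v)
    for v in ["Opp-sex partner", "Roommate", "Noninstitutionalized GQ pop"]
]
print(expanded)
```

Raising on unknown values means an abbreviation missing from the mapping fails loudly instead of slipping into the data unchanged.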

@NathanielBlairStahn
Contributor Author

NathanielBlairStahn commented Jul 19, 2023

> If we're changing the "GQ pop" ones anyway, did we decide whether we like the word "population"? I feel like "group quarters person" is more natural for this column but don't feel strongly.

I got the names from pp. 39-40 in this reference, with the only change being to remove the gendered language:
https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMS_Data_Dictionary_2016-2020.pdf

If we wanted to make other changes I guess we could, but I figured the simplest thing was to use what's there.

@zmbc
Collaborator

zmbc commented Jul 19, 2023

> I figured the simplest thing was to use what's there.

Yes, that makes sense. More realistic to how the real data is, too.

@@ -6,10 +6,10 @@ Datasets

Here we cover the realistic simulated datasets, which are analogous to "real world" administrative records such as tax documents
and routinely generated files of social security numbers, that users can generate using Pseudopeople for developing and testing Entity
Collaborator

@pletale pletale Jul 20, 2023

Suggested change
- and routinely generated files of social security numbers, that users can generate using Pseudopeople for developing and testing Entity
+ and routinely generated files of social security numbers, that users can generate using pseudopeople for developing and testing Entity

I just realized the capitalization of 'pseudopeople' on this page is inconsistent with the rest of the documentation. It should all be lower case, right?

Contributor Author

Yes, thanks, pseudopeople should be lower case. I think there are several minor edits like this that we should do to improve the documentation, but I haven't been prioritizing them since we're still working on getting the main content in line with what the engineers are doing. For the sake of limiting the scope of each PR, I think I'm not going to make this change in this PR, but I'll create a JIRA ticket to take a pass at editing the Datasets page. Note that there are a few more edits to make on this page, e.g., at least one more instance of capitalized "Pseudopeople", and "Entity Resolution" should also be lower case.
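
For that later editing pass, a small check along these lines could help locate the remaining instances. It is only a sketch: the function name and the list of terms are assumptions taken from this comment, and it flags every capitalized occurrence, so each hit still needs a human look.

```python
import re

# Preferred lower-case spellings mentioned in this comment.
PREFERRED = ("pseudopeople", "entity resolution")


def find_capitalization_issues(text):
    """Return sorted (line_number, found, preferred) tuples for non-lower-case uses."""
    issues = []
    for term in PREFERRED:
        # Allow any whitespace (including a line break) between the words of a term.
        pattern = r"\s+".join(re.escape(word) for word in term.split())
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found = " ".join(match.group(0).split())
            if found != term:
                # Line number of the start of the match (1-indexed).
                line_number = text.count("\n", 0, match.start()) + 1
                issues.append((line_number, found, term))
    return sorted(issues)


# Made-up sample text, for illustration only.
sample = (
    "Users can generate data using Pseudopeople for testing Entity\n"
    "Resolution methods.\n"
    "The pseudopeople package name is lower case.\n"
)
issues = find_capitalization_issues(sample)
for line_number, found, preferred in issues:
    print(f"line {line_number}: {found!r} should be {preferred!r}")
```

Matching across whitespace matters here because "Entity Resolution" is split over a line break in the reStructuredText source.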

@NathanielBlairStahn NathanielBlairStahn added the enhancement label Jul 21, 2023
albrja added a commit to ihmeuw/vivarium_census_prl_synth_pop that referenced this pull request Aug 23, 2023
Mic 4255/relationship category name updates

Updates categories of relationship to reference person among household members.
- *Category*: Data
- *JIRA issue*: [MIC-4255](https://jira.ihme.washington.edu/browse/MIC-4255)
- *Research reference*: ihmeuw/pseudopeople#229

Changes and notes
- Updates relationship to reference person category names so that abbreviated names are no longer abbreviated.

Verification and Testing
Successfully rebuilt data key in artifact. Ran simulation and make_results.
Labels
documentation, enhancement
4 participants