Standardize shapefiles #68

Closed
mradamcox opened this issue Aug 23, 2023 · 2 comments · Fixed by #71

@mradamcox
Collaborator

A little overhaul of the spatial data here would be good. Some tasks include:

  • Inspect non-join columns, like state name or county name, and perhaps add more if needed. These will be columns that can be added to joined and exported data, so formatting and making them look good is important.
  • Standardize GEOIDs across spatial resolutions and time (e.g. the join field for ZCTAs should have the same name in the 2010 shapefile as in the 2018 one).
  • Update columns in all CSVs as needed to streamline the joins.
@mradamcox
Collaborator Author

After a good bit of research on different IDs via the Census Bureau website (and help from folks on their Slack workspace), I've decided to create a new hybrid identifier for our purposes here, HEROP_ID. This will allow us to add a new field to the CSVs and shapefiles without changing the meaning of any existing columns, like GEOID, and we can retain all existing columns for backward compatibility. This field will streamline the join process and provide a single structure for all geographic levels.

Here's an example for each level (modified table excerpt from Understanding Geographic Identifiers (GEOIDs)):

| Area Type | GEOID Structure | Number of Digits | Example Geographic Area | Example HEROP_ID |
| --- | --- | --- | --- | --- |
| State | STATE | 2 | Texas | 040US48 |
| County | STATE+COUNTY | 2+3=5 | Harris County, TX | 050US48201 |
| Census Tract | STATE+COUNTY+TRACT | 2+3+6=11 | Census Tract 2231 in Harris County, TX | 140US48201223100 |
| ZCTA | ZCTA | 5 | Suitland, MD ZCTA | 860US20746 |

The HEROP_ID format is similar to, but simpler than, the GEOID format from data.census.gov that is described at the bottom of Understanding Geographic Identifiers (GEOIDs). The latter has four additional internal digits: 2 for Geographic Variant and 2 for Geographic Component. We don't record that information in our geometries now (I'm not even sure it is relevant to the geographic areas we are working with anyway), so these four digits are dropped. That leaves us with the following format:

Summary Level Code + "US" + GEOID

Where Summary Level Code is a 3-digit number, and the "US" in the middle will force the value to text in any spreadsheet software, as the "G" prefix has done in the past.
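
To make the format concrete, here is a minimal Python sketch of how such an ID could be assembled. The function name `make_herop_id` and the two lookup dictionaries are hypothetical, not taken from this repo; the summary level codes and GEOID lengths are copied from the example table above. The zero-padding step is an assumption about how values that were read as integers (and lost their leading zeros) might be handled.

```python
# Summary level codes, as they appear in the example HEROP_IDs in the table.
SUMMARY_LEVELS = {
    "state": "040",
    "county": "050",
    "tract": "140",
    "zcta": "860",
}

# Expected GEOID length per level, from the "Number of Digits" column.
GEOID_LENGTHS = {"state": 2, "county": 5, "tract": 11, "zcta": 5}


def make_herop_id(level, geoid):
    """Hypothetical helper: Summary Level Code + "US" + GEOID."""
    if level not in SUMMARY_LEVELS:
        raise ValueError(f"unknown geographic level: {level!r}")
    width = GEOID_LENGTHS[level]
    # Assumption: restore leading zeros dropped when a GEOID was stored as a number.
    geoid = str(geoid).zfill(width)
    if not geoid.isdigit() or len(geoid) != width:
        raise ValueError(f"{level} GEOID must be {width} digits, got {geoid!r}")
    return f"{SUMMARY_LEVELS[level]}US{geoid}"


print(make_herop_id("county", "48201"))
print(make_herop_id("zcta", "20746"))
```

Because the result always contains the letters "US", it cannot be parsed as a number, which is the property the format relies on for spreadsheet software.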

@mradamcox
Collaborator Author

One update to the HEROP_ID is the addition of a year suffix, which is necessary now that we have multiple years of geographies. If a row is meant to join to a 2018 county geography, for example, its ID will now be composed as Summary Level Code + "US" + State FP + County FP + "-" + Geography Year, e.g. 050US01001-2018
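
As a rough illustration of the year-suffixed form, the sketch below appends the suffix and splits an ID back into its parts. The names `add_year`, `parse_herop_id`, and `HEROP_ID_PATTERN` are hypothetical and not part of this repo, and the regular expression assumes a 3-digit summary level, an all-digit GEOID, and a 4-digit year.

```python
import re

# Assumed shape: 3-digit summary level + "US" + digits + "-" + 4-digit year.
HEROP_ID_PATTERN = re.compile(
    r"^(?P<summary_level>\d{3})US(?P<geoid>\d+)-(?P<year>\d{4})$"
)


def add_year(herop_id, year):
    """Hypothetical helper: append the geography year to a base HEROP_ID."""
    return f"{herop_id}-{year}"


def parse_herop_id(value):
    """Hypothetical helper: split a year-suffixed HEROP_ID into its parts."""
    match = HEROP_ID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a year-suffixed HEROP_ID: {value!r}")
    return match.groupdict()


full_id = add_year("050US01001", 2018)
print(full_id)
print(parse_herop_id(full_id))
```

Keeping every part as a string preserves leading zeros such as the "01" state code in this example.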
