Adding data quality indicators #72

Open
Makosak opened this issue Nov 25, 2024 · 1 comment
Labels
enhancement New feature or request
Comments

@Makosak
Collaborator

Makosak commented Nov 25, 2024

Missing / Uncertain Measures

Some variables have a high degree of missingness or uncertainty. For example, prison incarceration rates will be missing in several areas, and we expect the same issue as we add crime variables.

Let's add a red or otherwise brightly colored circle with an exclamation point in the "documentation" view, so users are alerted when data are incomplete (or have other challenges).
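
As a rough illustration of how "high degree of missingness" could be turned into a flag, here is a minimal sketch. The 20% cutoff, the function names, and the example values are all hypothetical and not taken from the project; the real threshold would need to be agreed on.

```python
from typing import Optional, Sequence

# Hypothetical cutoff: flag a variable when more than 20% of its values are missing.
MISSING_THRESHOLD = 0.20


def missing_share(values: Sequence[Optional[float]]) -> float:
    """Return the fraction of entries that are None (treated as missing)."""
    if not values:
        return 1.0  # an empty column is treated as fully missing
    return sum(v is None for v in values) / len(values)


def needs_warning(values: Sequence[Optional[float]],
                  threshold: float = MISSING_THRESHOLD) -> bool:
    """True when the missing share exceeds the threshold."""
    return missing_share(values) > threshold


# Made-up example column: 2 of 5 areas have no reported rate.
example_rates = [412.0, None, 388.5, None, 501.2]
show_indicator = needs_warning(example_rates)
```

A precomputed flag like `show_indicator` could be stored alongside each variable so the front end only has to decide whether to draw the icon.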

Quality Indicator

In a similar vein, it would be helpful to start collating and adding metrics that indicate high or moderate data quality, shown as visual indicators with the details documented. For example, we could highlight data that have sufficient variability, completeness, and documentation, that meet FAIR standards, and that have been used extensively in research (a robust evidence base).

We may want to connect with the NYU team (LOUD study) and the UChicago team (MAARC) to brainstorm and refine this approach. (For the LOUD team, a working "data requirements" doc has been started; we may want to use the Projects option in this GitHub repo as an alternative collaboration doc?)
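
One possible way to record these criteria is a simple checklist per dataset that maps to a tier label. This is only a sketch for discussion: the field names, the tier cutoffs (all five criteria for "high", at least three for "moderate"), and the "unrated" label are assumptions, not an agreed scheme.

```python
from dataclasses import dataclass, fields


@dataclass
class QualityChecklist:
    """One boolean per criterion named above (field names are hypothetical)."""
    sufficient_variability: bool
    complete: bool
    documented: bool
    fair_compliant: bool
    evidence_base: bool


def quality_tier(checklist: QualityChecklist) -> str:
    """Map the number of criteria met to a tier label (cutoffs are hypothetical)."""
    met = sum(getattr(checklist, f.name) for f in fields(checklist))
    if met == 5:
        return "high"
    if met >= 3:
        return "moderate"
    return "unrated"
```

Keeping the individual booleans, rather than only the final label, would let the documentation explain why a dataset received its tier.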

Makosak added the enhancement label Nov 25, 2024
@mradamcox
Contributor

This is a good idea, and we should be able to key it off of properties attached to each variable. A full variable definition now looks like this:

"RxCntDr": {
    "title": "Count of Pharmacies (30-min drive)",
    "name": "RxCntDr",
    "src_name": "RxCntDr",
    "type": "integer",
    "example": "58",
    "description": "Count of pharmacies within a 30-minute driving threshold",
    "constraints": "*Euclidean distance or straight-line distance is a simple approximation of distance or travel time from an origin centroid to the nearest health center. It is not a precise calculation of real travel times or distances.",
    "construct": "Access to Pharmacies",
    "source": "InfoGroup, 2019",
    "source_long": "InfoGroup, 2019",
    "oeps_v1_table": null,
    "comments": "This dataset includes all US states, Washington D.C., and Puerto Rico. It does not include the territories Guam, Northern Mariana Islands, American Samoa, Palau. Zip code and tract centroids are not population-weighted.",
    "metadata_doc_url": "https://github.com/GeoDaCenter/opioid-policy-scan/blob/main/data_final/metadata/Access_Pharmacies_MinDistance.md",
    "longitudinal": false,
    "analysis": false,
    "table_sources": [
        "t-latest",
        "z-latest"
    ]
},

I think it would be best to use the constraints property: if a variable has an entry for it, we would add the visual indicator and maybe a popup populated with the content of that field.
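
A minimal sketch of that check, assuming the variable definitions are loaded into a plain dictionary keyed by variable name. The helper name `constraint_note` and the second variable `ExampleVar` are hypothetical, and the `constraints` text is abbreviated here for readability.

```python
from typing import Optional


def constraint_note(variable: dict) -> Optional[str]:
    """Return popup text if `constraints` is a non-empty string, else None."""
    text = variable.get("constraints")
    if isinstance(text, str) and text.strip():
        return text.strip()
    return None


# Trimmed copy of the RxCntDr definition, plus a hypothetical variable
# that has no constraints entry and so should not get an indicator.
variables = {
    "RxCntDr": {
        "title": "Count of Pharmacies (30-min drive)",
        "constraints": "*Euclidean distance or straight-line distance is a simple approximation ...",
    },
    "ExampleVar": {"title": "Hypothetical variable", "constraints": None},
}

# Map of variable name -> popup text, only for variables that need the indicator.
flagged = {}
for name, definition in variables.items():
    note = constraint_note(definition)
    if note is not None:
        flagged[name] = note
```

Treating `null`, a missing key, and whitespace-only strings the same way avoids showing an empty popup.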

One thing to note, though: I'm not sure this would work on the data docs page, because we don't list individual variables there; they are aggregated up to the "construct" level.
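
If we did want something on the data docs page, one option (untested, just a sketch) would be to roll the per-variable notes up to the construct level. The function name, the "Unknown" fallback, and the `ExampleA`/`ExampleB` variables below are hypothetical, and the constraints text is shortened.

```python
from collections import defaultdict


def constraints_by_construct(variables: dict) -> dict:
    """Group non-empty `constraints` text by each variable's `construct`."""
    grouped = defaultdict(list)
    for name, definition in variables.items():
        text = (definition.get("constraints") or "").strip()
        if text:
            construct = definition.get("construct", "Unknown")
            grouped[construct].append((name, text))
    return dict(grouped)


# Illustrative input: only RxCntDr carries a constraints note.
variables = {
    "RxCntDr": {
        "construct": "Access to Pharmacies",
        "constraints": "Straight-line distance is only an approximation.",
    },
    "ExampleA": {"construct": "Access to Pharmacies", "constraints": ""},
    "ExampleB": {"construct": "Hypothetical Construct", "constraints": None},
}

rollup = constraints_by_construct(variables)
```

A construct would then get the indicator if it appears in `rollup`, with the popup listing which of its variables have caveats.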

mradamcox added this to the v2.1 Release milestone Dec 6, 2024