Adding data quality indicators #72

Makosak · 2024-11-25T17:08:50Z

Missing / Uncertain Measures

Some variables have a high degree of missingness or uncertainty. For example, prison incarceration rates will be missing in several areas, and we will run into the same issue as we add crime variables.

Let's add a red (or otherwise brightly colored) circle with an exclamation point in the "documentation" view, so folks are alerted when data are incomplete or have other challenges.

Quality Indicator

In a similar vein, it would be helpful to start collating metrics that indicate high or moderate data quality and add them in the future, shown as visual indicators with the details documented. For example, we could highlight data that have sufficient variability, completeness, and documentation, that meet FAIR standards, and that have been used extensively in research (a robust evidence base).

We may want to connect with the NYU team (LOUD study) and the UChicago team (MAARC) to brainstorm and refine this approach. (For the LOUD team, a working "data requirements" doc has been started. We may want to use the Projects option in this GitHub repo as an alternative collaboration doc?)

mradamcox · 2024-12-06T00:58:30Z

This is a good idea, and we should be able to key it off of properties attached to each variable. A full variable definition now looks like this:

"RxCntDr": {
    "title": "Count of Pharmacies (30-min drive)",
    "name": "RxCntDr",
    "src_name": "RxCntDr",
    "type": "integer",
    "example": "58",
    "description": "Count of pharmacies within a 30-minute driving threshold",
    "constraints": "*Euclidean distance or straight-line distance is a simple approximation of distance or travel time from an origin centroid to the nearest health center. It is not a precise calculation of real travel times or distances.",
    "construct": "Access to Pharmacies",
    "source": "InfoGroup, 2019",
    "source_long": "InfoGroup, 2019",
    "oeps_v1_table": null,
    "comments": "This dataset includes all US states, Washington D.C., and Puerto Rico. It does not include the territories Guam, Northern Mariana Islands, American Samoa, Palau. Zip code and tract centroids are not population-weighted.",
    "metadata_doc_url": "https://github.com/GeoDaCenter/opioid-policy-scan/blob/main/data_final/metadata/Access_Pharmacies_MinDistance.md",
    "longitudinal": false,
    "analysis": false,
    "table_sources": [
      "t-latest",
      "z-latest"
    ]
  },

I think the best approach would be to use the constraints property: if a variable has an entry for it, we would add the visual indicator, and maybe a popup populated with the content of that field.
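As a rough sketch of that check, assuming variable definitions are loaded as plain dictionaries shaped like the RxCntDr example above: the function name `constraint_flag` and the returned keys are hypothetical and are not part of the existing codebase.

```python
from typing import Optional


def constraint_flag(variable: dict) -> Optional[dict]:
    """Return indicator data if the variable has a non-empty constraints entry.

    Hypothetical helper: the output keys ("name", "title", "popup") are
    illustrative only, not an existing schema.
    """
    # Treat a missing key, null, and whitespace-only text as "no constraint".
    text = (variable.get("constraints") or "").strip()
    if not text:
        return None
    return {
        "name": variable["name"],
        "title": variable.get("title", variable["name"]),
        # Drop the leading footnote asterisk used in some entries.
        "popup": text.lstrip("*").strip(),
    }


if __name__ == "__main__":
    example = {
        "name": "RxCntDr",
        "title": "Count of Pharmacies (30-min drive)",
        "constraints": "*Euclidean distance or straight-line distance is a simple approximation.",
    }
    print(constraint_flag(example))
```

Returning `None` for variables without an entry keeps the decision of whether to render the icon in one place, so the front end only has to check for a value.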

One thing to note, though: I'm not sure this would work on the data docs page, because we don't list individual variables there; they are aggregated up to the "construct" level.
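One possible way around this, sketched here only as an assumption, would be to roll the per-variable constraints up to the construct level so the data docs page could show a single indicator per construct. The function name `constraints_by_construct` and the "Unknown" fallback label are hypothetical; the input is assumed to be a dictionary keyed by variable name, as in the definition above.

```python
from collections import defaultdict


def constraints_by_construct(registry: dict) -> dict:
    """Group variables that have a constraints entry under their construct.

    Hypothetical helper: `registry` maps variable names to definitions
    shaped like the RxCntDr example.
    """
    grouped = defaultdict(list)
    for name, variable in registry.items():
        text = (variable.get("constraints") or "").strip()
        if not text:
            continue  # no constraint recorded, so nothing to flag
        construct = variable.get("construct") or "Unknown"
        grouped[construct].append({"name": name, "popup": text})
    return dict(grouped)


if __name__ == "__main__":
    registry = {
        "RxCntDr": {
            "construct": "Access to Pharmacies",
            "constraints": "*Euclidean distance is a simple approximation.",
        },
        "ExampleVar": {"construct": "Example Construct", "constraints": None},
    }
    # Only constructs with at least one flagged variable appear in the result.
    print(constraints_by_construct(registry))
```

A construct would then get the indicator whenever its list is non-empty, and the popup could list the affected variables.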

Makosak added the "enhancement" (New feature or request) label Nov 25, 2024

mradamcox added this to the v2.1 Release milestone Dec 6, 2024
