Showing 8 changed files with 372 additions and 151 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
|
||
### Checklist | ||
- [ ] Consider if documentation (like in `docs/`) needs to be updated | ||
- [ ] Consider if tests should be added |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,165 +1,39 @@ | ||
# chart-review | ||
Measure agreement between two "_reviewers_" from the "_confusion matrix_" | ||
# Chart Review | ||
|
||
**Measure agreement between chart annotations.** | ||
|
||
Whether your chart annotations come from humans, machine-learning, or coded data like ICD-10, | ||
`chart-review` can compare them to reveal interesting statistics like: | ||
|
||
**Accuracy** | ||
* F1-score ([agreement](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090460/)) | ||
* [Sensitivity and Specificity](https://en.wikipedia.org/wiki/Sensitivity_and_specificity) | ||
* [Positive (PPV) or Negative Predictive Value (NPV)](https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values#Relationship) | ||
* False Negative Rate (FNR) | ||
|
||
**Confusion Matrix** | ||
* TP = True Positive | ||
* TN = True Negative | ||
* FP = False Positive (type I error) | ||
* FN = False Negative (type II error) | ||
|
||
**Power Calculations** for sample size estimation | ||
* Power = 1 - FNR | ||
* FNR = FN / (FN + TP) | ||
|
||
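The statistics listed above can all be derived from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook formulas, not the code that `chart-review` itself runs; the function name `score` and the choice to report `0.0` when a denominator is zero are assumptions made for illustration.

```python
def score(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute agreement statistics from confusion-matrix counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Assumption: report 0.0 rather than raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    fnr = ratio(fn, fn + tp)  # False Negative Rate = FN / (FN + TP)
    return {
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "Sens": ratio(tp, tp + fn),  # sensitivity (recall)
        "Spec": ratio(tn, tn + fp),  # specificity
        "PPV": ratio(tp, tp + fp),   # positive predictive value (precision)
        "NPV": ratio(tn, tn + fn),   # negative predictive value
        "FNR": fnr,
        "Power": 1 - fnr,            # Power = 1 - FNR
    }


stats = score(tp=4, fn=1, tn=1, fp=0)
print({name: round(value, 3) for name, value in stats.items()})
```

With these counts, sensitivity is 4 / 5 = 0.8 and NPV is 1 / 2 = 0.5, so the same numbers can be checked by hand.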
## Example | ||
|
||
--- | ||
**CHART-REVIEW** is defined here as reading and annotating (highlighting) medical notes to measure the accuracy of a measurement. | ||
Such measurements can establish the reliability of ICD-10 codes, or the utility of NLP for automating labor-intensive processes. | ||
|
||
Agreement among two or more human subject-matter-expert reviewers is considered the de facto gold standard for ground-truth labeling, but it cannot be done manually at scale. | ||
|
||
The most common chart review measures agreement on the _**class_label**_ over a carefully selected list of notes: | ||
* 1 human reviewer _vs_ ICD10 codes | ||
* 1 human reviewer _vs_ NLP results | ||
* 2 human reviewers _vs_ each other | ||
```shell | ||
$ ls | ||
config.yaml labelstudio-export.json | ||
|
||
$ chart-review accuracy jane john | ||
accuracy-jane-john: | ||
F1 Sens Spec PPV NPV TP FN TN FP Label | ||
0.889 0.8 1.0 1.0 0.5 4 1 1 0 * | ||
1.0 1.0 1.0 1.0 1.0 1 0 1 0 Cough | ||
0 0 0 0 0 2 0 0 0 Fatigue | ||
0 0 0 0 0 1 1 0 0 Headache | ||
``` | ||
|
||
--- | ||
### How to Install | ||
## Install | ||
1. Clone this repo. | ||
2. Install it locally like so: `pipx install .` | ||
|
||
`chart-review` is not yet released on PyPI. | ||
|
||
--- | ||
### How to Run | ||
|
||
#### Set Up Project Folder | ||
|
||
Chart Review operates on a project folder that holds your config & data. | ||
1. Make a new folder. | ||
2. Export your Label Studio annotations and put that in the folder as `labelstudio-export.json`. | ||
3. Add a `config.yaml` file (or `config.json`) that looks something like this (read more on this format below): | ||
|
||
```yaml | ||
labels: | ||
- cough | ||
- fever | ||
|
||
annotators: | ||
jane: 2 | ||
john: 6 | ||
jack: 8 | ||
|
||
ranges: | ||
jane: 242-250 # inclusive | ||
john: [260-271, 277] | ||
jack: [jane, john] | ||
``` | ||
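To make the `ranges` syntax concrete, here is a hedged sketch of how such entries could be expanded into note IDs. It is an illustration only: the function name `expand` is hypothetical, and the interpretation (inclusive `start-end` strings, single IDs, and references to other range names) is inferred from the example config above, not taken from the `chart-review` source.

```python
def expand(spec, ranges, _seen=()):
    """Expand one range spec into a sorted list of note IDs.

    A spec may be an int, a "start-end" string (inclusive), the name of
    another entry in `ranges`, or a list mixing any of those.
    """
    items = spec if isinstance(spec, list) else [spec]
    notes = set()
    for item in items:
        if isinstance(item, int):
            notes.add(item)
        elif item in ranges:
            # Reference to another named range, e.g. jack: [jane, john]
            if item in _seen:
                raise ValueError(f"circular range reference: {item}")
            notes.update(expand(ranges[item], ranges, _seen + (item,)))
        elif "-" in item:
            start, end = (int(part) for part in item.split("-"))
            notes.update(range(start, end + 1))  # end is inclusive
        else:
            notes.add(int(item))
    return sorted(notes)


# What a YAML parser would typically hand back for the config above.
ranges = {
    "jane": "242-250",
    "john": ["260-271", 277],
    "jack": ["jane", "john"],
}
jack = expand(ranges["jack"], ranges)
print(len(jack), jack[0], jack[-1])
```

Under this reading, `jack` covers every note assigned to either `jane` or `john`.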
#### Run | ||
Call `chart-review` with the sub-command you want and its arguments: | ||
|
||
To score Jack's annotations with Jane as the ground truth: | ||
```shell | ||
chart-review accuracy jane jack | ||
``` | ||
|
||
To score John's annotations with Jack as the ground truth: | ||
```shell | ||
chart-review accuracy jack john | ||
``` | ||
|
||
Pass `--help` to see more options. | ||
|
||
--- | ||
### Config File Format | ||
|
||
`config.yaml` defines study-specific variables. | ||
|
||
* Class labels: `labels: ['cough', 'fever']` | ||
* Annotators: `annotators: {'jane': 3, 'john': 8}` | ||
* Note ranges: `ranges: {'jane': 40-50, 'john': [2, 3, 4, 5]}` | ||
|
||
`annotators` maps a name to a Label Studio User ID | ||
* human subject matter expert _like_ `jane` | ||
* computer method _like_ `nlp` | ||
* coded data sources _like_ `icd10` | ||
|
||
`ranges` maps a selection of Note IDs from the corpus | ||
* `corpus: start:end` | ||
* `annotator1_vs_2: [list, of, notes]` | ||
* `annotator2_vs_3: corpus` | ||
|
||
#### External Annotations | ||
|
||
You may have annotations from NLP or coded FHIR data that you want to compare against. | ||
Easy! | ||
|
||
Set up your config to point at a CSV file in your project folder that holds two columns: | ||
- DocRef ID (real or anonymous) | ||
- Label | ||
|
||
```yaml | ||
annotators: | ||
human: 1 | ||
external_nlp: | ||
filename: my_nlp.csv | ||
``` | ||
|
||
When `chart-review` runs, it will inject the external annotations and match up the DocRef IDs | ||
to Label Studio notes based on metadata in your Label Studio export. | ||
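As a rough illustration of the two-column CSV described above, the sketch below groups labels by DocRef ID. It assumes the ID comes first, the label second, and that there is no header row; none of these details are confirmed by the text, and `load_external_labels` is a hypothetical helper, not part of `chart-review`.

```python
import csv
import io
from collections import defaultdict


def load_external_labels(csv_file) -> dict:
    """Group labels by DocRef ID from a two-column CSV (ID, label)."""
    labels = defaultdict(set)
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        docref_id, label = row[0].strip(), row[1].strip()
        labels[docref_id].add(label)
    return dict(labels)


# In-memory stand-in for a file such as my_nlp.csv (made-up IDs).
sample = io.StringIO("abc123,cough\nabc123,fever\ndef456,cough\n")
external = load_external_labels(sample)
print(external)
```

A note that carries several labels simply appears on several rows, one label per row.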
|
||
--- | ||
**BASE COHORT METHODS** | ||
|
||
`cohort.py` | ||
* from chart_review import _labelstudio_, _mentions_, _agree_ | ||
|
||
class **Cohort** defines the base class to analyze study cohorts. | ||
* init(`config.py`) | ||
|
||
`simplify.py` | ||
* **rollup**(...) : return _LabelStudioExport_ with 1 "rollup" annotation replacing individual mentions | ||
|
||
`term_freq.py` (methods are rarely used currently) | ||
* overlaps(...) : test if two mentions overlap (True/False) | ||
* calc_term_freq(...) : term frequency of highlighted mention text | ||
* calc_term_label_confusion : report of exact mentions with 2+ class_labels | ||
|
||
`agree.py` get confusion matrix comparing annotators {truth, annotator} | ||
* **confusion_matrix** (truth, annotator, ...) returns List[TruePos, TrueNeg, FalsePos, FalseNeg] | ||
* **score_matrix** (matrix) returns dict with keys {F1, Sens, Spec, PPV, NPV, TP,FP,TN,FN} | ||
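The idea behind comparing a truth annotator with a second annotator can be sketched as follows. This is not the `agree.py` implementation: the function name `confusion_counts`, its arguments, and the per-note sets of labels are all assumptions chosen to keep the example small.

```python
def confusion_counts(truth: dict, annotator: dict, label: str, note_ids) -> dict:
    """Count TP/TN/FP/FN for one label across a list of note IDs.

    `truth` and `annotator` map a note ID to the set of labels that
    person (or system) assigned to the note.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for note_id in note_ids:
        in_truth = label in truth.get(note_id, set())
        in_annotator = label in annotator.get(note_id, set())
        if in_truth and in_annotator:
            counts["TP"] += 1
        elif in_truth:
            counts["FN"] += 1  # truth has the label, annotator missed it
        elif in_annotator:
            counts["FP"] += 1  # annotator added a label truth lacks
        else:
            counts["TN"] += 1
    return counts


jane = {1: {"cough"}, 2: {"cough"}, 3: set(), 4: {"fever"}}
john = {1: {"cough"}, 2: set(), 3: {"cough"}, 4: {"fever"}}
print(confusion_counts(jane, john, "cough", note_ids=[1, 2, 3, 4]))
```

Here note 1 is a true positive, note 2 a false negative, note 3 a false positive, and note 4 a true negative for the `cough` label.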
|
||
`labelstudio.py` handles LabelStudio JSON | ||
|
||
Class **LabelStudioExport** | ||
* init(`labelstudio-export.json`) | ||
|
||
Class **LabelStudioNote** | ||
* init(...) | ||
|
||
`publish.py` tables and figures for PubMed manuscripts | ||
* table_csv(...) | ||
* table_json(...) | ||
|
||
--- | ||
**NICE TO HAVES LATER** | ||
|
||
* **_confusion matrix_** type support using Pandas | ||
* **score_matrix** would be nicer if it returned a strongly typed, Pandas-based class | ||
|
||
--- | ||
### Set up your dev environment | ||
|
||
To use the same dev environment as us, you'll want to run these commands: | ||
```sh | ||
pip install .[dev] | ||
pre-commit install | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Chart Review Documentation | ||
|
||
These documents are meant to be built as one part of the larger body of | ||
[Cumulus documentation](https://docs.smarthealthit.org/cumulus). | ||
|
||
To test changes here locally, read more at the [Cumulus docs repo](https://github.com/smart-on-fhir/cumulus). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
--- | ||
title: Accuracy Command | ||
parent: Chart Review | ||
nav_order: 5 | ||
# audience: lightly technical folks | ||
# type: how-to | ||
--- | ||
|
||
# The Accuracy Command | ||
|
||
The `accuracy` command will print agreement statistics like F1 scores and confusion matrices | ||
for every label in your project, between two annotators. | ||
|
||
Provide two annotator names (the first name will be considered the ground truth) and | ||
your accuracy scores will be printed to the console. | ||
|
||
## Example | ||
|
||
```shell | ||
$ chart-review accuracy jane john | ||
accuracy-jane-john: | ||
F1 Sens Spec PPV NPV TP FN TN FP Label | ||
0.929 0.958 0.908 0.901 0.961 91 4 99 10 * | ||
0.895 0.895 0.938 0.895 0.938 17 2 30 2 cough | ||
0.815 0.917 0.897 0.733 0.972 11 1 35 4 fever | ||
0.959 1.0 0.812 0.921 1.0 35 0 13 3 headache | ||
0.966 0.966 0.955 0.966 0.955 28 1 21 1 stuffy-nose | ||
``` | ||
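The counts in the example output are consistent with the `*` row being the sum of the per-label rows, with the scores then computed from those totals. The sketch below checks that reading against the numbers shown; it is an observation about this particular output, not a statement of how `chart-review` computes the row.

```python
# Per-label counts copied from the example output: (TP, FN, TN, FP)
rows = {
    "cough": (17, 2, 30, 2),
    "fever": (11, 1, 35, 4),
    "headache": (35, 0, 13, 3),
    "stuffy-nose": (28, 1, 21, 1),
}

# Sum each column across labels to get the totals.
tp, fn, tn, fp = (sum(column) for column in zip(*rows.values()))

f1 = 2 * tp / (2 * tp + fp + fn)
sens = tp / (tp + fn)
spec = tn / (tn + fp)

print(tp, fn, tn, fp)
print(round(f1, 3), round(sens, 3), round(spec, 3))
```

The totals come out to 91, 4, 99, and 10, matching the TP, FN, TN, and FP columns of the `*` row.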
|
||
## Options | ||
|
||
### `--config=PATH` | ||
|
||
Use this to point to a secondary (non-default) config file. | ||
Useful if you have multiple label setups (e.g. one grouped into a binary label and one not). | ||
|
||
### `--project-dir=DIR` | ||
|
||
Use this to run `chart-review` outside of your project dir. | ||
Config files, external annotations, etc. will be looked for in that directory. | ||
|
||
### `--save` | ||
|
||
Use this to write a JSON and CSV file to the project directory, | ||
rather than printing to the console. | ||
Useful for passing results around in a machine-parsable format. |