Preparing clinical texts for N3C Enclave

Sijia Liu edited this page Dec 15, 2021 · 1 revision

This page documents the technical details of contributing extracted NLP concepts to the N3C Enclave. For the standard operating procedure (SOP) for N3C data contribution sites, please refer to the guides from the N3C Phenotype & Data Acquisition Workstream and the N3C Data Ingestion and Harmonization.

Preparing clinical texts

Please follow the documentation of OHNLP Backbone to prepare the clinical texts for COVID-19 related concept extraction.

Post-processing for OMOP CDM

After the clinical concepts are extracted and stored in the OMOP CDM NOTE_NLP table via OHNLP Backbone, the OHDSI CDM NOTE and NOTE_NLP tables are shared as CSV files. Note that person_id and visit_occurrence_id in the CDM NOTE table should match those used in the other data files in your submission. Per N3C requests, visit_occurrence_id cannot contain a key that has no match in the rest of the submission.
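The key-matching requirement can be checked before submission. The following is a minimal sketch, not part of OHNLP Backbone or any N3C tooling: the function name find_unmatched_notes and the in-memory sample data are hypothetical, and it assumes the other files in your submission are CSV files that expose person_id and visit_occurrence_id columns.

```python
import csv
import io

def find_unmatched_notes(note_rows, known_ids, key="visit_occurrence_id"):
    """Return the note_id of every NOTE row whose `key` value is not in known_ids."""
    unmatched = []
    for row in note_rows:
        if row[key] not in known_ids:
            unmatched.append(row["note_id"])
    return unmatched

# Tiny in-memory stand-ins for the real CSV files (made-up values).
person_csv = "person_id\n1\n2\n"
visit_csv = "visit_occurrence_id,person_id\n100,1\n101,2\n"
note_csv = "note_id,person_id,visit_occurrence_id\nn1,1,100\nn2,2,999\n"

persons = list(csv.DictReader(io.StringIO(person_csv)))
visits = list(csv.DictReader(io.StringIO(visit_csv)))
notes = list(csv.DictReader(io.StringIO(note_csv)))

# csv.DictReader yields strings, so all comparisons are string-to-string.
known_persons = {p["person_id"] for p in persons}
known_visits = {v["visit_occurrence_id"] for v in visits}

bad_persons = find_unmatched_notes(notes, known_persons, key="person_id")
bad_visits = find_unmatched_notes(notes, known_visits)

print("notes with unknown person_id:", bad_persons)
print("notes with unknown visit_occurrence_id:", bad_visits)
```

In this sample, note n2 points at a visit that does not exist in the visit file, so it would need to be fixed or dropped before the files are shared.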

Data Schema

The data schema is as follows. Columns that should be truncated because of PHI concerns are marked in the lists below:

NOTE table

  • note_id: string
  • person_id: bigint - corresponding to the de-identified person_id in the rest of your submission
  • note_date: timestamp (ISO8601 Compliant yyyy-MM-dd'T'HH:mm:ssZ)
  • note_datetime: timestamp (ISO8601 Compliant)
  • note_type_concept_id: int
  • note_class_concept_id: int
  • note_title: string (truncated due to PHI concerns)
  • note_text: string (truncated due to PHI concerns)
  • encoding_concept_id: int
  • language_concept_id: int
  • provider_id: bigint (truncated due to PHI)
  • visit_occurrence_id: bigint - must match the rest of your submission
  • visit_detail_id: null
  • note_source_value: string (truncated due to PHI)
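As an illustration of the NOTE schema above, the sketch below prepares one row for export. It rests on two assumptions that you should confirm with N3C: that "truncated" means the column is kept but its value is emptied, and that the timestamp pattern yyyy-MM-dd'T'HH:mm:ssZ corresponds to a numeric UTC offset such as +0000. The helper names blank_phi and to_iso8601 are hypothetical.

```python
from datetime import datetime, timezone

# Columns the NOTE schema marks as truncated for PHI.
NOTE_PHI_COLUMNS = ("note_title", "note_text", "provider_id", "note_source_value")

def blank_phi(row, phi_columns=NOTE_PHI_COLUMNS):
    """Return a copy of `row` with the PHI columns emptied.

    Assumption: 'truncated' means the value is removed, not shortened.
    """
    cleaned = dict(row)
    for col in phi_columns:
        if col in cleaned:
            cleaned[col] = ""
    return cleaned

def to_iso8601(dt):
    """Format a timezone-aware datetime as yyyy-MM-dd'T'HH:mm:ssZ."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.strftime("%Y-%m-%dT%H:%M:%S%z")

# Made-up example row; real values come from your OMOP CDM NOTE table.
written_at = datetime(2021, 12, 15, 8, 30, 0, tzinfo=timezone.utc)
raw_note = {
    "note_id": "n1",
    "person_id": 1,
    "note_date": to_iso8601(written_at),
    "note_datetime": to_iso8601(written_at),
    "note_title": "Progress note",
    "note_text": "Patient reports cough and fever.",
    "provider_id": 42,
    "visit_occurrence_id": 100,
    "visit_detail_id": "",  # null in the schema above
    "note_source_value": "internal-doc-7",
}

export_row = blank_phi(raw_note)
print(export_row)
```

blank_phi returns a copy, so the original row stays available for local use while only the cleaned copy is written to the shared file.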

NOTE_NLP table

  • note_nlp_id: int
  • note_id: string
  • section_concept_id: int
  • snippet: string (truncated due to PHI)
  • offset: int
  • lexical_variant: string (truncated due to PHI)
  • note_nlp_concept_id: int
  • note_nlp_source_concept_id: int (blank)
  • nlp_system: string
  • nlp_date/nlp_datetime: timestamp (ISO8601 Compliant)
  • term_exists: varchar(1) – Y/N
  • term_temporal: string
  • term_modifiers: string
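The NOTE_NLP schema above can be turned into a small export step. This is a sketch under stated assumptions rather than a prescribed procedure: the column order simply follows the list above, nlp_date and nlp_datetime are treated as two separate columns, the lexical variant column uses the lowercase OMOP CDM spelling lexical_variant, "truncated" is taken to mean the value is emptied, and the helper names check_row and write_note_nlp are hypothetical.

```python
import csv
import io

NOTE_NLP_COLUMNS = [
    "note_nlp_id", "note_id", "section_concept_id", "snippet", "offset",
    "lexical_variant", "note_nlp_concept_id", "note_nlp_source_concept_id",
    "nlp_system", "nlp_date", "nlp_datetime", "term_exists",
    "term_temporal", "term_modifiers",
]
# Columns the NOTE_NLP schema marks as truncated for PHI.
NOTE_NLP_PHI_COLUMNS = ("snippet", "lexical_variant")

def check_row(row, known_note_ids):
    """Return a list of human-readable problems found in one NOTE_NLP row."""
    problems = []
    if row.get("note_id") not in known_note_ids:
        problems.append("note_id not found in NOTE table")
    if row.get("term_exists") not in ("Y", "N"):
        problems.append("term_exists must be Y or N")
    return problems

def write_note_nlp(rows, out):
    """Write rows as CSV with PHI columns emptied; missing columns stay blank."""
    writer = csv.DictWriter(out, fieldnames=NOTE_NLP_COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        cleaned = dict(row)
        for col in NOTE_NLP_PHI_COLUMNS:
            cleaned[col] = ""
        writer.writerow(cleaned)

# Made-up rows; concept IDs and the nlp_system value are placeholders.
rows = [
    {"note_nlp_id": 1, "note_id": "n1", "section_concept_id": 0,
     "snippet": "reports cough", "offset": 8, "lexical_variant": "cough",
     "note_nlp_concept_id": 0, "nlp_system": "example-nlp",
     "nlp_date": "2021-12-15T00:00:00+0000",
     "nlp_datetime": "2021-12-15T08:30:00+0000",
     "term_exists": "Y", "term_temporal": "", "term_modifiers": ""},
    {"note_nlp_id": 2, "note_id": "n9", "term_exists": "yes"},
]

known_note_ids = {"n1"}
valid_rows = [r for r in rows if not check_row(r, known_note_ids)]

buffer = io.StringIO()
write_note_nlp(valid_rows, buffer)
print(buffer.getvalue())
```

Only the first sample row passes the checks; the second is rejected because its note_id has no NOTE record and its term_exists value is not Y or N.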

For the definition of these columns, please refer to the OHDSI CDM documentation of the NOTE and NOTE_NLP tables.