ICEES
Description: ICEES Knowledge Graph (KG) is an open service that exposes clinical data (i.e., electronic health records, clinical study data) that have been integrated at the patient level with public exposures data (e.g., airborne pollutants, major roadways/highways, concentrated animal feeding operations, landfills), with pairwise positive and negative correlations between feature variables reported on edges.
Example edge (interpretation): ICEES KG shows that cystic fibrosis is correlated with average daily exposure to PM2.5, with a Chi Square statistic of 19.55 and a P value of 0.0006 (N = 1296) in a cohort of patients from UNC Health in year 2010.
Data source(s): ICEES KG exposes data from varied sources, including electronic health record data, clinical study data, and environmental exposures data.
Key methodologic metrics: ICEES KG provides pairwise correlations between feature variables, with Chi Square statistic, Chi Square P value, odds ratio, log odds ratio, 95% confidence interval for the log odds ratio, Fisher’s exact P value, and sample size.
Regulatory requirement(s) and/or licensing restriction(s): Service is compliant with all federal and institutional regulations.
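The metrics listed above can be illustrated on a single 2x2 contingency table. The following is a minimal sketch, not the ICEES implementation: the function names and the example counts are hypothetical, and it assumes a Pearson Chi Square without continuity correction, a Wald-type 95% confidence interval for the log odds ratio, and a two-sided Fisher's exact test. ICEES may compute these values differently.

```python
import math


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


def pairwise_metrics(a, b, c, d):
    """Correlation metrics for a 2x2 table; all four cells must be non-zero."""
    n = a + b + c + d
    # Pearson Chi Square (assumed: no continuity correction); 1 degree of freedom
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a Chi Square variable with 1 degree of freedom
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    # Wald standard error of the log odds ratio (assumed CI method)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return {
        "n": n,
        "chi_square": chi2,
        "chi_square_p": p_chi2,
        "odds_ratio": odds_ratio,
        "log_odds_ratio": log_or,
        "log_odds_ratio_ci95": (log_or - 1.96 * se, log_or + 1.96 * se),
        "fisher_exact_p": fisher_exact_p(a, b, c, d),
    }


# Hypothetical counts: rows = feature A present/absent, columns = feature B present/absent
metrics = pairwise_metrics(10, 20, 30, 40)
for name, value in metrics.items():
    print(name, value)
```

A confidence interval for the log odds ratio that excludes 0 corresponds to an odds ratio interval that excludes 1.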
ICEES provides a regulatory-compliant, open framework and approach for exposing and exploring sensitive patient data (e.g., electronic health records, clinical research data, survey data) that have been integrated with a variety of public environmental exposures data (e.g., airborne pollutants, major roadways/highways, landfills, concentrated animal feeding operations, socio-environmental indicators). The design of ICEES is use-case driven, which means that different ICEES endpoints provision data on different cohorts and data elements, albeit with overlap in certain cases.
ICEES is accessible through two general services: (1) ICEES+ is fully featured and supports functionalities such as dynamic cohort creation and exploratory bivariate and multivariate analysis; and (2) ICEES KG is static and supports knowledge graph queries over a pre-computed correlational matrix that provides pairwise comparisons between feature variables. Both ICEES+ and ICEES KG support multiple use cases. Users of both services are provided with a total sample size and statistical metrics on precomputed correlations: Chi Square statistic, degrees of freedom, and P value; Fisher's Exact odds ratio, Fisher's Exact P value, log odds ratio, and 95% confidence interval for the log odds ratio. Note that ICEES KG cohorts are described in edge attributes by file name. For example, cohort "PCD_UNC_patient_2014_v6_binned_deidentified|pcd|v6|2023_02_06_16_21_25" describes a cohort focused on primary ciliary dyskinesia (PCD) among patients at UNC Health in year 2014; the remainder of the file name indicates that the data are binned and deidentified, in keeping with regulatory and institutional requirements, and that they were derived from a v6 dataset generated in 2023.
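A cohort label of this kind can be split into its parts programmatically. The sketch below is illustrative only: the function name `parse_cohort_label` and the field names are hypothetical, and the layout (file stem, use case, version, and timestamp separated by `|`) is inferred from the single example above rather than from a published specification.

```python
import re
from datetime import datetime


def parse_cohort_label(label):
    """Split an ICEES KG cohort label into parts (layout assumed from one example)."""
    file_stem, use_case, version, stamp = label.split("|")
    tokens = file_stem.split("_")
    # Assumption: the cohort year is the four-digit token inside the file stem
    year_match = re.search(r"_(\d{4})_", file_stem)
    return {
        "file_stem": file_stem,
        "use_case": use_case,
        "version": version,
        "cohort_year": int(year_match.group(1)) if year_match else None,
        "binned": "binned" in tokens,
        "deidentified": "deidentified" in tokens,
        # Assumption: the trailing field is a year_month_day_hour_minute_second stamp
        "generated": datetime.strptime(stamp, "%Y_%m_%d_%H_%M_%S"),
    }


cohort = parse_cohort_label(
    "PCD_UNC_patient_2014_v6_binned_deidentified|pcd|v6|2023_02_06_16_21_25"
)
print(cohort["use_case"], cohort["cohort_year"], cohort["generated"].date())
```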
Example observations:
- ICEES KG shows that asthma is positively correlated with fexofenadine, with a log odds ratio of 2.15 (95% confidence interval: [1.99, 2.32]; N = 157,412) in a cohort of patients from UNC Health in year 2014.
- ICEES KG shows that cystic fibrosis is correlated with average daily exposure to PM2.5, with a Chi Square statistic of 19.55 and a P value of 0.0006 (N = 1296) in a cohort of patients from UNC Health in year 2010.
- ICEES KG shows that cystic fibrosis is positively correlated with cetirizine, with a log odds ratio of 1.96 (95% confidence interval: [1.54, 2.37]; N = 5688) in a cohort of patients from UNC Health in year 2016.
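Log odds ratios such as those reported in the asthma and fexofenadine observation can be easier to interpret once converted back to the odds ratio scale by exponentiation. The short sketch below does this for the reported values; the helper name `to_odds_ratio_scale` is hypothetical and is not part of ICEES.

```python
import math


def to_odds_ratio_scale(log_or, ci_low, ci_high):
    """Exponentiate a log odds ratio and its confidence bounds."""
    return math.exp(log_or), (math.exp(ci_low), math.exp(ci_high))


# Values reported for asthma and fexofenadine (UNC Health, 2014)
odds_ratio, (or_low, or_high) = to_odds_ratio_scale(2.15, 1.99, 2.32)
print(f"odds ratio {odds_ratio:.2f}, 95% CI [{or_low:.2f}, {or_high:.2f}]")

# A log odds ratio interval that excludes 0 maps to an odds ratio interval that excludes 1
print("interval excludes 1:", or_low > 1 or or_high < 1)
```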
Fecho K, Pfaff E, Xu H, Champion J, Cox S, Stillwell L, Bizon C, Peden D, Krishnamurthy A, Tropsha A, Ahalt SC. A novel approach for exposing and sharing clinical data: the Translator Integrated Clinical and Environmental Exposures Service. J Am Med Inform Assoc 2019;26(10):1064–1073. doi: 10.1093/jamia/ocz042.
Fecho K,* Haaland P, Krishnamurthy A, Lan B, Ramsey S, Schmitt PL, Sharma P, Sinha M, Xu H. An approach for open multivariate analysis of integrated clinical and environmental exposures data. Inform Med Unlocked 2021;26:100733. doi: 10.1016/j.imu.2021.100733. *Apart from first/lead author, all other authors are listed in alphabetical order.
Lan B,* Haaland P, Krishnamurthy A, Peden DB, Schmitt PL, Sharma P, Sinha M, Xu H, Fecho K. Open application of statistical and machine learning models to explore the impact of environmental exposures on health and disease: an asthma use case. Int J Environ Res Public Health 2021;18(21):11398 [published as part of a special issue titled “Application of Biostatistical Modelling in Public Health and Epidemiology”]. doi: 10.3390/ijerph182111398. *Apart from first/lead and last/senior author, all other authors are listed in alphabetical order.
Fecho K,* Ahalt SC, Appold S, Arunachalam S, Pfaff E, Stillwell L, Valencia A, Xu H, Peden D. Development and application of an open tool for sharing and analyzing integrated clinical and environmental exposures data: asthma use case. JMIR Form Res 2022;6(4):e32357. doi: 10.2196/32357. *Apart from first/lead and last/senior author, all other authors are listed in alphabetical order.
Fecho K,* Ahalt SC, Knowles M, Krishnamurthy A, Leigh M, Morton K, Pfaff E, Wang M, Yi H. Leveraging open electronic health record data and environmental exposures data to derive insights into rare pulmonary disease. Front Artif Intell 2022;5:918888 (special issue on Biomedical Informatics Applications in Rare Diseases). doi: 10.3389/frai.2022.918888. *Apart from the first author, all authors are listed in alphabetical order.
Sharma P,* Haaland P, Krishnamurthy A, Lan B, Schmitt PL, Sinha M, Xu H, Fecho K. Evaluating robustness of a generalized linear model when applied to electronic health record data accessed using an openAPI. Health Informatics J 2023;29(2) (April-June 2023). doi: 10.1177/14604582231170892. *Apart from first/lead and last/senior author, all other authors are listed in alphabetical order.
Sinha M, Haaland P, Krishnamurthy A, Lan B, Ramsey SA, Schmitt PL, Sharma P, Xu H, Fecho K. Causal analysis for multivariate integrated clinical and environmental exposures data. BMC Medical Informatics and Decision Making, under review. medRxiv preprint is available here: https://www.medrxiv.org/content/10.1101/2022.12.20.22283734v1.
*The link above takes users to a web page that includes a user manual; however, please note that the page is hosted on a site that may contain outdated information on other pages.
- Asthma and related common respiratory disorders
- Primary ciliary dyskinesia and related rare respiratory disorders
- Drug-induced liver injury
- Coronavirus infection
- UNC Health Carolina Data Warehouse for Health (CDWH) data - The CDWH is UNC Health's research copy of its Epic electronic health record system. Select patient datasets from the CDWH support ICEES.
- [NIEHS Personalized Environment and Genes Study (PEGS) participant data (formerly known as Environmental Polymorphisms Registry)](https://www.niehs.nih.gov/research/atniehs/labs/crb/studies/pegs/index.cfm) - PEGS is a survey-based collection of studies on roughly 20,000 participants. Select PEGS datasets support ICEES.
- DILI Network participant data - The DILI Network maintains longitudinal datasets on participants with confirmed diagnoses of drug-induced liver injury (DILI). Select DILI Network datasets support ICEES.
- US Environmental Protection Agency airborne pollutant exposures data - The US EPA maintains a collection of model-derived estimates of airborne pollutant exposures at varying spatial and temporal resolution. ICEES exposes data on a subset of available airborne pollutants such as PM2.5 and ozone.
- US Department of Transportation (DOT), Federal Highway Administration (FHWA), Highway Performance Monitoring System (HPMS) major roadway/highway exposures data - ICEES exposes US DOT data on major roadway/highway exposures, including point estimates of residential distance from a major roadway or highway.
- US Census Bureau TIGER/Line roadway data - ICEES also exposes US Census Bureau data on major roadway/highway exposures. These data are used to supplement the US DOT data.
- US Census Bureau American Community Survey (ACS) socio-economic exposures data - The US Census Bureau's ACS is an ongoing nationwide survey, conducted separately from the decennial US census, that publishes 5-year survey estimates. ICEES exposes data on a subset of the available ACS estimates, including survey estimates on residential density, household median income, household access to health insurance, etc.
- NC Department of Environmental Quality (DEQ) concentrated animal feeding operations (CAFO) exposures data - North Carolina's DEQ maintains data on the location of all registered CAFOs across the state. ICEES exposes point estimates of residential distance from the nearest CAFO.
- NC Department of Environmental Quality (DEQ) landfill exposures data - North Carolina's DEQ maintains data on the location of all registered active and inactive landfills across the state. ICEES exposes point estimates of residential distance from the nearest landfill.
- National Center for Education Statistics public school exposures data [not yet integrated] - The National Center for Education Statistics maintains nationwide data on public schools, including data on age of building, water supply, history of lead and asbestos, etc. The Exposures Provider team plans to incorporate these data into ICEES, thereby providing public school exposure estimates for minor children and supporting differential analysis of health outcomes related to school exposures versus primary residence exposures.
- ICEES+ and ICEES KG terms and conditions of use can be accessed here.
- ICEES+ API Code GitHub repository
- ICEES+ API Configuration Files GitHub repository
- ICEES+ API deployment information
- ICEES+ QC tool deployment information (also see the information posted below)
- ICEES+ example queries
- ICEES KG GitHub repository
- ICEES KG Jupyter notebook with example asthma KG query and output
- ICEES KG Jupyter notebook with example DILI KG query and output
- CAMP FHIR
- FHIR-PIT
- Secure Multiparty Computation
- TranQL: Web UI, Example TranQL queries
TRAPI endpoint:
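A query sent to a TRAPI endpoint is a JSON message that contains a query graph of nodes and edges. The sketch below builds one such message asking for chemical entities correlated with a disease. It is an illustration under stated assumptions, not a documented ICEES KG query: the helper name `build_trapi_query` is hypothetical, the identifier `MONDO:0004979` is assumed to denote asthma, and the category and predicate choices (`biolink:ChemicalEntity`, `biolink:correlated_with`) are assumed to match what ICEES KG serves. The message is only serialized here, not sent, because the endpoint URL is not given in this document.

```python
import json


def build_trapi_query(disease_curie):
    """Build a one-hop TRAPI query graph: disease -> correlated chemical entity."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # Pinned node: the disease of interest
                    "n0": {"ids": [disease_curie], "categories": ["biolink:Disease"]},
                    # Unpinned node: any chemical entity (assumed category)
                    "n1": {"categories": ["biolink:ChemicalEntity"]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        # Assumed predicate for ICEES KG correlation edges
                        "predicates": ["biolink:correlated_with"],
                    }
                },
            }
        }
    }


# Assumption: MONDO:0004979 is the identifier for asthma
query = build_trapi_query("MONDO:0004979")
payload = json.dumps(query, indent=2)
print(payload)
```

The resulting `payload` would typically be submitted with an HTTP POST to the `/query` path of a TRAPI service, with the statistical metrics and cohort label expected on the returned edge attributes.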
ICEES+ endpoints:
- ICEES+ Asthma API - development (PermaLink)
- ICEES+ Asthma API - production (PermaLink)
- ICEES+ DILI API - development (PermaLink)
- ICEES+ DILI API - production (PermaLink)
- ICEES+ PCD API - development (PermaLink)
- ICEES+ PCD API - production (PermaLink)
- ICEES+ COVID API - development (PermaLink)
- ICEES+ COVID API - production (PermaLink)
ICEES+ SMC OpenAPI endpoints:
- ICEES+ UNC SMC API dummy data for developing SMC algorithm (port not active)
- ICEES+ NIEHS SMC API dummy data for developing SMC algorithm (port not active)
- ICEES+ Duke SMC API dummy data for developing SMC algorithm (port not active)
Environmental Exposures OpenAPIs
- Socio-economic Exposures Service; GitHub repo
- Airborne Pollutant Exposures Service; GitHub repo
- Roadway Exposures Service; GitHub repo
Issues should be posted in the ICEES+ GitHub repository or the ICEES KG GitHub repository.
Exposures Provider also supports the CAM (Causal Activity Model)/AOP (Adverse Outcome Pathway) Knowledge Provider (KP), described at CAM Provider KG.