Develop tests based on clinical "HGVS expressions" #16

rrfreimuth · 2023-08-28T15:44:52Z

Submitter Name

Bob Freimuth

Submitter Affiliation

Mayo Clinic

Requested By

Mayo Clinic

Additional Submitter Details

No response

Lead(s)

Salem Bajjali and Bob Freimuth

biocommons Repo

hgvs

Project Details

The hgvs tools provide extensive capabilities to parse and assemble HGVS expressions. As a step towards tooling that supports translation between HGVS, VRS, and FHIR, a robust set of test data is needed. While it is possible to extract high volumes of variation data from public databases, we propose to use examples based on clinical test reports, which often use incomplete or inaccurate HGVS expressions. Developing this test data set will give the hgvs package a curated set of examples that pass or fail validation in known ways.

Aim 1: Develop the test data set
Twenty examples of HGVS expressions reported in clinical genetic test reports were extracted. Each example was minimally altered so that it differed from the actual patient result and matched a record from ClinVar, while the original syntax of the expression from the test report was maintained. For each example, four additional representations were generated that included or omitted different portions of the expression.
This aim will structure the example data set in file(s) that can be used as part of automated testing for the hgvs tooling. Minimally, each example will be human-curated to indicate expected pass/fail. Future work could add expected exceptions or detailed error messages that indicate why an expression failed validation.
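One possible layout for such a file is sketched below, purely as an illustration. The tab-separated format, the column names (`expression`, `expected`, `note`), and the two sample rows are all assumptions made for this sketch; they are not taken from the curated data set, and the expected values shown are not curated results.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One curated row: the expression and whether it should validate."""
    expression: str
    expected_pass: bool
    note: str


# Hypothetical file contents; real rows would come from the curated set.
SAMPLE_TSV = (
    "expression\texpected\tnote\n"
    "NM_000000.0:c.100A>G\tpass\tcomplete expression (placeholder accession)\n"
    "c.100A>G\tfail\treference sequence omitted\n"
)


def load_examples(handle):
    """Read tab-separated rows and reject any unexpected 'expected' value."""
    examples = []
    for row in csv.DictReader(handle, delimiter="\t"):
        expected = row["expected"].strip().lower()
        if expected not in {"pass", "fail"}:
            raise ValueError(f"unexpected value in 'expected' column: {expected!r}")
        examples.append(
            Example(
                expression=row["expression"].strip(),
                expected_pass=(expected == "pass"),
                note=(row.get("note") or "").strip(),
            )
        )
    return examples


examples = load_examples(io.StringIO(SAMPLE_TSV))
```

A free-text `note` column leaves room for the expected exception or error message mentioned as future work.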

Aim 2: Develop an automated test script
A test script will be written that automatically runs each example from Aim 1 through the hgvs parser. Actual results will be compared to the expected results.
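A minimal sketch of that comparison loop follows. To keep it runnable without the hgvs package installed, the parser is passed in as a callable and a toy stand-in (`toy_parse`, hypothetical) is used here; in real use one would presumably pass `hgvs.parser.Parser().parse_hgvs_variant` instead. The `Result` type and the sample expressions are likewise assumptions for illustration.

```python
import re
from typing import Callable, Iterable, List, NamedTuple, Optional, Tuple


class Result(NamedTuple):
    expression: str
    expected_pass: bool
    actual_pass: bool
    error: Optional[str]  # exception type and message when parsing failed

    @property
    def matches(self) -> bool:
        return self.expected_pass == self.actual_pass


def run_examples(
    examples: Iterable[Tuple[str, bool]],
    parse: Callable[[str], object],
) -> List[Result]:
    """Run each expression through `parse` and record pass/fail plus any error."""
    results = []
    for expression, expected_pass in examples:
        try:
            parse(expression)
        except Exception as exc:  # keep the message so it can be documented
            results.append(
                Result(expression, expected_pass, False, f"{type(exc).__name__}: {exc}")
            )
        else:
            results.append(Result(expression, expected_pass, True, None))
    return results


# Toy stand-in for the real parser (assumption): accepts "accession:type.edit".
_TOY = re.compile(r"^[A-Z]{2}_\d+\.\d+:[cgmnpr]\..+$")


def toy_parse(expression: str) -> str:
    if not _TOY.match(expression):
        raise ValueError(f"not in accession:type.edit form: {expression!r}")
    return expression


results = run_examples(
    [
        ("NM_000000.0:c.100A>G", True),
        ("c.100A>G", False),
        ("NM_000000.0 c.100A>G", True),  # deliberately mis-curated to show a mismatch
    ],
    toy_parse,
)
mismatches = [r for r in results if not r.matches]
for r in mismatches:
    print(f"MISMATCH {r.expression!r}: expected pass, got {r.error}")
```

Recording the exception text alongside the pass/fail outcome keeps the door open for documenting why each expression failed.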

Aim 3: Documentation
Write documentation to support the use of the test data set.

Aim 4: Create a PR
The outputs of Aims 1-3 will form the core of a PR for the hgvs package in biocommons.

korikuzma · 2023-08-31T11:36:24Z

@rrfreimuth @SalemBajjali I think developing real-life test datasets is great. What skill level would be good for this project (newcomer, intermediate, advanced)?

rrfreimuth · 2023-08-31T12:32:11Z

@korikuzma I'd tag it as newcomer or intermediate, depending on point of view. It requires the ability to navigate the hgvs package and potentially extend the exception messages for the class.

korikuzma · 2023-08-31T13:11:15Z

@rrfreimuth I'll tag as good for newcomers. @SalemBajjali, if you think it should be intermediate I can change!

reece · 2023-09-15T19:45:11Z

@rrfreimuth @SalemBajjali : hgvs has a bunch of tests in https://github.com/biocommons/hgvs/tree/main/tests/data . Many of these tests were provided by curation scientists at Invitae.

rrfreimuth · 2023-09-15T20:10:08Z

Thanks, @reece ! I looked through a couple of the files and I think we should consider adopting that format so that it integrates naturally into the rest of the test suite. We'll very likely need help with that.

I don't expect we'll find test cases that aren't already covered, but this exercise is less about the technical validation of the tooling (which is already robust) and is more about exploring the behavior of the tooling when invalid input data are used. I am very interested in documenting the error messages for each failing case and thinking about whether supporting info is needed to help a non-technical domain expert interpret the info to fix the input. We could also brainstorm ways to generate suggestions to the user to help lower the effort required to clean up the input data.

reece · 2023-09-16T00:48:09Z

Terrific. Happy to talk more about it. Also, there are now better ways to structure tests like these (pytest's "test parametrization") and this would be a great excuse to overhaul the current set.
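For readers unfamiliar with it, a rough sketch of what pytest parametrization could look like for pass/fail examples is given below. It uses `pytest.mark.parametrize` and `pytest.param`, which are standard pytest features; the toy parser, the case list, and the test name are assumptions for illustration only and do not reflect how the existing hgvs tests are organized.

```python
import re

import pytest

# Toy stand-in for the real parser (assumption): accepts "accession:type.edit".
_TOY = re.compile(r"^[A-Z]{2}_\d+\.\d+:[cgmnpr]\..+$")


def toy_parse(expression):
    if not _TOY.match(expression):
        raise ValueError(f"not in accession:type.edit form: {expression!r}")
    return expression


# Each case carries a readable id that shows up in pytest's report.
CASES = [
    pytest.param("NM_000000.0:c.100A>G", True, id="complete-expression"),
    pytest.param("c.100A>G", False, id="missing-accession"),
]


@pytest.mark.parametrize("expression, expected_pass", CASES)
def test_parse_outcome(expression, expected_pass):
    if expected_pass:
        toy_parse(expression)  # any exception fails the test
    else:
        with pytest.raises(ValueError):
            toy_parse(expression)
```

With this layout each example is reported as its own test, so a single failing expression does not hide the results of the others.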

korikuzma added the "hgvs" label (Project is for HGVS) · Aug 31, 2023

korikuzma added this to biocommons near-term roadmap · Aug 31, 2023

korikuzma added the "newcomer" label (Project is good for newcomers) · Aug 31, 2023

reece removed this from biocommons near-term roadmap · Sep 27, 2023
