This repository has been archived by the owner on Nov 18, 2023. It is now read-only.

Develop tests based on clinical "HGVS expressions" #16

Open
rrfreimuth opened this issue Aug 28, 2023 · 6 comments
Labels: hgvs (Project is for HGVS), newcomer (Project is good for newcomers)

Comments

@rrfreimuth

Submitter Name

Bob Freimuth

Submitter Affiliation

Mayo Clinic

Requested By

Mayo Clinic

Additional Submitter Details

No response

Lead(s)

Salem Bajjali and Bob Freimuth

biocommons Repo

hgvs

Project Details

The hgvs tools provide extensive capabilities to parse and assemble HGVS expressions. As a step towards tooling that supports translation between HGVS, VRS, and FHIR, a robust set of test data is needed. While it is possible to extract high volumes of variation data from public databases, we propose to use examples based on clinical test reports, which often use incomplete or inaccurate HGVS expressions. Developing this test data set will provide the hgvs package with a curated set of examples that pass or fail validation in known ways.

Aim 1: Develop the test data set
Twenty examples of HGVS expressions reported in clinical genetic test reports were extracted. Each example was minimally altered so that it differed from the actual patient result and matched a record in ClinVar, while the original syntax of the expression from the test report was preserved. For each example, four additional representations were generated, each including or omitting different portions of the expression.
This aim will organize the example data set into one or more files that can be used for automated testing of the hgvs tooling. At a minimum, each example will be human-curated to indicate whether it is expected to pass or fail. Future work could add expected exceptions or detailed error messages that explain why an expression failed validation.
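A minimal sketch of how the curated examples could be stored and loaded, assuming a tab-separated file with hypothetical columns `id`, `expression`, `expected` (`pass` or `fail`), and `note`. The column names, the `Example` class, and `load_examples` are illustrative and not part of hgvs, and the two sample rows are made up for illustration rather than drawn from the curated set.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass(frozen=True)
class Example:
    """One curated test case (hypothetical schema)."""
    example_id: str
    expression: str
    expected_pass: bool
    note: str


def load_examples(handle: TextIO) -> List[Example]:
    """Read tab-separated rows with columns id, expression, expected, note."""
    examples = []
    for row in csv.DictReader(handle, delimiter="\t"):
        expected = row["expected"].strip().lower()
        if expected not in ("pass", "fail"):
            raise ValueError(f"{row['id']}: expected must be 'pass' or 'fail', got {expected!r}")
        examples.append(
            Example(
                example_id=row["id"],
                expression=row["expression"].strip(),
                expected_pass=(expected == "pass"),
                note=row.get("note") or "",  # tolerate a missing or empty note
            )
        )
    return examples


# Illustrative rows only; not taken from the curated clinical examples.
SAMPLE_TSV = (
    "id\texpression\texpected\tnote\n"
    "ex01a\tNM_007294.3:c.68_69del\tpass\tfull expression\n"
    "ex01b\tc.68_69del\tfail\treference sequence omitted\n"
)

if __name__ == "__main__":
    for example in load_examples(io.StringIO(SAMPLE_TSV)):
        print(example.example_id, example.expression, example.expected_pass)
```

Rejecting anything other than `pass` or `fail` at load time keeps a typo in the curation file from silently turning into a wrong expectation.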

Aim 2: Develop an automated test script
A test script will be written that automatically runs each example from Aim 1 through the hgvs parser and compares the actual results with the expected results.
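One way the Aim 2 script could work, sketched under assumptions: the runner takes the parse function as a parameter so it can be exercised without hgvs installed, and `make_hgvs_parse` returns `hgvs.parser.Parser().parse_hgvs_variant`. `Outcome`, `run_examples`, and `make_hgvs_parse` are hypothetical names. Each outcome keeps the exception text so that error messages for failing cases can be documented later.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Outcome:
    """Result of running one example through a parser."""
    expression: str
    expected_pass: bool
    actual_pass: bool
    error: Optional[str]  # exception type and message when parsing failed

    @property
    def matches(self) -> bool:
        return self.expected_pass == self.actual_pass


def run_examples(
    parse: Callable[[str], object],
    examples: Iterable[Tuple[str, bool]],
) -> List[Outcome]:
    """Parse each expression and compare pass/fail against the curated expectation."""
    outcomes = []
    for expression, expected_pass in examples:
        try:
            parse(expression)
        except Exception as exc:  # broad on purpose: keep every message for review
            outcomes.append(Outcome(expression, expected_pass, False, f"{type(exc).__name__}: {exc}"))
        else:
            outcomes.append(Outcome(expression, expected_pass, True, None))
    return outcomes


def make_hgvs_parse() -> Callable[[str], object]:
    """Return hgvs's variant parser; imported lazily so this module loads without hgvs."""
    import hgvs.parser

    return hgvs.parser.Parser().parse_hgvs_variant
```

With hgvs installed, `run_examples(make_hgvs_parse(), [("NM_007294.3:c.68_69del", True), ("c.68_69del", False)])` returns one `Outcome` per example, and any outcome whose `matches` is false is a discrepancy to review. Catching every exception preserves the message for each failing case, at the cost of also counting unexpected crashes as failures; narrowing the `except` to `hgvs.exceptions.HGVSError` would separate the two.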

Aim 3: Documentation
Write documentation to support the use of the test data set.

Aim 4: Create a PR
The outputs of Aims 1-3 will form the core of a PR to the hgvs package in biocommons.

@korikuzma added the hgvs label (Project is for HGVS) on Aug 31, 2023
@korikuzma
Contributor

@rrfreimuth @SalemBajjali I think developing real-life test datasets is great. What skill level would be good for this project (newcomer, intermediate, advanced)?

@rrfreimuth
Author

@korikuzma I'd tag it as newcomer or intermediate, depending on point of view. It requires the ability to navigate the hgvs package and potentially to extend the exception messages for the class.

@korikuzma
Contributor

@rrfreimuth I'll tag as good for newcomers. @SalemBajjali, if you think it should be intermediate I can change!

@korikuzma added the newcomer label (Project is good for newcomers) on Aug 31, 2023
@reece
Member

reece commented Sep 15, 2023

@rrfreimuth @SalemBajjali : hgvs has a bunch of tests in https://github.com/biocommons/hgvs/tree/main/tests/data . Many of these tests were provided by curation scientists at Invitae.

@rrfreimuth
Author

Thanks, @reece! I looked through a couple of the files, and I think we should consider adopting that format so that the new tests integrate naturally into the rest of the test suite. We'll very likely need help with that.

I don't expect we'll find test cases that aren't already covered, but this exercise is less about technical validation of the tooling (which is already robust) and more about exploring how the tooling behaves when given invalid input data. I am very interested in documenting the error messages for each failing case and in thinking about whether supporting information is needed to help a non-technical domain expert interpret those messages and fix the input. We could also brainstorm ways to generate suggestions that lower the effort required to clean up the input data.
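To make the idea of suggestions concrete, here is a minimal sketch of rule-based hints for a few kinds of incomplete expressions that clinical reports may contain: a gene symbol where a reference sequence accession belongs, no reference sequence at all, or an accession without a version number. `suggest_fixes` and its regular expressions are hypothetical, recognize only RefSeq-style accessions (such as `NM_` or `NC_`), and are not part of hgvs.

```python
import re
from typing import List

# Hypothetical heuristics; they only recognize RefSeq-style accessions such as NM_ or NC_.
ACCESSION = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?")
GENE_THEN_EDIT = re.compile(r"^[A-Z0-9-]+[\s:]+[cgmnpr]\.")
BARE_EDIT = re.compile(r"^[cgmnpr]\.")


def suggest_fixes(expression: str) -> List[str]:
    """Return plain-language hints for a few common omissions; empty if none apply."""
    expr = expression.strip()
    hints = []
    accession = ACCESSION.match(expr)
    if accession is None:
        if GENE_THEN_EDIT.match(expr):
            hints.append(
                "A gene symbol appears where a reference sequence accession "
                "(for example NM_007294.3) is expected."
            )
        elif BARE_EDIT.match(expr):
            hints.append(
                "No reference sequence is given; prefix the expression with a "
                "versioned accession and a colon."
            )
    elif accession.group(1) is None:
        hints.append(
            "The reference sequence accession has no version number "
            "(for example .3 in NM_007294.3)."
        )
    return hints
```

Hints like these could be shown next to the parser's own error message rather than replacing it, so a domain expert sees both the technical failure and a likely fix.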

@reece
Member

reece commented Sep 16, 2023

Terrific. Happy to talk more about it. Also, there are now better ways to structure tests like these (pytest's "test parametrization") and this would be a great excuse to overhaul the current set.
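A sketch of what a parametrized version could look like. The cases, their ids, and the fixture name `hgvs_parser_and_error` are illustrative assumptions, not the curated Aim 1 data or the format used under tests/data. The pass/fail values are what HGVS syntax would lead one to expect and should be confirmed against the parser. The fixture uses `pytest.importorskip` so the module is skipped cleanly when hgvs is not installed.

```python
import pytest

# Illustrative cases; pass/fail follows HGVS syntax and should be confirmed against hgvs.
CASES = [
    pytest.param("NM_007294.3:c.68_69del", True, id="accession-and-cdna"),
    pytest.param("NM_007294.3(BRCA1):c.68_69del", True, id="with-gene-symbol"),
    pytest.param("c.68_69del", False, id="no-reference-sequence"),
    pytest.param("BRCA1 c.68_69delAG", False, id="gene-symbol-only"),
]


@pytest.fixture(scope="module")
def hgvs_parser_and_error():
    """Build one parser per module; skip the tests if hgvs is not installed."""
    parser_module = pytest.importorskip("hgvs.parser")
    exceptions_module = pytest.importorskip("hgvs.exceptions")
    return parser_module.Parser(), exceptions_module.HGVSError


@pytest.mark.parametrize("expression, should_pass", CASES)
def test_clinical_expression(hgvs_parser_and_error, expression, should_pass):
    parser, hgvs_error = hgvs_parser_and_error
    if should_pass:
        parser.parse_hgvs_variant(expression)  # any exception fails the test
    else:
        with pytest.raises(hgvs_error):
            parser.parse_hgvs_variant(expression)
```

Each case then appears as its own test in pytest's report (for example `test_clinical_expression[no-reference-sequence]`), which makes it easy to see which representation of an expression behaved unexpectedly.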
