Develop tests based on clinical "HGVS expressions" #16
Comments
@rrfreimuth @SalemBajjali I think developing real-life test datasets is great. What skill level would be good for this project (newcomer, intermediate, advanced)?
@korikuzma I'd tag it as newcomer or intermediate, depending on point of view. It requires the ability to navigate the hgvs package and potentially extend the exception messages for the class.
@rrfreimuth I'll tag it as good for newcomers. @SalemBajjali, if you think it should be intermediate I can change it!
@rrfreimuth @SalemBajjali: hgvs has a bunch of tests in https://github.com/biocommons/hgvs/tree/main/tests/data . Many of these tests were provided by curation scientists at Invitae.
Thanks, @reece ! I looked through a couple of the files and I think we should consider adopting that format so that it integrates naturally into the rest of the test suite. We'll very likely need help with that. I don't expect we'll find test cases that aren't already covered, but this exercise is less about the technical validation of the tooling (which is already robust) and is more about exploring the behavior of the tooling when invalid input data are used. I am very interested in documenting the error messages for each failing case and thinking about whether supporting info is needed to help a non-technical domain expert interpret the info to fix the input. We could also brainstorm ways to generate suggestions to the user to help lower the effort required to clean up the input data.
Terrific. Happy to talk more about it. Also, there are now better ways to structure tests like these (pytest's "test parametrization") and this would be a great excuse to overhaul the current set.
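For anyone unfamiliar with pytest's test parametrization, here is a minimal sketch of the idea. The `parse` function is a hypothetical stand-in (it only checks for an `accession:variant` shape) so the snippet runs without hgvs installed, and the two expressions are made-up placeholders, not curated examples. In a real test the stand-in would be swapped for the hgvs parser and its own exception type.

```python
import pytest


def parse(expression: str):
    """Hypothetical stand-in for the real parser; NOT part of hgvs."""
    accession, sep, posedit = expression.partition(":")
    if not (sep and accession and posedit):
        raise ValueError(f"cannot split {expression!r} into accession and variant")
    return accession, posedit


# Each case: (expression, expected_valid). The ids show up in pytest's report,
# which makes it easy to see which clinical-style expression failed.
CASES = [
    pytest.param("NM_000000.0:c.100A>G", True, id="with-accession"),
    pytest.param("c.100A>G", False, id="accession-omitted"),
]


@pytest.mark.parametrize("expression, expected_valid", CASES)
def test_expression(expression, expected_valid):
    if expected_valid:
        # Should parse without raising.
        parse(expression)
    else:
        # Should be rejected by the parser.
        with pytest.raises(ValueError):
            parse(expression)
```

pytest runs `test_expression` once per entry in `CASES`, so adding a new example means adding one line of data rather than a new test function.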
Submitter Name
Bob Freimuth
Submitter Affiliation
Mayo Clinic
Requested By
Mayo Clinic
Additional Submitter Details
No response
Lead(s)
Salem Bajjali and Bob Freimuth
biocommons Repo
hgvs
Project Details
The hgvs tools provide extensive capabilities to parse and assemble HGVS expressions. As a step towards tooling that supports translation between HGVS, VRS, and FHIR, a robust set of test data is needed. While it is possible to extract high volumes of variation data from public databases, we propose to use examples based on clinical test reports, which often use incomplete or inaccurate HGVS expressions. Developing this test data set will give the hgvs package a curated set of examples that pass and fail validation in known ways.
Aim 1: Develop the test data set
Twenty examples of HGVS expressions reported in clinical genetic test reports were extracted. Each example was minimally altered so that it differs from the actual patient result and matches a ClinVar record, while keeping the original syntax of the expression from the test report. For each example, four additional representations were generated that include or omit different portions of the expression.
This aim will structure the example data set in file(s) that can be used as part of automated testing for the hgvs tooling. Minimally, each example will be human-curated to indicate expected pass/fail. Future work could add expected exceptions or detailed error messages that indicate why an expression failed validation.
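One possible layout for such a file is sketched below. It is only an illustration: the column names (`expression`, `expected`, `note`), the `Example` class, and `load_examples` are hypothetical, and the two rows are made-up placeholders rather than curated examples. The format eventually adopted may instead follow the existing files in the hgvs test data directory.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    expression: str       # HGVS expression as written in the report
    expected_valid: bool  # human-curated expectation: True = should pass
    note: str             # free-text curator comment


# Placeholder content only; the expected values are illustrative, not curated.
SAMPLE_TSV = (
    "expression\texpected\tnote\n"
    "NM_000000.0:c.100A>G\tpass\tplaceholder, accession included\n"
    "c.100A>G\tfail\tplaceholder, accession omitted\n"
)


def load_examples(handle):
    """Read tab-separated examples and validate the expected column."""
    examples = []
    for row in csv.DictReader(handle, delimiter="\t"):
        expected = row["expected"].strip().lower()
        if expected not in {"pass", "fail"}:
            raise ValueError(f"unexpected value in 'expected' column: {expected!r}")
        examples.append(
            Example(
                expression=row["expression"].strip(),
                expected_valid=(expected == "pass"),
                note=(row.get("note") or "").strip(),
            )
        )
    return examples


EXAMPLES = load_examples(io.StringIO(SAMPLE_TSV))
```

Keeping a free-text `note` column leaves room for the expected exception or error message mentioned above as future work.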
Aim 2: Develop an automated test script
A test script will be written that automatically runs each example from aim 1 through the hgvs parser. Actual results will be compared to the expected results.
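A rough sketch of the comparison step follows, under stated assumptions. `run_examples`, `Outcome`, and `toy_parse` are hypothetical names; the parser is passed in as a plain callable so the snippet runs without hgvs installed. With hgvs available, `hgvs.parser.Parser().parse_hgvs_variant` could be passed in place of `toy_parse`. The expressions and expected values are placeholders.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional, Tuple


class Outcome(NamedTuple):
    expression: str
    expected_valid: bool
    actual_valid: bool
    error: Optional[str]  # exception type and message when parsing failed

    @property
    def as_expected(self) -> bool:
        return self.expected_valid == self.actual_valid


def run_examples(
    examples: Iterable[Tuple[str, bool]],
    parse: Callable[[str], object],
) -> List[Outcome]:
    """Run each (expression, expected_valid) pair through `parse`."""
    outcomes = []
    for expression, expected_valid in examples:
        try:
            parse(expression)
        except Exception as exc:  # broad on purpose: record any failure
            message = f"{type(exc).__name__}: {exc}"
            outcomes.append(Outcome(expression, expected_valid, False, message))
        else:
            outcomes.append(Outcome(expression, expected_valid, True, None))
    return outcomes


def toy_parse(expression: str):
    """Stand-in parser, NOT hgvs: only requires an 'accession:variant' shape."""
    accession, sep, posedit = expression.partition(":")
    if not (sep and accession and posedit):
        raise ValueError(f"missing accession or variant in {expression!r}")
    return accession, posedit


OUTCOMES = run_examples(
    [("NM_000000.0:c.100A>G", True), ("c.100A>G", True)],
    toy_parse,
)
MISMATCHES = [o for o in OUTCOMES if not o.as_expected]
```

Recording the exception text alongside the pass/fail result also gives a starting point for documenting the error message produced by each failing case.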
Aim 3: Documentation
Write documentation to support the use of the test data set.
Aim 4: Create a PR
The outputs of aims 1-3 will form the core of a PR for the hgvs package in biocommons.