implement support for repeats #113

Open
reece opened this issue Jan 22, 2014 · 4 comments
Labels
enhancement (New feature or request), keep alive (exempt issue from staleness checks)

Comments

@reece
Member

reece commented Jan 22, 2014

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


See comments in hgvs.pymeta.

http://www.hgvs.org/mutnomen/recs-DNA.html#var

Lots of gotchas:

  • reported as start position w/seq, or interval without seq. e.g., g.123TG[4], but not g.123_124TG[4]
  • interleaved repeats: g.456TG[4]TA[9]TG[3] or g.456_465[4]466_489[9]490_499[3]
  • () around repeat count -> uncertainty
  • implicit data for hets: g.1209_4523[14];[23] (same as g.[1209_4523[14];1209_4523[23]])
  • repeat counts may be uncertain. Although not in the mutnomen doc, I think these are legit: ((6)_22), (?_22).

This definitely merits a feature branch.

Links

  • imported from: CORE-113 (Invitae access required)

@larrybabb

@reece would it be reasonable to break this issue up into a separate ticket for each of the above "gotchas" bullet points and tackle them separately, or in some logical order that would let us resolve the first bullet and then move on to refactoring for the subsequent concerns?

@larrybabb

@andreasprlic This is the ticket you and I discussed this morning. I would love to do an MVP of repeat syntax support in the hgvs package if that is possible. @reece rightfully points out the nuances and gotchas involved with this syntax. I'm looking for the basics like supporting only the first bullet he points out.

Here's the way I think we could/should deliver this feature in this module.

First, the basics

  • reported as start position w/seq, or interval without seq. e.g., g.123TG[4], but not g.123_124TG[4]
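As a rough sketch of what this first step could accept (this is not the hgvs grammar; `SimpleRepeat` and `parse_simple_repeat` are hypothetical names, and only g. coordinates, plain ACGT units, and integer counts are assumed):

```python
import re
from typing import NamedTuple, Optional


class SimpleRepeat(NamedTuple):
    """Hypothetical container for a basic repeat; not an hgvs class."""
    start: int
    end: Optional[int]   # set only for the interval form
    unit: Optional[str]  # set only for the start-plus-sequence form
    count: int


# Form 1: start position with sequence, e.g. g.123TG[4]
_START_SEQ = re.compile(r"^g\.(\d+)([ACGT]+)\[(\d+)\]$")
# Form 2: interval without sequence, e.g. g.123_124[4]
_INTERVAL = re.compile(r"^g\.(\d+)_(\d+)\[(\d+)\]$")


def parse_simple_repeat(expr: str) -> SimpleRepeat:
    """Parse one of the two accepted forms; reject everything else."""
    m = _START_SEQ.match(expr)
    if m:
        return SimpleRepeat(int(m.group(1)), None, m.group(2), int(m.group(3)))
    m = _INTERVAL.match(expr)
    if m:
        return SimpleRepeat(int(m.group(1)), int(m.group(2)), None, int(m.group(3)))
    # Mixed forms such as g.123_124TG[4] fall through to here.
    raise ValueError(f"not a basic repeat expression: {expr!r}")
```

Keeping the two forms as separate patterns makes the "interval plus sequence" case fail by construction rather than by an extra check.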

Second, ranges

  • repeat counts may be uncertain. Although not in the mutnomen doc, I think these are legit: ((6)_22), (?_22).

Again, while the parens () convey uncertainty, there are many examples in the wild of ranges written without the parentheses. I'd like to make sure we can support these non-compliant representations since they are fairly prevalent.
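To make the range cases concrete, here is a minimal sketch of parsing just the text between the square brackets. `RepeatCount` and `parse_repeat_count` are hypothetical names, and the accepted forms are an assumption drawn from the examples in this thread: a plain count, a parenthesized count, a range with or without enclosing parens, `?` for an unknown bound, and a parenthesized bound inside a range.

```python
import re
from typing import NamedTuple, Optional, Tuple


class RepeatCount(NamedTuple):
    """Hypothetical result type; None for a bound means unknown ('?')."""
    low: Optional[int]
    high: Optional[int]
    uncertain: bool  # True only if parens or '?' were written explicitly


_BOUND = r"\?|\d+|\(\d+\)"
_COUNT = re.compile(
    rf"^(?P<open>\()?(?P<low>{_BOUND})(?:_(?P<high>{_BOUND}))?(?P<close>\))?$"
)


def _bound(token: str) -> Tuple[Optional[int], bool]:
    """Return (value, explicitly_uncertain) for one bound token."""
    if token == "?":
        return None, True
    if token.startswith("("):
        return int(token[1:-1]), True
    return int(token), False


def parse_repeat_count(text: str) -> RepeatCount:
    m = _COUNT.match(text)
    # The enclosing parens must be both present or both absent.
    if m is None or bool(m["open"]) != bool(m["close"]):
        raise ValueError(f"unsupported repeat count: {text!r}")
    low, low_unc = _bound(m["low"])
    if m["high"] is None:
        high, high_unc = low, low_unc
    else:
        high, high_unc = _bound(m["high"])
    return RepeatCount(low, high, bool(m["open"]) or low_unc or high_unc)
```

Here `uncertain` only records what was written. A caller that wants to treat any unparenthesized range such as `6_22` as uncertain can compare `low` and `high` instead.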

Here are a few examples from ClinVar....

Third, complex repeats

  • interleaved repeats: g.456TG[4]TA[9]TG[3] or g.456_465[4]466_489[9]490_499[3]
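A sketch of how the two interleaved forms might be split into their components, assuming g. coordinates and integer counts only. The function names are hypothetical, and nothing here checks that the intervals are contiguous or consistent with the reference.

```python
import re
from typing import List, Tuple

_SEQ_UNIT = re.compile(r"([ACGT]+)\[(\d+)\]")
_INTERVAL_UNIT = re.compile(r"(\d+)_(\d+)\[(\d+)\]")


def parse_interleaved_seq(expr: str) -> Tuple[int, List[Tuple[str, int]]]:
    """g.456TG[4]TA[9]TG[3] -> (456, [("TG", 4), ("TA", 9), ("TG", 3)])"""
    m = re.fullmatch(r"g\.(\d+)((?:[ACGT]+\[\d+\])+)", expr)
    if m is None:
        raise ValueError(f"not a sequence-style repeat: {expr!r}")
    units = [(unit, int(n)) for unit, n in _SEQ_UNIT.findall(m.group(2))]
    return int(m.group(1)), units


def parse_interleaved_intervals(expr: str) -> List[Tuple[int, int, int]]:
    """g.456_465[4]466_489[9] -> [(456, 465, 4), (466, 489, 9)]"""
    m = re.fullmatch(r"g\.((?:\d+_\d+\[\d+\])+)", expr)
    if m is None:
        raise ValueError(f"not an interval-style repeat: {expr!r}")
    return [(int(s), int(e), int(n)) for s, e, n in _INTERVAL_UNIT.findall(m.group(1))]
```

A single-unit expression such as g.123TG[4] is just the one-element case of the first form, so the basic case could share this code path.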

Fourth, genotypes / hets

  • implicit data for hets: g.1209_4523[14];[23] (same as g.[1209_4523[14];1209_4523[23]])
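The shorthand-to-explicit rewrite in that bullet could be handled as a string normalization step before parsing. A minimal sketch, with `expand_het_shorthand` as a hypothetical helper that assumes exactly two alleles and a single-letter coordinate prefix:

```python
import re

# <prefix>.<locus>[<count A>];[<count B>], e.g. g.1209_4523[14];[23]
_HET_SHORTHAND = re.compile(
    r"^(?P<prefix>[a-z])\.(?P<locus>[^\[\];]+)\[(?P<a>[^\[\]]+)\];\[(?P<b>[^\[\]]+)\]$"
)


def expand_het_shorthand(expr: str) -> str:
    """Repeat the implicit locus on the second allele."""
    m = _HET_SHORTHAND.match(expr)
    if m is None:
        raise ValueError(f"not a two-allele repeat shorthand: {expr!r}")
    locus = m["locus"]
    return f"{m['prefix']}.[{locus}[{m['a']}];{locus}[{m['b']}]]"
```

For example, `expand_het_shorthand("g.1209_4523[14];[23]")` returns `"g.[1209_4523[14];1209_4523[23]]"`.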

@andreasprlic Any assistance or management you can provide to break this up so we can start delivering on it would be wonderful.


github-actions bot commented Jun 3, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the "stale" label Jun 3, 2024
andreasprlic removed the "stale" label Jun 3, 2024
@andreasprlic
Member

Did a small demo in this branch for how to use a fully justified representation of a variant to easily identify basic repeats:
https://github.com/andreasprlic/hgvs/blob/pretty_print/src/hgvs/repeats.py
https://github.com/andreasprlic/hgvs/blob/pretty_print/tests/test_repeats.py
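For readers who do not want to open the branch, the general idea can be illustrated with a stand-alone sketch. This is not the code in the linked files, and `smallest_unit` and `repeat_tract` are hypothetical names. The assumption is that once a repeat unit is extended as far left and right as the reference allows (the span a fully justified variant would cover), the start of the tract and the copy count fall out directly.

```python
from typing import Tuple


def smallest_unit(seq: str) -> str:
    """Shortest prefix that, repeated, reproduces seq (e.g. TGTG -> TG)."""
    for size in range(1, len(seq) + 1):
        if len(seq) % size == 0 and seq[:size] * (len(seq) // size) == seq:
            return seq[:size]
    return seq


def repeat_tract(ref: str, pos: int, unit: str) -> Tuple[int, int, int]:
    """Extend one copy of `unit` at 0-based `pos` in both directions.

    Returns (start, end, count) with a half-open [start, end) span.
    """
    k = len(unit)
    if k == 0 or ref[pos:pos + k] != unit:
        raise ValueError("unit must be non-empty and present at pos")
    start = pos
    while start - k >= 0 and ref[start - k:start] == unit:
        start -= k  # shift left one whole unit at a time
    end = pos + k
    while ref[end:end + k] == unit:
        end += k    # shift right one whole unit at a time
    return start, end, (end - start) // k


def as_repeat_expr(ref: str, pos: int, unit: str) -> str:
    """Render the tract in the start-plus-sequence form (1-based start)."""
    start, _end, count = repeat_tract(ref, pos, unit)
    return f"g.{start + 1}{unit}[{count}]"
```

With the toy reference `AACTGTGTGTGCC`, `as_repeat_expr(ref, 5, "TG")` gives `g.4TG[4]`, whichever of the four TG copies the position starts on.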

jsstevenson added the "keep alive" label Jun 3, 2024