Coordinate is out of bounds error when roundtripping GRCh38 variants via g_to_c + c_to_g and relevant_transcripts #717
Looking into this, it turns out that the hgvs_c string here is Somehow in assembly 38 this position seems to be outside of transcript coding space, therefore there is a Looking into the underlying alignment in UTA, the last exon (which is entirely UTR) has alignment issues in both assemblies. In assembly 37 this transcript has one more exon, but the alignment is I would conclude from this that neither assembly is a good match for the terminal UTR region here, and we should probably not assume that we can map variants in that region across assemblies. I don't think there's an hgvs bug here; this is just the nature of the reference assemblies...
This is a workaround for some liftover failures: biocommons/hgvs#717
Thank you for looking into this @andreasprlic and for providing such a detailed explanation! Given this issue, do you think it would make sense to bubble up
This issue was auto-closed due to inactivity, but I think it's related
@mihaitodor if your goal is to identify the location of a variant in another genome assembly, I am concerned about trying to do that in regions where the assemblies have changed a lot. Just because you can compute some coordinates does not mean they are biologically "the same". In the example you provided above, I really think the lifted-over coordinates should not be trusted. As such, I would not encourage you to map across assemblies when strict_bounds=False is necessary. I would also recommend thinking about a QC approach to make sure that any lifted-over variant can be considered equivalent in the context of both assemblies; otherwise, treat them as distinct variants.
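One possible shape for such a QC step is sketched below. This is only an illustration, not something hgvs provides: `liftover_is_consistent` is a hypothetical helper, and `src_mapper` / `dst_mapper` are assumed to be AssemblyMapper-like objects for the two assemblies that expose the `g_to_c(var_g, tx_ac)` and `c_to_g(var_c)` methods discussed in this thread, with bounds checking left strict. The idea is to lift over through a shared transcript, project back, and accept the result only if both round trips reproduce the starting variants.

```python
def liftover_is_consistent(var_g, tx_ac, src_mapper, dst_mapper):
    """Hypothetical QC helper: lift var_g over via transcript tx_ac and verify
    that projecting back reproduces both the c. and the original g. variant.

    Returns (ok, detail). On success, detail is the lifted-over g. variant
    as a string; on failure, it describes what went wrong.
    """
    try:
        var_c = src_mapper.g_to_c(var_g, tx_ac)          # source genome -> transcript
        lifted_g = dst_mapper.c_to_g(var_c)              # transcript -> destination genome
        var_c_back = dst_mapper.g_to_c(lifted_g, tx_ac)  # destination genome -> transcript
        var_g_back = src_mapper.c_to_g(var_c_back)       # transcript -> source genome
    except Exception as exc:
        # Assumption: any mapping error (e.g. an out-of-bounds coordinate)
        # means the liftover should not be trusted. Caught broadly on purpose.
        return False, f"projection failed: {exc}"

    if str(var_c_back) != str(var_c):
        return False, f"c. variant changed: {var_c} -> {var_c_back}"
    if str(var_g_back) != str(var_g):
        return False, f"g. variant changed: {var_g} -> {var_g_back}"
    return True, str(lifted_g)
```

A variant that fails this check would be treated as distinct in the two assemblies rather than retried with relaxed bounds. Note that passing the check only shows the coordinates are self-consistent; it does not by itself prove biological equivalence.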
Thank you @andreasprlic! Indeed, in such cases it's probably better to error out. Initially, we had some code based on pyliftover, which I guess would have similar issues? I opened #711 after @reece mentioned that it would be handy to have some direct liftover support in hgvs. The code I'm trying to add some features to is just a reference implementation for now, but indeed, we'll have to be stricter in production-ready implementations.
Describe the bug
I'm parsing a g. variant, then I'm fetching some relevant transcripts for it via the GRCh38 AssemblyMapper, then I'm projecting it to any of the relevant NM transcripts, and then I'm projecting that back to a g. variant. Naively, what I think should happen is that it should be able to round-trip and print the input variant, but right now it's just throwing this error. Note that it does work as expected if you use, for example, variant = 'NC_000011.10:g.8263343T>C'. Also, it works if I use GRCh37.
To Reproduce
Steps to reproduce the behavior:
Running the above code fails with the following error:
Expected behavior
The code should print NC_000001.10:g.145592073A>T at the end.
Additional context
@reece mentioned on the biocommons Slack that this looks like a bug.
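For readers following along, the round trip described in this report can be sketched roughly as below. This is not the original reproduction snippet. `roundtrip_report` is a hypothetical helper, and `mapper` is assumed to be an AssemblyMapper-like object exposing `relevant_transcripts`, `g_to_c` and `c_to_g`. Instead of letting the first failure propagate, it records the outcome per transcript, which makes it easier to see which transcripts trigger the error.

```python
def roundtrip_report(var_g, mapper):
    """Hypothetical helper: project var_g onto each relevant NM transcript and
    back to genomic coordinates, recording the outcome per transcript."""
    report = {}
    for tx_ac in mapper.relevant_transcripts(var_g):
        if not tx_ac.startswith("NM_"):
            continue  # only NM transcripts, as in the report above
        try:
            var_c = mapper.g_to_c(var_g, tx_ac)  # genome -> transcript
            back = mapper.c_to_g(var_c)          # transcript -> genome
        except Exception as exc:
            # hgvs raises its own exception types; caught broadly in this sketch.
            report[tx_ac] = f"error: {exc}"
            continue
        report[tx_ac] = "ok" if str(back) == str(var_g) else f"mismatch: {back}"
    return report


# Possible usage with a real hgvs setup (needs access to a UTA database;
# not executed here):
#   import hgvs.parser, hgvs.dataproviders.uta, hgvs.assemblymapper
#   hdp = hgvs.dataproviders.uta.connect()
#   am = hgvs.assemblymapper.AssemblyMapper(
#       hdp, assembly_name="GRCh38", alt_aln_method="splign")
#   var_g = hgvs.parser.Parser().parse_hgvs_variant("NC_000011.10:g.8263343T>C")
#   print(roundtrip_report(var_g, am))
```

The variant in the usage comment is the one reported above as working; comparing the string forms of the variants is a simplification chosen for this sketch.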