Address RefSeq transcript misalignments #447
Comments
This approach for the resolution will work for now. A proper resolution will be done in #450. @pnrobinson @julesjacobsen as ENSEMBL is not affected, this will probably not be of super big interest to you. @roland-ewald I think this might be important for you to know. A good example is LTBP4:
@holtgrewe Thank you for the heads-up, I'm aware of the issue. BTW besides the
ohhhh! thanks for this info
Sure, you're welcome!
After looking into this some more, I think the correct way is to use the alignments from the RefSeq file that are stored separately from the exons themselves. It is important to both capture indels in the local alignments themselves (the For example,
RefSeq transcripts can align with indels and mismatches to the reference sequence. While mismatches could be argued to be non-critical (assuming the GenBank entries that the RefSeq transcript is based on are from healthy individuals), indels cannot.
For hg19, 884 transcripts in 501 genes are affected.
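To illustrate why indels are critical while mismatches are not, here is a minimal sketch that projects a transcript position onto the reference through a gapped alignment. The class name, the method, and the CIGAR-like operation letters (M for aligned bases including mismatches, I for bases present only in the transcript, D for bases present only in the reference) are assumptions made for this example and are not taken from Jannovar.

```java
/** Illustrative sketch only; not part of Jannovar. */
final class AlignmentShiftSketch {
    /**
     * Maps a 0-based transcript offset to a 0-based reference offset.
     * Returns -1 if the transcript base lies inside an insertion and
     * therefore has no counterpart on the reference.
     */
    static int toReferenceOffset(char[] ops, int[] lens, int txPos) {
        int tx = 0;   // transcript bases consumed so far
        int ref = 0;  // reference bases consumed so far
        for (int i = 0; i < ops.length; i++) {
            switch (ops[i]) {
                case 'M': // match or mismatch: both sequences advance
                    if (txPos < tx + lens[i]) {
                        return ref + (txPos - tx);
                    }
                    tx += lens[i];
                    ref += lens[i];
                    break;
                case 'I': // bases only in the transcript
                    if (txPos < tx + lens[i]) {
                        return -1;
                    }
                    tx += lens[i];
                    break;
                case 'D': // bases only in the reference
                    ref += lens[i];
                    break;
                default:
                    throw new IllegalArgumentException("Unknown op: " + ops[i]);
            }
        }
        throw new IndexOutOfBoundsException("Position beyond alignment: " + txPos);
    }

    public static void main(String[] args) {
        // Ungapped alignment: mismatches do not move any coordinate.
        System.out.println(toReferenceOffset(new char[] {'M'}, new int[] {20}, 15));
        // A 2-base deletion after 10 bases shifts every downstream position.
        System.out.println(toReferenceOffset(new char[] {'M', 'D', 'M'}, new int[] {10, 2, 10}, 15));
    }
}
```

With an ungapped alignment, a mismatch leaves every coordinate where it is. A single indel moves every downstream position, so exon and CDS coordinates derived without the alignment would be off from that point on.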
The following solution will be implemented:
- The default_sources.ini file gets the settings "fixIndels" and "fixIndelsUcsc".
- The "Note" attribute is analyzed. If it contains the substrings "indel" or "substitution", then this is recorded in the built TranscriptModel.
- If fixIndels=true is given, then the user also has to provide the path to the reference sequence.
- fixIndelsUcsc is used for providing the UCSC transcript alignments. These will be used for the exon and CDS information. The sequence will be taken from the reference.
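The checks described above could look roughly like the following sketch. The class and method names are hypothetical and do not reflect Jannovar's actual API, the case-insensitive matching is an assumption (the issue only says "contains the substrings"), and the sample Note text in main is illustrative.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not Jannovar's actual API. */
final class NoteFlagSketch {
    /**
     * Returns true if the Note attribute mentions an indel or a substitution.
     * Case-insensitive matching is an assumption of this sketch.
     */
    static boolean mentionsIndelOrSubstitution(String note) {
        if (note == null) {
            return false;
        }
        String lower = note.toLowerCase(Locale.ROOT);
        return lower.contains("indel") || lower.contains("substitution");
    }

    /**
     * Enforces the rule that fixIndels=true needs a reference sequence path.
     * Throws IllegalArgumentException if the path is missing.
     */
    static void requireReferenceIfFixing(boolean fixIndels, String referencePath) {
        if (fixIndels && (referencePath == null || referencePath.isEmpty())) {
            throw new IllegalArgumentException(
                "fixIndels=true requires the path to the reference sequence");
        }
    }

    public static void main(String[] args) {
        // Illustrative Note text, not copied from a specific RefSeq record.
        String note = "The RefSeq transcript has 1 non-frameshifting indel "
            + "compared to this genomic sequence";
        System.out.println(mentionsIndelOrSubstitution(note));
        requireReferenceIfFixing(true, "/path/to/reference.fa"); // placeholder path
    }
}
```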
NB: This will create an incompatibility between the databases built before and after Jannovar v0.29.
For each hg*/refseq* entry, a _fixindel variant is added that contains these fixed transcripts. This way, the fixed transcripts are strictly opt-in and only supplement those where the indel is not fixed. Variants for both can be reported.
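As a small sketch of the naming scheme, the helper below derives the opt-in entry name from an existing one. It assumes that the _fixindel suffix is simply appended to the full entry name. That rule, the class name, and the method name are assumptions for illustration.

```java
/** Hypothetical naming helper; the exact naming rule is an assumption. */
final class FixIndelNameSketch {
    static final String SUFFIX = "_fixindel";

    /** Returns the name of the opt-in variant for an hg*/refseq* entry. */
    static String fixIndelVariantOf(String entryName) {
        if (entryName.endsWith(SUFFIX)) {
            return entryName; // already the fixed variant
        }
        return entryName + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(fixIndelVariantOf("hg19/refseq"));
    }
}
```

Because the original entries keep their names, existing setups that reference them are unaffected unless the user explicitly selects the suffixed entry.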