SpliceAI plugin is not giving the most "severe" result among duplicates #638
Comments
Hi @lacek,
As an alternative, you could download the Ensembl scores calculated for the MANE Select transcripts.
@dglemos In the file http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_plugins/spliceai_scores.masked.snv.ensembl_mane.grch38.110.vcf.gz, there are also duplicate lines, e.g.
16 1534632 . T A . . SpliceAI=A|TMEM204|0.00|0.00|0.34|0.00|-45|7|1|9,A|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1
16 1534632 . T A . . SpliceAI=A|TMEM204|0.00|0.00|0.33|0.00|-45|7|1|9,A|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1
16 1534632 . T C . . SpliceAI=C|TMEM204|0.00|0.00|0.00|0.00|-45|13|1|9,C|IFT140|0.00|0.00|0.00|0.00|49|-2|-35|-1
16 1534632 . T C . . SpliceAI=C|TMEM204|0.00|0.00|0.00|0.00|-45|13|1|9,C|IFT140|0.00|0.00|0.00|0.00|49|-2|-35|-1
16 1534632 . T G . . SpliceAI=G|TMEM204|0.00|0.00|0.00|0.00|7|-45|-15|1,G|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1
16 1534632 . T G . . SpliceAI=G|TMEM204|0.00|0.00|0.00|0.00|7|-45|-15|1,G|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1
However, the score differences among these duplicates appear to be within 0.01, so they shouldn't affect interpretation. I will check with my team to see if this file fits our use case. Thank you for the advice.
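For anyone who wants to collapse such duplicates before annotation, here is a minimal Python sketch. It is not part of VEP or the SpliceAI plugin. It assumes the usual SpliceAI annotation layout (ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL) and keeps, per variant and gene symbol, the entry whose highest delta score is largest. The names `max_delta_score` and `collapse_duplicates` are made up for illustration, and the input lines are three of the lines quoted above.

```python
from typing import Dict, List, Tuple

# Three of the lines quoted above. Real VCF lines are tab-separated;
# str.split() with no argument handles tabs and spaces alike.
LINES: List[str] = [
    "16 1534632 . T A . . SpliceAI=A|TMEM204|0.00|0.00|0.34|0.00|-45|7|1|9,A|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1",
    "16 1534632 . T A . . SpliceAI=A|TMEM204|0.00|0.00|0.33|0.00|-45|7|1|9,A|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1",
    "16 1534632 . T C . . SpliceAI=C|TMEM204|0.00|0.00|0.00|0.00|-45|13|1|9,C|IFT140|0.00|0.00|0.00|0.00|49|-2|-35|-1",
]

Key = Tuple[str, int, str, str, str]  # (chrom, pos, ref, alt, gene symbol)


def max_delta_score(entry: str) -> float:
    """Return the largest of the four delta scores (fields 3 to 6) in one entry."""
    fields = entry.split("|")
    return max(float(value) for value in fields[2:6])


def collapse_duplicates(lines: List[str]) -> Dict[Key, str]:
    """Keep, per variant and gene symbol, the entry with the highest delta score."""
    best: Dict[Key, str] = {}
    for line in lines:
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split()
        annotations = info.split("SpliceAI=", 1)[1]
        for entry in annotations.split(","):
            symbol = entry.split("|")[1]
            key = (chrom, int(pos), ref, alt, symbol)
            if key not in best or max_delta_score(entry) > max_delta_score(best[key]):
                best[key] = entry
    return best


if __name__ == "__main__":
    for key, entry in collapse_duplicates(LINES).items():
        print(key, entry)
```

When two entries tie, the first one seen is kept, which is an arbitrary choice made for this sketch.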
Thanks for letting us know! There was a problem with the masked file.
I'm going to close this ticket but feel free to open a new one if you have any other questions. Best wishes,
There are duplicate records in SpliceAI v1.3 (same variant and gene symbol, but different scores), e.g. from
tabix spliceai_scores.masked.snv.hg38.vcf.gz 2:241813895-241813895 19:39885875-39885875
we have:
The following are the results of VEP web for the above 2 variants:
In short, VEP gives
For 19-39885875-G-C, I believe this is because the record with 0.25 comes after the one with 0.94, and the current implementation of the plugin loops over all records matching the variant and gene symbol, so the last matched record wins.
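To make the suspected behaviour concrete, here is a small Python sketch. It is not the plugin's actual code (the plugin is written in Perl), only a model of the "last match wins" loop as I understand it, set next to a selection that keeps the highest score. The gene symbol `GENE_X` is a placeholder, and the two scores are the 0.94 and 0.25 mentioned above, reduced to a single number per record for simplicity.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, float]  # (gene symbol, score); a simplified stand-in for a VCF record


def pick_last(records: List[Record], symbol: str) -> Optional[Record]:
    """Model of the suspected behaviour: each match overwrites the previous one."""
    chosen: Optional[Record] = None
    for record in records:
        if record[0] == symbol:
            chosen = record  # a later duplicate replaces an earlier one
    return chosen


def pick_max(records: List[Record], symbol: str) -> Optional[Record]:
    """Alternative: among matching records, keep the one with the highest score."""
    matches = [record for record in records if record[0] == symbol]
    return max(matches, key=lambda record: record[1]) if matches else None


if __name__ == "__main__":
    # Placeholder symbol; scores in the order described for 19-39885875-G-C.
    duplicates: List[Record] = [("GENE_X", 0.94), ("GENE_X", 0.25)]
    print("last match:", pick_last(duplicates, "GENE_X"))
    print("max score: ", pick_max(duplicates, "GENE_X"))
```

With the records in this order, the first function returns the 0.25 record and the second returns the 0.94 record.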
In terms of sensitivity, one would probably want the one with max SpliceAI score (more pathogenic prediction) instead, i.e.