SpliceAI plugin is not giving the most "severe" result among duplicates #638

Closed
lacek opened this issue Sep 29, 2023 · 5 comments

lacek commented Sep 29, 2023

There are duplicate records in the SpliceAI v1.3 score file (same variant and gene symbol, but different scores). For example, tabix spliceai_scores.masked.snv.hg38.vcf.gz 2:241813895-241813895 19:39885875-39885875 returns:

...
2	241813895	.	A	T	.	.	SpliceAI=T|NEU4|0.00|0.00|0.11|0.00|31|0|-2|33
2	241813895	.	A	T	.	.	SpliceAI=T|NEU4|0.00|0.00|0.70|0.00|-28|0|-2|33
...
19	39885875	.	G	C	.	.	SpliceAI=C|FCGBP|0.18|0.94|0.00|0.00|25|-3|25|-23
19	39885875	.	G	C	.	.	SpliceAI=C|FCGBP|0.25|0.00|0.00|0.00|25|-3|25|-23
...

The following are the results from VEP web for the two variants above:

Location	Allele	SYMBOL	Feature	SpliceAI_pred_DP_AG	SpliceAI_pred_DP_AL	SpliceAI_pred_DP_DG	SpliceAI_pred_DP_DL	SpliceAI_pred_DS_AG	SpliceAI_pred_DS_AL	SpliceAI_pred_DS_DG	SpliceAI_pred_DS_DL	SpliceAI_pred_SYMBOL
2:241813895-241813895	T	-	ENSR00001047391	-	-	-	-	-	-	-	-	-
2:241813895-241813895	T	-	ENST00000413820.1	-	-	-	-	-	-	-	-	-
2:241813895-241813895	T	-	ENST00000420272.2	-	-	-	-	-	-	-	-	-
2:241813895-241813895	T	-	ENST00000439270.1	-	-	-	-	-	-	-	-	-
2:241813895-241813895	T	LOC124905349	XR_007088398.1	-	-	-	-	-	-	-	-	-
2:241813895-241813895	T	NEU4	ENST00000325935.10	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000391969.6	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000404257.5	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000405370.5	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000407683.6	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000415936.5	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000420288.1	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000423583.5	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000426032.5	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000428592.1	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000435855.5	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000435894.5	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000435934.1	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000476542.5	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000488997.1	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000494678.1	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	ENST00000618597.1	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	NM_001167599.3	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	NM_001167600.3	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	NM_001167601.3	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	NM_001167602.3	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
2:241813895-241813895	T	NEU4	NM_080741.4	-28	0	-2	33	0.00	0.00	0.70	0.00	NEU4
19:39885875-39885875	C	FCGBP	ENST00000595713.1	25	-3	25	-23	0.25	0.00	0.00	0.00	FCGBP
19:39885875-39885875	C	FCGBP	ENST00000616721.6	25	-3	25	-23	0.25	0.00	0.00	0.00	FCGBP
19:39885875-39885875	C	FCGBP	NM_003890.2	25	-3	25	-23	0.25	0.00	0.00	0.00	FCGBP

In short, VEP gives

  • 0.70 for 2-241813895-A-T
  • 0.25 for 19-39885875-G-C

For 19-39885875-G-C, I believe this is because the 0.25 record comes after the 0.94 record in the file, and the current implementation of the plugin loops over all records matching the variant and gene symbol, so the last matching record wins.

For sensitivity, one would probably want the record with the maximum SpliceAI score (the more pathogenic prediction) instead, i.e.

  • 0.70 for 2-241813895-A-T
  • 0.94 for 19-39885875-G-C
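
The difference between the two behaviours can be illustrated with a minimal Python sketch. This is not the plugin's actual code (VEP plugins are written in Perl). It assumes the SpliceAI INFO field order ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL; the function names and the (chrom, pos, ref, alt, symbol) keys are hypothetical and exist only for this example.

```python
# Assumed entry layout: ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
def max_delta_score(entry: str) -> float:
    """Largest of the four delta scores (DS_AG, DS_AL, DS_DG, DS_DL)."""
    fields = entry.split("|")
    return max(float(x) for x in fields[2:6])


def pick_last(records):
    """'Last matching record wins': later duplicates overwrite earlier ones."""
    chosen = {}
    for key, entry in records:
        chosen[key] = entry
    return chosen


def pick_most_severe(records):
    """Keep, per key, the duplicate with the highest maximum delta score."""
    chosen = {}
    for key, entry in records:
        if key not in chosen or max_delta_score(entry) > max_delta_score(chosen[key]):
            chosen[key] = entry
    return chosen


# The four duplicate records quoted in this issue, keyed by a hypothetical
# (chrom, pos, ref, alt, symbol) tuple.
NEU4 = ("2", 241813895, "A", "T", "NEU4")
FCGBP = ("19", 39885875, "G", "C", "FCGBP")
records = [
    (NEU4, "T|NEU4|0.00|0.00|0.11|0.00|31|0|-2|33"),
    (NEU4, "T|NEU4|0.00|0.00|0.70|0.00|-28|0|-2|33"),
    (FCGBP, "C|FCGBP|0.18|0.94|0.00|0.00|25|-3|25|-23"),
    (FCGBP, "C|FCGBP|0.25|0.00|0.00|0.00|25|-3|25|-23"),
]

last = pick_last(records)
best = pick_most_severe(records)
for key in (NEU4, FCGBP):
    print(key[4], f"last={max_delta_score(last[key]):.2f}",
          f"most_severe={max_delta_score(best[key]):.2f}")
```

On these four records, the last-wins strategy reports 0.70 for NEU4 and 0.25 for FCGBP, while the most-severe strategy reports 0.70 and 0.94.
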
dglemos self-assigned this Sep 29, 2023

Contributor

dglemos commented Sep 29, 2023

Hi @lacek,
Thank you for reporting this issue.
The file is not supposed to have more than one score for each variant, which is why the plugin does not handle these records well.
I think it's better to contact SpliceAI's author to understand why there are multiple scores.

Contributor

dglemos commented Oct 11, 2023

As an alternative, you could download the scores that Ensembl calculated for the MANE Select transcripts.
The files are available here: http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_plugins/

Author

lacek commented Oct 16, 2023

@dglemos In the file http://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_plugins/spliceai_scores.masked.snv.ensembl_mane.grch38.110.vcf.gz, there are also duplicate lines, e.g.

16	1534632	.	T	A	.	.	SpliceAI=A|TMEM204|0.00|0.00|0.34|0.00|-45|7|1|9,A|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1
16	1534632	.	T	A	.	.	SpliceAI=A|TMEM204|0.00|0.00|0.33|0.00|-45|7|1|9,A|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1
16	1534632	.	T	C	.	.	SpliceAI=C|TMEM204|0.00|0.00|0.00|0.00|-45|13|1|9,C|IFT140|0.00|0.00|0.00|0.00|49|-2|-35|-1
16	1534632	.	T	C	.	.	SpliceAI=C|TMEM204|0.00|0.00|0.00|0.00|-45|13|1|9,C|IFT140|0.00|0.00|0.00|0.00|49|-2|-35|-1
16	1534632	.	T	G	.	.	SpliceAI=G|TMEM204|0.00|0.00|0.00|0.00|7|-45|-15|1,G|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1
16	1534632	.	T	G	.	.	SpliceAI=G|TMEM204|0.00|0.00|0.00|0.00|7|-45|-15|1,G|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1

However, the scores of the duplicates appear to differ by no more than 0.01, so this should not affect interpretation.

I will check with my team to see if this file fits our use case. Thank you for the advice.
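
One way to check how far duplicate scores diverge is sketched below in Python. It groups SpliceAI entries by variant and gene symbol and reports, for each group with more than one record, the largest difference seen in any single delta score. The function name duplicate_score_spread is hypothetical, the INFO field order ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL is assumed, and the sample lines are copied from the records quoted above.

```python
from collections import defaultdict


def duplicate_score_spread(vcf_lines):
    """Map (chrom, pos, ref, alt, symbol) -> largest per-score difference
    among duplicate records; variants seen only once are omitted."""
    groups = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
        for item in info.split(";"):
            if not item.startswith("SpliceAI="):
                continue
            # One comma-separated entry per gene overlapping the variant.
            for entry in item[len("SpliceAI="):].split(","):
                fields = entry.split("|")
                scores = tuple(float(x) for x in fields[2:6])  # DS_AG..DS_DL
                groups[(chrom, int(pos), ref, alt, fields[1])].append(scores)

    spreads = {}
    for key, score_sets in groups.items():
        if len(score_sets) > 1:
            # Compare each delta score column-wise across the duplicates.
            spreads[key] = max(max(col) - min(col) for col in zip(*score_sets))
    return spreads


sample = [
    "16\t1534632\t.\tT\tA\t.\t.\tSpliceAI=A|TMEM204|0.00|0.00|0.34|0.00|-45|7|1|9,A|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1",
    "16\t1534632\t.\tT\tA\t.\t.\tSpliceAI=A|TMEM204|0.00|0.00|0.33|0.00|-45|7|1|9,A|IFT140|0.00|0.00|0.00|0.00|-35|-2|-35|-1",
    "16\t1534632\t.\tT\tC\t.\t.\tSpliceAI=C|TMEM204|0.00|0.00|0.00|0.00|-45|13|1|9,C|IFT140|0.00|0.00|0.00|0.00|49|-2|-35|-1",
    "16\t1534632\t.\tT\tC\t.\t.\tSpliceAI=C|TMEM204|0.00|0.00|0.00|0.00|-45|13|1|9,C|IFT140|0.00|0.00|0.00|0.00|49|-2|-35|-1",
]

spreads = duplicate_score_spread(sample)
for key, spread in sorted(spreads.items()):
    print(key, f"{spread:.2f}")
```

To scan a whole file rather than a few lines, the same function could be fed the lines from `gzip.open(path, "rt")`, though holding every variant in memory may not be practical for the full genome-wide file; processing one region at a time would be a reasonable workaround.
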

Contributor

dglemos commented Oct 27, 2023

Thanks for letting us know! There was a problem with the masked file.
The fixed version of the file will be made available in the next release.

Contributor

dglemos commented Dec 20, 2023

I'm going to close this ticket but feel free to open a new one if you have any other questions.

Best wishes,
Diana

dglemos closed this as completed Dec 20, 2023