Recommended filtering of rMATS results #440

ankebusch · 2024-10-17T18:51:33Z

Hi,

I'm trying to understand which filters I could or should apply to the rMATS results. Reading your recent paper in Nature Protocols and previously discussed issues (e.g. #320, #183), I understand that I should filter based on FDR, abs(deltaPSI), average PSIs and average coverage, e.g. as mentioned in your paper:

1. Average read count >= 10 in both sample groups
2. Filter out events with average PSI < 0.05 in both sample groups or > 0.95 in both sample groups
3. FDR <= 0.01
4. abs(deltaPSI) >= 0.05
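Below is a minimal sketch of how the four filters above might be applied to one row of an rMATS output table (for example SE.MATS.JC.txt), with the row read as a dict of strings such as `csv.DictReader(..., delimiter="\t")` would yield. The helper names `split_values` and `passes_filters` are hypothetical. The definition of "average read count" (the mean of IJC + SJC over a group's replicates) is an assumption, so check rmats_filtering.py for the exact definition it uses.

```python
from statistics import mean


def split_values(field: str) -> list[float]:
    # rMATS stores one value per replicate, comma-separated; "NA" marks an undefined PSI
    return [float(v) for v in field.split(",") if v != "NA"]


def passes_filters(row: dict, min_reads: float = 10, min_psi: float = 0.05,
                   max_psi: float = 0.95, max_fdr: float = 0.01,
                   min_abs_dpsi: float = 0.05) -> bool:
    """Hypothetical helper: True if the event survives filters 1-4."""
    # 1. Coverage. ASSUMPTION: "average read count" = mean of IJC + SJC over replicates.
    for group in ("1", "2"):
        ijc = split_values(row[f"IJC_SAMPLE_{group}"])
        sjc = split_values(row[f"SJC_SAMPLE_{group}"])
        if mean(i + s for i, s in zip(ijc, sjc)) < min_reads:
            return False
    # 2. Average PSI must not be < min_psi in both groups or > max_psi in both groups
    #    (the same condition as the rmats_filtering.py excerpt in this issue).
    psi1 = split_values(row["IncLevel1"])
    psi2 = split_values(row["IncLevel2"])
    if not psi1 or not psi2:
        return False  # PSI is undefined in every replicate of a group
    avg1, avg2 = mean(psi1), mean(psi2)
    if not (min(avg1, avg2) <= max_psi and max(avg1, avg2) >= min_psi):
        return False
    # 3. FDR threshold
    if float(row["FDR"]) > max_fdr:
        return False
    # 4. Effect size, using the deltaPSI column reported by rMATS
    return abs(float(row["IncLevelDifference"])) >= min_abs_dpsi


# Toy row: the PSI values are made up and not derived from the counts
row = {
    "IJC_SAMPLE_1": "10,12,14", "SJC_SAMPLE_1": "5,6,7",
    "IJC_SAMPLE_2": "3,4,5", "SJC_SAMPLE_2": "10,11,12",
    "IncLevel1": "0.5,0.5,0.5", "IncLevel2": "0.2,0.2,0.2",
    "FDR": "0.001", "IncLevelDifference": "0.3",
}
print(passes_filters(row))  # → True
```

Keeping every threshold as a keyword argument makes it easy to try a different coverage cutoff per experiment without touching the filter logic.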

In order to find a good and maybe universal set of filters, I have the following questions:

1. Does the statistical model used by rMATS account for the uncertainty of low-coverage events when calculating p-values?
2. Should the coverage filter (1. above) be chosen depending on the experiment, or kept at 10 for most datasets?
3. Filter 2. above, as currently implemented in rmats_filtering.py

...
and min(x.averagePsiSample1, x.averagePsiSample2) <= maxPSI
and max(x.averagePsiSample1, x.averagePsiSample2) >= minPSI
...

ensures that the average PSIs of the two groups are not both < 0.05 or both > 0.95. Is filter 2. really needed when filter 4. is applied? Or would you instead recommend filtering as suggested in #183, i.e. keeping events with 0.05 <= average PSI of all samples in the comparison <= 0.95?

Thanks a lot for your help and best,
Anke.


EricKutschera · 2024-10-18T19:46:07Z

Yes, the coverage affects the p-value calculation. Here's some output from the statistical model showing the p-value changing as the coverage changes (the PSI value stays the same):

ID	IJC_SAMPLE_1	SJC_SAMPLE_1	IJC_SAMPLE_2	SJC_SAMPLE_2	IncFormLen	SkipFormLen	PValue
0	1,1,1	2,2,2	1,1,1	1,1,1	150	100	0.550549161109
1	10,10,10	20,20,20	10,10,10	10,10,10	150	100	0.045763908041
2	100,100,100	200,200,200	100,100,100	100,100,100	150	100	2.77461456366e-07
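To see that the PSI really is identical across the three rows, it can be recomputed from the counts. This sketch assumes the length-normalized inclusion level PSI = (IJC/IncFormLen) / (IJC/IncFormLen + SJC/SkipFormLen). Because the three replicates within each group have identical counts, one replicate per group is enough here; the helper name `psi` is illustrative.

```python
def psi(ijc: float, sjc: float, inc_len: float, skip_len: float) -> float:
    # normalize each count by the effective length of its isoform before taking the ratio
    inc = ijc / inc_len
    skip = sjc / skip_len
    return inc / (inc + skip)


# (IJC_SAMPLE_1, SJC_SAMPLE_1, IJC_SAMPLE_2, SJC_SAMPLE_2) for one replicate of rows 0-2
rows = {0: (1, 2, 1, 1), 1: (10, 20, 10, 10), 2: (100, 200, 100, 100)}

for row_id, (ijc1, sjc1, ijc2, sjc2) in rows.items():
    psi1 = psi(ijc1, sjc1, 150, 100)
    psi2 = psi(ijc2, sjc2, 150, 100)
    # every row prints: PSI1 0.25, PSI2 0.4, deltaPSI -0.15
    print(row_id, round(psi1, 3), round(psi2, 3), round(psi1 - psi2, 3))
```

Only the total read count changes between rows, so the drop in p-value from about 0.55 to about 2.8e-07 comes from coverage alone while deltaPSI stays at -0.15.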

I think the coverage filter of 10 reads is reasonable for most datasets. It should avoid the issue shown in this post, where the p-value can change a lot for a small change in read counts when the total read count for the event is low: https://groups.google.com/g/rmats-user-group/c/2PJ6DWFu1m8/m/0J0eY3XlAAAJ
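A quick way to see why low totals are fragile: with the same form lengths as in the table above (150 and 100), a single extra skipping read moves the PSI estimate far more at 1 vs 2 reads than at 100 vs 200 reads. This sketch only looks at the PSI point estimate, not at rMATS's p-value calculation, and `one_read_shift` is a hypothetical helper.

```python
def psi(ijc: float, sjc: float, inc_len: float = 150, skip_len: float = 100) -> float:
    # length-normalized inclusion level (form lengths default to the example table's values)
    inc, skip = ijc / inc_len, sjc / skip_len
    return inc / (inc + skip)


def one_read_shift(ijc: float, sjc: float) -> float:
    # change in PSI when a single extra skipping read is observed
    return abs(psi(ijc, sjc + 1) - psi(ijc, sjc))


print(round(one_read_shift(1, 2), 3))      # → 0.068
print(round(one_read_shift(100, 200), 4))  # → 0.0009
```

At 3 total reads one read shifts PSI by almost 0.07, more than the abs(deltaPSI) >= 0.05 cutoff itself, which is the kind of instability a minimum-coverage filter guards against.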

I agree that filter 2 doesn't remove anything that wouldn't already be removed by filter 4: if both average PSIs are below 0.05, or both are above 0.95, then abs(deltaPSI) is necessarily below 0.05. I think it would be best to just remove filter 2. The filter 0.05 <= average PSI of all samples in the comparison <= 0.95 is based on a situation where there were many samples, but the samples weren't divided into two groups. If you have two groups, then using a filter on deltaPSI seems good enough.
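The redundancy can also be checked by brute force. This sketch re-implements the condition from the quoted rmats_filtering.py excerpt (`filter2_keeps`) and the deltaPSI threshold (`filter4_keeps`), both hypothetical names, and searches a grid of average-PSI pairs (step 0.01) for an event that filter 4 keeps but filter 2 removes. Exact fractions avoid floating-point edge cases at the 0.05 and 0.95 boundaries.

```python
from fractions import Fraction

MIN_PSI, MAX_PSI, MIN_DPSI = Fraction(5, 100), Fraction(95, 100), Fraction(5, 100)


def filter2_keeps(psi1: Fraction, psi2: Fraction) -> bool:
    # same condition as the rmats_filtering.py excerpt
    return min(psi1, psi2) <= MAX_PSI and max(psi1, psi2) >= MIN_PSI


def filter4_keeps(psi1: Fraction, psi2: Fraction) -> bool:
    # abs(deltaPSI) >= 0.05, with deltaPSI = average PSI group 1 - average PSI group 2
    return abs(psi1 - psi2) >= MIN_DPSI


grid = [Fraction(i, 100) for i in range(101)]
pairs = [(a, b) for a in grid for b in grid]

# events that filter 4 keeps but filter 2 would still remove
conflicts = [(a, b) for a, b in pairs if filter4_keeps(a, b) and not filter2_keeps(a, b)]
removed_by_filter2 = sum(1 for a, b in pairs if not filter2_keeps(a, b))

print(len(conflicts), removed_by_filter2)  # → 0 50
```

Every pair that filter 2 removes (both values in 0.00 to 0.04 or both in 0.96 to 1.00 on this grid) already has abs(deltaPSI) < 0.05, so filter 4 removes it too.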

ankebusch · 2024-10-21T17:51:07Z

Hi Eric,

Thanks a lot for your explanations.

Best,
Anke.
