To identify differential alternative splicing between two groups #320

jingxian555 · 2023-09-10T07:32:00Z

Thanks for your tools! I have 2 sample groups with 3 BAM files per group, but I am confused by the results. How do I filter for differential alternative splicing between the experimental group and the control group? I'm looking at the result file [AS_Event].MATS.JC.txt. Can I filter for differential alternative splicing directly on FDR? Thanks!

The executed script is as follows:
python /home/sunpf/my_data/software/miniconda3/envs/new_rmats/rMATS/rmats.py --b1 u8.txt --b2 WT.txt --gtf Arabidopsis_thaliana.TAIR10.57.gtf -t paired --readLength 150 --novelSS --od out --tmp tmp

EricKutschera · 2023-09-11T14:03:40Z

Yes, you can filter on FDR to find significant differential alternative splicing events between the two groups. This post mentions some potential filters for other columns in the output: #183 (comment)
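As a minimal sketch of the FDR-only filter, the snippet below looks up the FDR column by name in the header and keeps matching lines verbatim. The function name `filter_by_fdr`, the output file name in the usage comment, and the handling of an `NA` FDR value are assumptions for illustration, not part of rMATS.

```python
def filter_by_fdr(lines, max_fdr=0.05):
    """Yield the header line, then every event line whose FDR is below max_fdr.

    `lines` is any iterable of tab-separated lines, such as an open
    [AS_Event].MATS.JC.txt file. Lines are passed through unchanged.
    """
    lines = iter(lines)
    header = next(lines).rstrip('\n')
    fdr_col = header.split('\t').index('FDR')  # locate the column by name
    yield header
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # ignore blank lines
        fdr = line.split('\t')[fdr_col]
        if fdr == 'NA':
            continue  # assumption: events without an FDR value are dropped
        if float(fdr) < max_fdr:
            yield line


# Usage (the output file name is hypothetical):
# with open('SE.MATS.JC.txt') as fin, open('fdr_filtered.SE.MATS.JC.txt', 'w') as fout:
#     for kept in filter_by_fdr(fin):
#         fout.write(kept + '\n')
```

Looking the column up by name avoids depending on the column order, and keeping the header makes the filtered file readable by the same downstream tools.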

jingxian555 · 2023-09-15T06:36:52Z

Thank you for your suggestions. Here is my Python code for the advanced filtering. Is my understanding correct, and are these filtering criteria necessary? After filtering on these criteria, only 30 significant differential alternative splicing events remain. Can I just filter on FDR < 0.05 to find significant differential alternative splicing events between the two groups? Thanks!

fout = open('significant.SE.MATS.JC.txt', 'w')
with open('SE.MATS.JC.txt','r') as fin:
    line = fin.readline()
    for line in fin:
        line = line.strip()
        arr = line.split('\t')
        IJC_SAMPLE_1 = arr[12]
        SJC_SAMPLE_1 = arr[13]
        IJC_SAMPLE_2 = arr[14]
        SJC_SAMPLE_2 = arr[15]
        FDR = arr[-4]
        IncLevel1 = arr[-3]
        IncLevel2 = arr[-2]

        IJC_SAMPLE_1_l = [int(IJC1) for IJC1 in IJC_SAMPLE_1.split(',')]
        IJC_SAMPLE_1_avg = sum(IJC_SAMPLE_1_l)/len(IJC_SAMPLE_1_l)

        SJC_SAMPLE_1_l = [int(SJC1) for SJC1 in SJC_SAMPLE_1.split(',')]
        SJC_SAMPLE_1_avg = sum(SJC_SAMPLE_1_l)/len(SJC_SAMPLE_1_l)

        IJC_SAMPLE_2_l = [int(IJC2) for IJC2 in IJC_SAMPLE_2.split(',')]
        IJC_SAMPLE_2_avg = sum(IJC_SAMPLE_2_l)/len(IJC_SAMPLE_2_l)

        SJC_SAMPLE_2_l = [int(SJC2) for SJC2 in SJC_SAMPLE_2.split(',')]
        SJC_SAMPLE_2_avg = sum(SJC_SAMPLE_2_l)/len(SJC_SAMPLE_2_l)

        IncLevel1_l = [float(I1) for I1 in IncLevel1.split(',') if I1 != 'NA']
        IncLevel1_avg = sum(IncLevel1_l)/len(IncLevel1_l)
        IncLevel1_range = max(IncLevel1_l) - min(IncLevel1_l)

        IncLevel2_l = [float(I2) for I2 in IncLevel2.split(',') if I2 != 'NA']
        IncLevel2_avg = sum(IncLevel2_l)/len(IncLevel2_l)
        IncLevel2_range = max(IncLevel2_l) - min(IncLevel2_l)

        if (IJC_SAMPLE_1_avg + SJC_SAMPLE_1_avg >= 10) and (IJC_SAMPLE_2_avg + SJC_SAMPLE_2_avg >= 10) and (float(FDR) < 0.05) and (IncLevel1_range > 0.05) and (0.05 < IncLevel1_avg < 0.95) and (IncLevel2_range > 0.05) and (0.05 < IncLevel2_avg < 0.95):
            fout.write(line + '\n')

fout.close()

EricKutschera · 2023-09-18T14:52:42Z

Yes, you can filter on FDR alone.

For the code, I think the inclusion-level checks mentioned in that other post were intended to be applied across all samples. Instead of requiring the PSI value to vary within a sample group, as in IncLevel1_range > 0.05, I think the original intention was to check that the PSI value shows some variation when looking at all samples from both groups together.

jingxian555 · 2023-10-08T13:28:55Z

I apologize, but I'm having difficulty understanding your explanation. Could you just change the code? Thank you so much!

jingxian555 · 2023-10-09T12:38:55Z

I saw an article that evaluates splicing defects by comparing the percent spliced-in (PSI) index of AS events (P < 0.05 and delta PSI > 3%). Is delta PSI the same as abs(IncLevelDifference)? Thank you!

EricKutschera · 2023-10-09T14:51:41Z

I'm not sure what article you're referring to, but IncLevelDifference is a difference in PSI values. Taking the absolute value of IncLevelDifference and calling that delta PSI is reasonable
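A small sketch of that check, assuming the article's 3% threshold corresponds to 0.03 on the 0 to 1 scale that rMATS uses for inclusion levels. The helper name `passes_delta_psi` and the treatment of an `NA` value are illustrative assumptions.

```python
def passes_delta_psi(inc_level_difference, min_delta_psi=0.03):
    """Return True when |IncLevelDifference| exceeds min_delta_psi.

    `inc_level_difference` is the raw string from the IncLevelDifference
    column. A value of 'NA' is treated as failing (an assumption).
    """
    if inc_level_difference == 'NA':
        return False
    return abs(float(inc_level_difference)) > min_delta_psi
```

The sign is discarded on purpose, so events with higher inclusion in either group pass the same threshold.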

For the code change:

# Keep SE events that pass the read-count, FDR, and PSI filters.
with open('SE.MATS.JC.txt', 'r') as fin, open('significant.SE.MATS.JC.txt', 'w') as fout:
    fin.readline()  # skip the header line
    for line in fin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        arr = line.split('\t')
        # Comma-separated per-sample values for each group
        IJC_SAMPLE_1 = arr[12]
        SJC_SAMPLE_1 = arr[13]
        IJC_SAMPLE_2 = arr[14]
        SJC_SAMPLE_2 = arr[15]
        FDR = arr[-4]
        IncLevel1 = arr[-3]
        IncLevel2 = arr[-2]

        # Average inclusion (IJC) and skipping (SJC) junction counts per group
        IJC_SAMPLE_1_l = [int(IJC1) for IJC1 in IJC_SAMPLE_1.split(',')]
        IJC_SAMPLE_1_avg = sum(IJC_SAMPLE_1_l)/len(IJC_SAMPLE_1_l)

        SJC_SAMPLE_1_l = [int(SJC1) for SJC1 in SJC_SAMPLE_1.split(',')]
        SJC_SAMPLE_1_avg = sum(SJC_SAMPLE_1_l)/len(SJC_SAMPLE_1_l)

        IJC_SAMPLE_2_l = [int(IJC2) for IJC2 in IJC_SAMPLE_2.split(',')]
        IJC_SAMPLE_2_avg = sum(IJC_SAMPLE_2_l)/len(IJC_SAMPLE_2_l)

        SJC_SAMPLE_2_l = [int(SJC2) for SJC2 in SJC_SAMPLE_2.split(',')]
        SJC_SAMPLE_2_avg = sum(SJC_SAMPLE_2_l)/len(SJC_SAMPLE_2_l)

        # PSI values per sample, skipping samples reported as 'NA'
        IncLevel1_l = [float(I1) for I1 in IncLevel1.split(',') if I1 != 'NA']
        IncLevel2_l = [float(I2) for I2 in IncLevel2.split(',') if I2 != 'NA']

        # Average and range of PSI across all samples from both groups
        all_IncLevel = IncLevel1_l + IncLevel2_l
        if not all_IncLevel:
            continue  # every sample is 'NA', so there is nothing to average
        IncLevel_avg = sum(all_IncLevel)/len(all_IncLevel)
        IncLevel_range = max(all_IncLevel) - min(all_IncLevel)

        if (IJC_SAMPLE_1_avg + SJC_SAMPLE_1_avg >= 10) and (IJC_SAMPLE_2_avg + SJC_SAMPLE_2_avg >= 10) and (float(FDR) < 0.05) and (IncLevel_range > 0.05) and (0.05 < IncLevel_avg < 0.95):
            fout.write(line + '\n')

jingxian555 · 2023-12-25T12:37:18Z

Thank you for your previous advice; it was very helpful.
Now, I have a new question. Is the '--b1' parameter for the BAM file of the experimental group? For example, in this "SE.MATS.JC.txt" file, does IncLevelDifference > 0 indicate a higher probability of SE events occurring in group b1 compared to b2? Thank you!

The executed script is as follows:
python /home/sunpf/my_data/software/miniconda3/envs/new_rmats/rMATS/rmats.py --b1 u8.txt --b2 WT.txt --gtf Arabidopsis_thaliana.TAIR10.57.gtf -t paired --readLength 150 --novelSS --od out --tmp tmp

EricKutschera · 2023-12-26T20:10:22Z

From the README: https://github.com/Xinglab/rmats-turbo/tree/v4.2.0#output

IncLevelDifference: average(IncLevel1) - average(IncLevel2)

IncLevelDifference > 0 indicates that IncLevel1 > IncLevel2. Since IncLevel1 corresponds to --b1 and IncLevel2 to --b2, that would mean a higher inclusion level in --b1 as compared to --b2. For SE events a higher inclusion level means that the exon is included more often
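To illustrate the sign convention, here is a hedged sketch that recomputes average(IncLevel1) - average(IncLevel2) from the two comma-separated columns and reports the direction. The function names are hypothetical, and skipping `NA` entries when averaging is an assumption; the README only states the formula.

```python
def mean_inc_level(inc_levels):
    """Average a comma-separated IncLevel string, skipping 'NA' entries.

    Returns None when no numeric value is present.
    """
    values = [float(v) for v in inc_levels.split(',') if v != 'NA']
    return sum(values) / len(values) if values else None


def describe_direction(inc_level_1, inc_level_2):
    """Say which group has the higher average inclusion level."""
    avg1 = mean_inc_level(inc_level_1)
    avg2 = mean_inc_level(inc_level_2)
    if avg1 is None or avg2 is None:
        return None  # cannot compare when a group has no PSI values
    diff = avg1 - avg2  # same orientation as IncLevelDifference
    if diff > 0:
        return 'higher inclusion in --b1'
    if diff < 0:
        return 'higher inclusion in --b2'
    return 'no difference'
```

With the command from this thread, `--b1` is u8.txt and `--b2` is WT.txt, so a positive difference for an SE event would mean the exon is included more often in the u8 samples.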

Talking about events occurring in one group or the other can be confusing as mentioned in this post: Xinglab/rmats2sashimiplot#68 (comment)

jingxian555 · 2024-01-22T08:54:04Z

Thank you. Does rMATS use only the uniquely mapped reads in the BAM files for alternative splicing analysis?

EricKutschera · 2024-01-22T19:37:14Z

Yes, rMATS only uses uniquely mapped reads. rMATS will write a section to stdout saying how many reads were filtered out for different reasons. In that section NOT_NH_1 means the read was filtered out because it was not uniquely mapped. Here's a related post: #293
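The NOT_NH_1 criterion can be illustrated on SAM-formatted text, where NH is the optional tag that records the number of reported alignments for a read. This is only a sketch of the idea, not the rMATS implementation; the function names are hypothetical, and treating a read with no NH tag as not uniquely mapped is an assumption.

```python
def nh_value(sam_line):
    """Return the integer NH tag of a SAM alignment line, or None if absent."""
    fields = sam_line.rstrip('\n').split('\t')
    # The first 11 SAM columns are mandatory; optional tags follow them.
    for tag in fields[11:]:
        if tag.startswith('NH:i:'):
            return int(tag[len('NH:i:'):])
    return None


def is_uniquely_mapped(sam_line):
    """True only when the read carries NH:i:1.

    Assumption: a missing NH tag counts as not uniquely mapped.
    """
    return nh_value(sam_line) == 1


# Usage: feed alignment lines such as those printed by `samtools view`
# and count how many satisfy is_uniquely_mapped.
```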

jingxian555 · 2024-01-29T02:04:27Z

Thank you!

The gene AT5G50100 has only one transcript with 8 exonic regions. Why is there a discrepancy with the coordinate information in RI.MATS.JC.txt?

exonic regions: [image not preserved]

RI.MATS.JC.txt: [image not preserved]

Below is the image generated by rmats2sashimiplot, where the numbers on the image represent the read count on the junctions. Can you provide more detailed explanations? [image not preserved]

EricKutschera · 2024-01-29T14:12:55Z

The rmats command you posted before included --novelSS. With that option rmats can detect events with splice sites that are not in the --gtf: #277 (comment)

rmats2sashimiplot doesn't show counts for novel junctions: https://github.com/Xinglab/rmats2sashimiplot/blob/v3.0.0/src/MISO/misopy/sashimi_plot/plot_utils/plot_gene.py#L148
This post discusses differences in read counts between rmats and rmats2sashimiplot: Xinglab/rmats2sashimiplot#33 (comment)

jingxian555 · 2024-02-26T14:24:07Z

Thanks! I removed the --novelSS from the rmats command and filtered significant RI using these filters:

Average PSI (IncLevel) between 0.05 and 0.95
Average total read count (inclusion count + skipping count) >= 10
max(PSI) – min(PSI) > 0.05
FDR < 0.05
abs(IncLevelDifference) > 0.2
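The five filters listed above can be sketched as a single predicate over one output row. The function names and default thresholds are illustrative; the read-count check is applied to each group separately, following the code earlier in this thread, and PSI average and range are taken over all samples from both groups. Rows with `NA` in FDR or IncLevelDifference are rejected, which is an assumption.

```python
def parse_list(field, cast):
    """Split a comma-separated column into numbers, skipping '' and 'NA'."""
    return [cast(v) for v in field.split(',') if v not in ('', 'NA')]


def mean(values):
    return sum(values) / len(values)


def passes_filters(row, min_reads=10, min_psi=0.05, max_psi=0.95,
                   min_psi_range=0.05, max_fdr=0.05, min_delta_psi=0.2):
    """`row` maps rMATS column names to the string values of one event."""
    ijc1 = parse_list(row['IJC_SAMPLE_1'], int)
    sjc1 = parse_list(row['SJC_SAMPLE_1'], int)
    ijc2 = parse_list(row['IJC_SAMPLE_2'], int)
    sjc2 = parse_list(row['SJC_SAMPLE_2'], int)
    # PSI values from both groups are pooled before averaging
    psi = parse_list(row['IncLevel1'], float) + parse_list(row['IncLevel2'], float)
    if not (ijc1 and sjc1 and ijc2 and sjc2 and psi):
        return False  # nothing to average
    if 'NA' in (row['FDR'], row['IncLevelDifference']):
        return False
    return (
        mean(ijc1) + mean(sjc1) >= min_reads
        and mean(ijc2) + mean(sjc2) >= min_reads
        and min_psi < mean(psi) < max_psi
        and max(psi) - min(psi) > min_psi_range
        and float(row['FDR']) < max_fdr
        and abs(float(row['IncLevelDifference'])) > min_delta_psi
    )


def rows_from_lines(lines):
    """Turn tab-separated lines (header first) into column-name dicts."""
    lines = iter(lines)
    header = next(lines).rstrip('\n').split('\t')
    for line in lines:
        yield dict(zip(header, line.rstrip('\n').split('\t')))


# Usage:
# with open('RI.MATS.JC.txt') as fin:
#     significant = [r for r in rows_from_lines(fin) if passes_filters(r)]
```

Keying on column names rather than positions lets the same predicate run on the SE, RI, and other event files, whose coordinate columns differ.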

The significant RI events I found all follow a pattern. For example, in the image below, the red boxes indicate significant RI. However, these significant RI regions belong to exonic regions in another transcript. That is, there are overlapping regions of exons and introns in different transcripts. Is this due to differences in exonic regions causing differences in intron retention?

The executed script is as follows:
python /home/sunpf/my_data/software/miniconda3/envs/new_rmats/rMATS/rmats.py --b1 u8.txt --b2 WT.txt --gtf Arabidopsis_thaliana.TAIR10.57.gtf -t paired --readLength 150 --od out --tmp tmp

EricKutschera · 2024-02-26T19:46:15Z

rMATS relies on the --gtf for RI events. Unless --novelSS is used, every RI event reported by rMATS should have its intron region overlapped by an exon region of an annotated transcript. This post has some details: #17
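One way to picture this is a containment check between the retained intron and the exons of the other annotated transcripts. The sketch below assumes "overlap" means that some annotated exon spans the whole intron, and it uses half-open (start, end) intervals. The function name and the coordinates in the usage comment are made up for illustration; this is not how rMATS is implemented.

```python
def intron_covered_by_exon(intron, exons):
    """Return True when at least one exon fully contains the intron.

    `intron` is a (start, end) pair and `exons` is an iterable of
    (start, end) pairs, all half-open and on the same chromosome and strand.
    """
    intron_start, intron_end = intron
    return any(start <= intron_start and intron_end <= end
               for start, end in exons)


# Usage with invented coordinates: an intron between two short exons of one
# transcript, and a single long exon from another transcript that spans it.
# intron_covered_by_exon((200, 300), [(100, 400)])  -> True
```

Under this reading, the pattern described in the question is expected: without --novelSS, a retained intron is only reported where another annotated transcript has exon sequence across that intron.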
