Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read counts inconsistency between rMATs results and rmats2sashimiplot #33

Closed
thereallda opened this issue Jun 10, 2020 · 2 comments
Closed

Comments

@thereallda
Copy link

Hi rMATS team,
Recently, I have been figuring out the results in rMATs and rmats2sashimiplot. The rMATs is great and the rmats2sashimiplot creats the plots smoothly, but the read counts shown in sashimiplot confuses me. Here is a representative sashimiplot of a single gene in control group and sh1 group
image

In Shank3 control-1, the juction read counts shown in sashimiplot are 28 and 33, and the skipped read count is 2.

While I found the read counts in rMATS are:

Control group:
junction reads
1: 42;
2: 48;
3: 50;
4: 68
skipped reads
1: 2 ;
2: 14 ;
3: 2 ;
4: 0

The same issues happened in other genes, thus I wonder how does sashimiplot quantifying the reads and how can I make two results consistent or perhaps we can find a relationship between them.

Any help would be appreciated
Thanks!
thereallda

@EricKutschera
Copy link
Contributor

rmats2sashimiplot and rMATS filter reads differently

rmats2sashimiplot only filters out reads that include an insertion or deletion in the cigar: https://github.com/Xinglab/rmats2sashimiplot/blob/master/src/MISO/misopy/sashimi_plot/plot_utils/plot_gene.py#L549

rMATS requires reads to:


rmats2sashimiplot is just counting individual junctions. If a read spans multiple junctions then rmats2sashimiplot will count that read toward each junction.

rMATS is classifying each read as supporting the inclusion isoform, supporting the skipping isoform, or neither. Even if a read spans multiple junctions rMATS will still only count it as a single supporting read. https://github.com/Xinglab/rmats-turbo/blob/master/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L1657

rMATS can also count exon reads (reads that do not span a junction) as supporting one of the isoforms: https://github.com/Xinglab/rmats-turbo/blob/master/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L1681

Here is a visualization of which reads rMATS counts as supporting which isoforms https://github.com/Xinglab/rmats-turbo#output


In your example the plot of Shank3 control-2 is interesting to discuss because it differs from rMATS for both inclusion and skipping counts. It has inclusion junction counts of 34 and 32 which total 66 and the skipping junction count is 10. The rMATS output shows 48 reads supporting the inclusion isoform and 14 supporting the skipping isoform. It may be that 18 of the 48 reads that rMATS counts toward the inclusion isoform span both junctions (left only: 16, right only: 14, both: 18, total: 48). It also could be that all of the 66 reads only span a single junction but rMATS filtered out 18 reads that rmats2sashimiplot counted (maybe for readLength). If the rMATS counts you posted are from SE.MATS.JCEC.txt then some of the 48 reads supporting the inclusion isoform may have been exon reads. The true explanation may be some combination of those.

I can't think of an explanation for why rMATS counts 14 reads as supporting the skipping isoform and rmats2sashimiplot has only 10 counts of the skipping junction. If you can share your inputs and commands used to run rMATS and rmats2sashimiplot then I can try to reproduce and investigate

@thereallda
Copy link
Author

thereallda commented Jul 17, 2020

Hi, EricKutschera

Thanks for your thorough clarification of how those softwares work.

I subtract the "Shank3" gene region("7:130,474,279-130,534,679") from the bam file and rerun my rMATS and rmats2sashimiplot

## rMATS --version: 4.0.2
$ python /public/publicUse/softwares/rMATS.4.0.2/rMATS-turbo-Linux-UCS4/rmats.py --b1 con-rep.txt --b2 sh1-rep.txt --gtf Rattus_norvegicus.Rnor_6.0.96.chr.gtf --od as1-rep -t paired --readLength 150 --cstat 0.0001 --libType fr-unstranded --nthread 16

$ rmats2sashimiplot --b1 shank3-con-1.bam,shank3-con-2.bam,shank3-con-3.bam,shank3-con-4.bam --b2 shank3-sh1-1.bam,shank3-sh1-2.bam,shank3-sh1-3.bam,shank3-sh1-4.bam -t SE -e se.txt --l1 control --l2 sh1 --exon_s 1 --intron_s 2 -o as1-rep/as1_plot

con-rep.txt :
shank3-con-1.bam,shank3-con-2.bam,shank3-con-3.bam,shank3-con-4.bam

sh1-rep.txt :
shank3-sh1-1.bam,shank3-sh1-2.bam,shank3-sh1-3.bam,shank3-sh1-4.bam

se.txt(from SE.MATS.JCEC.txt) :

 ID      GeneID  geneSymbol      chr     strand  exonStart_0base exonEnd upstreamES      upstreamEE      downstreamES    downstreamEE ID      IJC_SAMPLE_1    SJC_SAMPLE_1    IJC_SAMPLE_2    SJC_SAMPLE_2    IncFormLen      SkipFormLen     PValue  FDR     IncLevel1   IncLevel2        IncLevelDifference
2       "ENSRNOG00000052296"    "Shank3"        7    +       130518293       130518378       130517614       130517745       130522029    130524289       2       42,48,50,68     2,14,2,0        37,12,90,44     0,0,0,2 234     149     2.44140310615e-05       9.7656124246e-05     0.93,0.686,0.941,1.0    1.0,1.0,1.0,0.933       -0.094

image

sample junction skip
control-1 42 2
control-2 48 14
control-3 50 2
control-4 68 0
sh1-1 37 0
sh1-2 12 0
sh1-3 90 0
sh1-4 44 2

This time the rMATS output and rmats2sashimiplot match as you said, for example the skipping isoforms reads are the same between rmats output and rmats2sashimiplot in all samples. I may confuse the input bam files in my previous commands, but I will fix it anyway. Also, I attached my shank3.zip data, and you can test it if you want.

shank3.zip

Again, thanks for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants