Gene not included in rstat analysis #17
Comments
Short answer: That looks like a retained intron (RI) event, which rmats will only report if it is (mostly) annotated in the gtf file. rmats is not designed to detect unannotated RI events. If rmats is not detecting the event, you can assist rmats by adding transcripts for the two isoforms to the gtf. You might be able to get rmats to detect this event by using --novelSS, but that flag is intended for adjusting one side of an annotated junction rather than inserting or removing a splice junction.
More details: rmats requires these definitions to detect RI events
If those three exons are defined for this gene in the gtf and there is a transcript for the gene in the gtf that includes that junction, then the event will only be in fromGTF.RI.txt.
rmats does have the ability to detect some unannotated (novel) events by combining information from the gtf with reads from the BAMs.
If those three exons are defined, but there is no transcript with the junction, then rmats can detect the junction if there is a read to support it. In that case the event includes a "novelJunction" and the event will be in both fromGTF.novelJunction.RI.txt and fromGTF.RI.txt.
If rmats is run with --novelSS, then rmats will define novel exons if there is a read which includes a junction that has only one end matching an exon defined in the gtf. Essentially, rmats will define a novel exon by adjusting one side of an exon defined in the gtf. If the event required --novelSS in order to be detected, then the event will be in both fromGTF.novelSpliceSite.RI.txt and fromGTF.RI.txt.
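To see which of the cases above applies to a given gene, one could check which fromGTF files list it. The following is only a minimal sketch: the helper name `files_listing_gene` is hypothetical, and it assumes the fromGTF files are tab-separated with a header row containing `GeneID` and `geneSymbol` columns. It is not part of rmats.

```python
import csv
from pathlib import Path


def files_listing_gene(output_dir, gene, event_type="RI"):
    """Return the names of fromGTF*.<event_type>.txt files that list `gene`.

    Assumption: each file is tab-separated with a header row that has
    `GeneID` and `geneSymbol` columns (values may be wrapped in quotes).
    """
    hits = []
    pattern = f"fromGTF*.{event_type}.txt"
    for path in sorted(Path(output_dir).glob(pattern)):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                gene_id = (row.get("GeneID") or "").strip('"')
                symbol = (row.get("geneSymbol") or "").strip('"')
                if gene in (gene_id, symbol):
                    hits.append(path.name)
                    break  # one match per file is enough
    return hits
```

Following the description above: a gene found only in fromGTF.RI.txt has a fully annotated event, a gene also found in fromGTF.novelJunction.RI.txt or fromGTF.novelSpliceSite.RI.txt was detected as a novel event, and an empty result means rmats did not define an RI event for that gene at all.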
Thanks for your answer. I dug a little deeper into the gtf file and I think I have found the problem. However, I am still curious whether rmats should handle this or whether the gtf file needs to be adjusted. The exons that are defined are:
However, the 5'UTR is part of exon 1 and, as you can see in the graphic, exon 1 (in tran 1) extends slightly further than exon 1 (of tran 2). So the problem seems to be that the UTR is part of the defined exon. Is it supposed to be like that, given that UTRs are of course non-coding?
You are right that rmats will not detect the event because the start of exon 1 is different in the two transcripts. rmats can only detect "simple" intron retention events. In this case it is a "complex" event because there are two changes in the exons that define the event (a different start coordinate, and whether the intron is spliced out). You could update the gtf so that the two transcripts have the same start coordinate, and then rmats will detect the event. In your case, it sounds like removing the 5'UTR from the exon definitions in the gtf will cause the two transcripts to have the same start coordinate. I don't think handling differences in the UTR is something that rmats should do automatically, but maybe a tool exists for removing the UTRs from exons in a gtf. Users could choose to run it before running rmats.
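As a rough illustration of the gtf adjustment suggested above, the sketch below clips each transcript's exon records to the span covered by its CDS records. This is not an rmats feature or an existing tool: the function names are hypothetical, and it assumes a standard 9-column gtf where coding transcripts have CDS lines carrying a `transcript_id` attribute. It trims both the 5' and 3' ends, and in annotations where the stop codon is not part of the CDS it would also cut the stop codon, so treat it as a starting point only.

```python
def transcript_id(attributes):
    """Extract the transcript_id value from a gtf attribute column, or None."""
    for field in attributes.split(";"):
        key, _, value = field.strip().partition(" ")
        if key == "transcript_id":
            return value.strip().strip('"')
    return None


def trim_exons_to_cds(gtf_lines):
    """Return gtf lines with exon records clipped to each transcript's CDS span.

    Comment and blank lines are dropped. Transcripts without CDS records
    (e.g. non-coding ones) are passed through unchanged.
    """
    records = [line.rstrip("\n").split("\t") for line in gtf_lines
               if line.strip() and not line.startswith("#")]

    # First pass: lowest CDS start and highest CDS end per transcript.
    cds_span = {}
    for rec in records:
        if rec[2] == "CDS":
            tid = transcript_id(rec[8])
            start, end = int(rec[3]), int(rec[4])
            lo, hi = cds_span.get(tid, (start, end))
            cds_span[tid] = (min(lo, start), max(hi, end))

    # Second pass: clip exon coordinates to that span.
    trimmed = []
    for rec in records:
        tid = transcript_id(rec[8])
        if rec[2] == "exon" and tid in cds_span:
            lo, hi = cds_span[tid]
            start, end = max(int(rec[3]), lo), min(int(rec[4]), hi)
            if start > end:
                continue  # exon lies entirely inside a UTR
            rec = rec[:3] + [str(start), str(end)] + rec[5:]
        trimmed.append("\t".join(rec))
    return trimmed
```

With two hypothetical transcripts whose first exons start at 100 and 120 but whose CDS both start at 150, both first exons would start at 150 after trimming, which is the situation described above where the two transcripts share a start coordinate.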
Below is an example of a transcript structure that shows differential exon usage between WT and geneX using DEXSeq (C. elegans data). I expected this gene and its transcripts to also be part of the rmats output. However, they do not show up. Upon closer inspection, the gene does not even seem to be part of the fromGTF[rest].txt files. Is that expected behaviour?