
lnclevel and sample clinical information matched #433

Open
XXuxi opened this issue Sep 7, 2024 · 4 comments

Comments

@XXuxi

XXuxi commented Sep 7, 2024

Hi rMATS team,
After running rMATS on a large number of samples, I have successfully obtained the IncLevel values for the splicing events, and I would like to know how to match each IncLevel value to its sample so that I can link it to the clinical information. My guess is that the values follow the same sample order as the -b1 and -b2 inputs. Could you confirm this?

@XXuxi
Author

XXuxi commented Sep 7, 2024

Another question: are the splicing events reported in the results, which are derived from the GTF file, obtained by comparing different transcripts of the same gene that do or do not contain a given exon or intron? I want to analyze how transcripts differ in whether a given splicing event occurs. Looking forward to your reply.

@EricKutschera
Contributor

Yes, the values in IncLevel1 are for the samples in the same order as --b1. Similarly for IncLevel2 and --b2: #263
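A minimal Python sketch of that pairing, assuming the --b1 file holds comma-separated BAM paths and the *.MATS.JC.txt file is tab-separated with an `ID` column and comma-separated `IncLevel1`/`IncLevel2` columns (check this against the header of your own output). The function names are hypothetical, not part of rMATS, and mapping each BAM path to a clinical record is left to your own sample sheet.

```python
import csv
import io


def read_bam_list(b_file_text: str) -> list[str]:
    """Split the contents of a --b1/--b2 file into BAM paths, keeping their order."""
    return [p.strip() for p in b_file_text.strip().split(",") if p.strip()]


def inclevels_by_sample(jc_text: str, bams: list[str], column: str = "IncLevel1") -> dict:
    """Return {event ID: {BAM path: IncLevel string}} for one sample group.

    The i-th comma-separated IncLevel value is paired with the i-th BAM path.
    Values such as "NA" are kept as strings.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(jc_text), delimiter="\t"):
        values = row[column].split(",")
        if len(values) != len(bams):
            raise ValueError(
                f"event {row['ID']}: {len(values)} values but {len(bams)} BAM files"
            )
        result[row["ID"]] = dict(zip(bams, values))
    return result


# Tiny made-up example (real output files have many more columns).
b1 = "/data/s1.bam,/data/s2.bam\n"
jc = "ID\tIncLevel1\tIncLevel2\n1\t0.52,0.81\t0.10\n2\tNA,1.0\t0.25\n"
levels = inclevels_by_sample(jc, read_bam_list(b1))
print(levels["1"])  # {'/data/s1.bam': '0.52', '/data/s2.bam': '0.81'}
```

The length check raises an error instead of silently truncating, so a --b1 file that does not match the run's output is caught early.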

I'm not sure exactly what your second question is. This post has some details about how events are detected using the GTF and reads: #161 (comment)

@XXuxi
Author

XXuxi commented Sep 9, 2024

I apologize for not describing the problem clearly. What I actually found is that, in the *.MATS.JC.txt output, the exon coordinates do not match those in the GTF.
[Screenshot 2024-09-09 23:05:31]
[Screenshot 2024-09-09 23:06:04]
As shown in the screenshots above, each exon start (ES) coordinate is off by 1 relative to the value in the GTF. I found the same pattern in the other splicing event types.

I mainly use the GTF to find out which transcripts each splicing event might occur in, since the GTF file provides that correspondence. So I hope you can clear up the doubt above. Alternatively, if you know a better way to identify which transcripts a splicing event may belong to, I would really appreciate your help.

Finally, I tried to create a new ID for each splicing event, such as ENSG00000225190.12; SE:chr17:45475726-45475888:45476073-45477900:-. For now I plan to build it with paste0(). Could you suggest a better way to generate a unique ID for the rMATS results?

@EricKutschera
Contributor

The start coordinates in the rmats output files are zero-based and the end coordinates are one-based (like BED file format): #316 (comment)
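In practice this means only the start coordinates need shifting when comparing with a GTF, which uses 1-based starts and inclusive ends. A small Python sketch with hypothetical helper names, using the first coordinate pair of the example ID from the question:

```python
def rmats_to_gtf(start_0base: int, end: int) -> tuple[int, int]:
    """rMATS/BED-style (0-based start, 1-based end) -> GTF (1-based, inclusive)."""
    return start_0base + 1, end


def gtf_to_rmats(gtf_start: int, gtf_end: int) -> tuple[int, int]:
    """GTF (1-based, inclusive) -> rMATS/BED-style coordinates."""
    return gtf_start - 1, gtf_end


print(rmats_to_gtf(45475726, 45475888))  # (45475727, 45475888)
```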

Looking for transcripts that use the event coordinates in the gtf file makes sense. There is an example script in this post #148
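The script from #148 is not reproduced here. The following is an independent Python sketch of the same idea for skipped-exon (SE) events. It assumes the GTF is the same one given to rMATS and that the event's coordinates come from the exonStart_0base/exonEnd, upstreamES/upstreamEE and downstreamES/downstreamEE columns. A transcript counts as inclusion-supporting if it contains the target exon exactly, and as skipping-supporting if it contains both flanking exons but not the target. The skipping check is a simplification: it does not verify that the two flanking exons are adjacent in the transcript. Function names are hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_lines):
    """Map transcript_id -> set of (chrom, strand, start, end) exons, GTF 1-based."""
    exons = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exons[match.group(1)].add(
                (fields[0], fields[6], int(fields[3]), int(fields[4]))
            )
    return exons


def se_transcripts(exons, chrom, strand, target, upstream, downstream):
    """Classify transcripts for one SE event given rMATS (start_0base, end) pairs.

    Returns (inclusion, skipping): transcripts containing the target exon, and
    transcripts containing both flanking exons but not the target exon.
    """
    def key(interval):
        start_0base, end = interval
        return (chrom, strand, start_0base + 1, end)  # convert to GTF 1-based

    t, u, d = key(target), key(upstream), key(downstream)
    inclusion = sorted(tid for tid, ex in exons.items() if t in ex)
    skipping = sorted(
        tid for tid, ex in exons.items() if u in ex and d in ex and t not in ex
    )
    return inclusion, skipping


# Made-up two-transcript GTF: t1 includes the middle exon, t2 skips it.
attrs = 'gene_id "g1"; transcript_id "{}";'
gtf = [
    f"chr1\ttest\texon\t{s}\t{e}\t.\t+\t.\t{attrs.format(t)}"
    for t, s, e in [("t1", 101, 200), ("t1", 301, 400), ("t1", 501, 600),
                    ("t2", 101, 200), ("t2", 501, 600)]
]
inclusion, skipping = se_transcripts(
    exons_by_transcript(gtf), "chr1", "+", (300, 400), (100, 200), (500, 600)
)
print(inclusion, skipping)  # ['t1'] ['t2']
```

For a real annotation, `exons_by_transcript(open(path))` works line by line; other event types (A3SS, A5SS, MXE, RI) would need their own coordinate columns and matching rules.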

Making an event ID like you did, by joining the event coordinates, is reasonable.
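The same kind of ID can also be built directly from each output row. A Python sketch; the format (GeneID;SE:chr:target:upstream:downstream:strand) is a hypothetical choice rather than an rMATS convention, and the column names assume an SE.MATS.JC.txt file. The `strip('"')` call only matters in case GeneID values are written with surrounding double quotes. The coordinates in the example row are made up.

```python
SE_COORD_COLUMNS = (
    ("exonStart_0base", "exonEnd"),
    ("upstreamES", "upstreamEE"),
    ("downstreamES", "downstreamEE"),
)


def make_event_id(row: dict, event_type: str = "SE") -> str:
    """Build a hypothetical coordinate-based ID from one SE.MATS.JC.txt row."""
    gene = row["GeneID"].strip('"')  # drop surrounding quotes, if present
    coords = ":".join(f"{row[s]}-{row[e]}" for s, e in SE_COORD_COLUMNS)
    return f"{gene};{event_type}:{row['chr']}:{coords}:{row['strand']}"


row = {
    "GeneID": '"ENSG00000225190.12"', "chr": "chr17", "strand": "-",
    "exonStart_0base": "100", "exonEnd": "200",
    "upstreamES": "10", "upstreamEE": "50",
    "downstreamES": "300", "downstreamEE": "400",
}
print(make_event_id(row))  # ENSG00000225190.12;SE:chr17:100-200:10-50:300-400:-
```

Rows from `csv.DictReader(..., delimiter="\t")` can be passed straight in, since they are dicts keyed by the header names.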
