-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comparison of rMATs results and BAM visualization in IGV #183
Comments
The SE.MATS.JC.txt output you posted has a lot of events in that region and it looks like the events share a lot of the same coordinates. Some pairs of events only differ by a few nt. Differences that small are hard to see in the IGV screenshot you posted |
Thanks! Here is the results of rMATS in my project: https://github.com/huangwb8/test_file/tree/master/rmats/DM.vs.DB I'm looking for some obviously different events like this:
Any sugguestions? |
Filtering on the PValue or FDR column is a good start. You could also try filtering on the count columns (IJC_SAMPLE_1, SJC_SAMPLE_1, IJC_SAMPLE_2, SJC_SAMPLE_2) and the inclusion level columns (IncLevel1, IncLevel2, IncLevelDifference) This paper (https://doi.org/10.1016/j.ajhg.2020.06.002) uses these filters which you could use as a starting point:
To get events similar to the plots you posted you may want a filter like: abs(IncLevelDifference) > 0.2 to filter for events with a large change |
@EricKutschera Thanks for your nice advices! Please do not close this issue, and I would update my test in the future. |
@EricKutschera Hello! Here I give an example, which should be a very significant event according to your advices:
When I try to visualize it in IGV with with with with Is it something wrong? |
I looked at the SE.MATS.JC.txt file from https://github.com/huangwb8/test_file/tree/master/rmats/DM.vs.DB and looked at events for ENSG00000135486. I couldn't find the exact events from your last post, but there are a lot of events for that gene where the individual replicates have inclusion and/or skipping counts > 10,000. I'm not sure exactly what the y-axis value means in the IGV screenshot, but that is also > 10,000. Since the events you posted only have 10s or 100s of junction reads compared to the other nearby events with 10,000s of junction reads, the events are not noticeable in IGV. It's not clear if those are real events or just noise. You could try filtering for events with higher junction counts |
@EricKutschera library(dplyr)
x <- read.table('SE.MATS.JC.txt',sep = '\t',header = T)
# function: Average total read count
aver_total_rc <- function(vt){
# vt='9142,7523,7819'
return(mean(as.numeric(unlist(strsplit(vt,split = ',')))))
}
# Average total read count
x$atrc_1 <- apply(x,1,function(v)aver_total_rc(v["IJC_SAMPLE_1"])) + apply(x,1,function(v)aver_total_rc(v["SJC_SAMPLE_1"]))
x$atrc_2 <- apply(x,1,function(v)aver_total_rc(v["IJC_SAMPLE_2"])) + apply(x,1,function(v)aver_total_rc(v["SJC_SAMPLE_2"]))
# Difference of read counts
x$diff_IJC <- apply(x,1,function(v)aver_total_rc(v["IJC_SAMPLE_1"])) - apply(x,1,function(v)aver_total_rc(v["IJC_SAMPLE_2"]))
x$diff_SJC <- apply(x,1,function(v)aver_total_rc(v["SJC_SAMPLE_1"])) - apply(x,1,function(v)aver_total_rc(v["SJC_SAMPLE_2"]))
# Filter
x2 <- dplyr::filter(
x,
FDR < 0.1,
IncLevel1>0.05 & IncLevel1 < 0.95,
IncLevel2>0.05 & IncLevel2 < 0.95,
atrc_1 >=10 & atrc_2 >=10,
# abs(IncLevelDifference)>0.05,
abs(diff_IJC)>=10000|abs(diff_SJC)>=10000,
!chr %in% c('chrX', 'chrY')
) %>% arrange(desc(abs(IncLevelDifference)))
|
IncLevelDifference is numeric_from_comma_string <- function(comma_string) {
return(as.numeric(unlist(strsplit(comma_string, split=','))))
}
min_psi_from_row <- function(row) {
sample_1_psi_values <- numeric_from_comma_string(row["IncLevel1"])
sample_2_psi_values <- numeric_from_comma_string(row["IncLevel2"])
min_sample_1 <- min(sample_1_psi_values)
min_sample_2 <- min(sample_2_psi_values)
return(min(min_sample_1, min_sample_2))
}
max_psi_from_row <- function(row) {
sample_1_psi_values <- numeric_from_comma_string(row["IncLevel1"])
sample_2_psi_values <- numeric_from_comma_string(row["IncLevel2"])
max_sample_1 <- max(sample_1_psi_values)
max_sample_2 <- max(sample_2_psi_values)
return(max(max_sample_1, max_sample_2))
}
x$min_psi <- apply(x, 1, min_psi_from_row)
x$max_psi <- apply(x, 1, max_psi_from_row)
x$psi_diff <- x$max_psi - x$min_psi |
@EricKutschera Your code could run~Thanks! I finally found the root reason of this annoying situation. I'm new of IGV and I use it in a wrong way to my destination. Here I give an example: Before After I think the latter is what I want. Exactly, it's still quite difficult to find out appearant events like : Maybe there is no such obvious events between groups. Or I should try more strick conditons of the rMATs results. |
I want to validate rMATs results in IGV with BAM files. Here is an example of
SE.MATS.JC.txt
:It seem that some SE events of the
LUC7L3
gene differ from samples. Next, I visualized17:50,743,291-50,753,867
in IGV. Here is the plot:It appears that there are no difference between these two samples.
My
rMATS
code is just like:Information of genome & gtf: ENSEMBL; Homo_sapiens.GRCh38; release104
Any suggestions?
The text was updated successfully, but these errors were encountered: