Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why the summary.txt is empty? #391

Open
danli349 opened this issue Apr 3, 2024 · 11 comments
Open

Why the summary.txt is empty? #391

danli349 opened this issue Apr 3, 2024 · 11 comments

Comments

@danli349
Copy link

danli349 commented Apr 3, 2024

The run is complete, why the summary.txt is empty?

conda activate ~/rmats_turbo_v4_3_0/conda_envs/rmats

python rmats.py --s1 s1.txt --s2 s2.txt \
--gtf ref/gtf/Mus_musculus.GRCm38.93.gtf \
--bi ref/STAR_mm10 \
-t paired --readLength 50 \
--nthread 24 \
--od output \
--tmp tmp_output

image

Thanks a lot

@EricKutschera
Copy link
Contributor

Can you post what the command printed?
One common cause is NOT_EXPECTED_READ_LENGTH like: #387 (comment)
The command also may have printed warning or error messages

@danli349
Copy link
Author

danli349 commented Apr 4, 2024

@EricKutschera
This is the output of the run:
is it correct?
Thanks

------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                                   47263.00 sec.
    Max Memory :                                 42 GB
    Average Memory :                             30.41 GB
    Total Requested Memory :                     144.00 GB
    Delta Memory :                               102.00 GB
    Max Swap :                                   -
    Max Processes :                              9
    Max Threads :                                80
    Run time :                                   4873 sec.
    Turnaround time :                            4874 sec.

The output (if any) follows:

mapping the first sample
mapping sample_0, fastq/7369172_1.fastq.gz fastq/7369172_2.fastq.gz is done with status 0
Apr 04 10:09:16 ..... started STAR run
Apr 04 10:09:16 ..... loading genome
Apr 04 10:09:30 ..... processing annotations GTF
Apr 04 10:09:36 ..... inserting junctions into the genome indices
Apr 04 10:10:40 ..... started 1st pass mapping
Apr 04 10:12:05 ..... finished 1st pass mapping
Apr 04 10:12:05 ..... inserting junctions into the genome indices
Apr 04 10:13:26 ..... started mapping
Apr 04 10:15:01 ..... started sorting BAM
Apr 04 10:15:45 ..... finished successfully
mapping sample_0, \
fastq/7369175_1.fastq.gz fastq/7369175_2.fastq.gz is done with status 0
Apr 04 10:15:46 ..... started STAR run
Apr 04 10:15:46 ..... loading genome
Apr 04 10:15:58 ..... processing annotations GTF
Apr 04 10:16:06 ..... inserting junctions into the genome indices
Apr 04 10:17:10 ..... started 1st pass mapping
Apr 04 10:18:30 ..... finished 1st pass mapping
Apr 04 10:18:30 ..... inserting junctions into the genome indices
Apr 04 10:19:51 ..... started mapping
Apr 04 10:21:21 ..... started sorting BAM
Apr 04 10:22:03 ..... finished successfully
mapping sample_0, \
fastq/7369181_1.fastq.gz fastq/7369181_2.fastq.gz is done with status 0
Apr 04 10:22:04 ..... started STAR run
Apr 04 10:22:04 ..... loading genome
Apr 04 10:22:17 ..... processing annotations GTF
Apr 04 10:22:24 ..... inserting junctions into the genome indices
Apr 04 10:23:27 ..... started 1st pass mapping
Apr 04 10:25:31 ..... finished 1st pass mapping
Apr 04 10:25:32 ..... inserting junctions into the genome indices
Apr 04 10:26:51 ..... started mapping
Apr 04 10:29:18 ..... started sorting BAM
Apr 04 10:29:59 ..... finished successfully
mapping sample_0, \
fastq/7369184_1.fastq.gz fastq/7369184_2.fastq.gz is done with status 0
Apr 04 10:29:59 ..... started STAR run
Apr 04 10:29:59 ..... loading genome
Apr 04 10:30:12 ..... processing annotations GTF
Apr 04 10:30:19 ..... inserting junctions into the genome indices
Apr 04 10:31:22 ..... started 1st pass mapping
Apr 04 10:32:39 ..... finished 1st pass mapping
Apr 04 10:32:39 ..... inserting junctions into the genome indices
Apr 04 10:34:00 ..... started mapping
Apr 04 10:35:27 ..... started sorting BAM
Apr 04 10:36:09 ..... finished successfully
mapping the first sample
mapping sample_1, fastq/7369173_1.fastq.gz fastq/7369173_2.fastq.gz is done with status 0
Apr 04 10:36:09 ..... started STAR run
Apr 04 10:36:09 ..... loading genome
Apr 04 10:36:22 ..... processing annotations GTF
Apr 04 10:36:32 ..... inserting junctions into the genome indices
Apr 04 10:37:37 ..... started 1st pass mapping
Apr 04 10:38:55 ..... finished 1st pass mapping
Apr 04 10:38:55 ..... inserting junctions into the genome indices
Apr 04 10:40:17 ..... started mapping
Apr 04 10:41:44 ..... started sorting BAM
Apr 04 10:42:21 ..... finished successfully
mapping sample_1, \
fastq/7369174_1.fastq.gz fastq/7369174_2.fastq.gz is done with status 0
Apr 04 10:42:21 ..... started STAR run
Apr 04 10:42:21 ..... loading genome
Apr 04 10:42:33 ..... processing annotations GTF
Apr 04 10:42:40 ..... inserting junctions into the genome indices
Apr 04 10:43:45 ..... started 1st pass mapping
Apr 04 10:45:05 ..... finished 1st pass mapping
Apr 04 10:45:05 ..... inserting junctions into the genome indices
Apr 04 10:46:27 ..... started mapping
Apr 04 10:48:07 ..... started sorting BAM
Apr 04 10:48:55 ..... finished successfully
mapping sample_1, \
fastq/7369176_1.fastq.gz fastq/7369176_2.fastq.gz is done with status 0
Apr 04 10:48:55 ..... started STAR run
Apr 04 10:48:55 ..... loading genome
Apr 04 10:49:09 ..... processing annotations GTF
Apr 04 10:49:16 ..... inserting junctions into the genome indices
Apr 04 10:50:21 ..... started 1st pass mapping
Apr 04 10:51:41 ..... finished 1st pass mapping
Apr 04 10:51:42 ..... inserting junctions into the genome indices
Apr 04 10:53:03 ..... started mapping
Apr 04 10:54:31 ..... started sorting BAM
Apr 04 10:55:10 ..... finished successfully
mapping sample_1, \
fastq/7369177_1.fastq.gz fastq/7369177_2.fastq.gz is done with status 0
Apr 04 10:55:10 ..... started STAR run
Apr 04 10:55:10 ..... loading genome
Apr 04 10:55:23 ..... processing annotations GTF
Apr 04 10:55:30 ..... inserting junctions into the genome indices
Apr 04 10:56:35 ..... started 1st pass mapping
Apr 04 10:57:46 ..... finished 1st pass mapping
Apr 04 10:57:47 ..... inserting junctions into the genome indices
Apr 04 10:59:09 ..... started mapping
Apr 04 11:00:28 ..... started sorting BAM
Apr 04 11:01:06 ..... finished successfully
mapping sample_1, \
fastq/7369182_1.fastq.gz fastq/7369182_2.fastq.gz is done with status 0
Apr 04 11:01:06 ..... started STAR run
Apr 04 11:01:06 ..... loading genome
Apr 04 11:01:18 ..... processing annotations GTF
Apr 04 11:01:25 ..... inserting junctions into the genome indices
Apr 04 11:02:30 ..... started 1st pass mapping
Apr 04 11:04:16 ..... finished 1st pass mapping
Apr 04 11:04:16 ..... inserting junctions into the genome indices
Apr 04 11:05:38 ..... started mapping
Apr 04 11:07:28 ..... started sorting BAM
Apr 04 11:08:11 ..... finished successfully
mapping sample_1, \
fastq/7369183_1.fastq.gz fastq/7369183_2.fastq.gz is done with status 0
Apr 04 11:08:11 ..... started STAR run
Apr 04 11:08:11 ..... loading genome
Apr 04 11:08:23 ..... processing annotations GTF
Apr 04 11:08:30 ..... inserting junctions into the genome indices
Apr 04 11:09:34 ..... started 1st pass mapping
Apr 04 11:10:56 ..... finished 1st pass mapping
Apr 04 11:10:57 ..... inserting junctions into the genome indices
Apr 04 11:12:15 ..... started mapping
Apr 04 11:13:37 ..... started sorting BAM
Apr 04 11:14:20 ..... finished successfully
mapping sample_1, \
fastq/7369185_1.fastq.gz fastq/7369185_2.fastq.gz is done with status 0
Apr 04 11:14:20 ..... started STAR run
Apr 04 11:14:20 ..... loading genome
Apr 04 11:14:33 ..... processing annotations GTF
Apr 04 11:14:40 ..... inserting junctions into the genome indices
Apr 04 11:15:42 ..... started 1st pass mapping
Apr 04 11:16:52 ..... finished 1st pass mapping
Apr 04 11:16:52 ..... inserting junctions into the genome indices
Apr 04 11:18:09 ..... started mapping
Apr 04 11:19:36 ..... started sorting BAM
Apr 04 11:20:21 ..... finished successfully
mapping sample_1, \
fastq/7369186_1.fastq.gz /fastq/7369186_2.fastq.gz is done with status 0
Apr 04 11:20:21 ..... started STAR run
Apr 04 11:20:21 ..... loading genome
Apr 04 11:20:34 ..... processing annotations GTF
Apr 04 11:20:42 ..... inserting junctions into the genome indices
Apr 04 11:21:47 ..... started 1st pass mapping
Apr 04 11:23:01 ..... finished 1st pass mapping
Apr 04 11:23:01 ..... inserting junctions into the genome indices
Apr 04 11:24:23 ..... started mapping
Apr 04 11:25:41 ..... started sorting BAM
Apr 04 11:26:18 ..... finished successfully
gtf: 17.466161966323853
There are 31053 distinct gene ID in the gtf file
There are 111079 distinct transcript ID in the gtf file
There are 11919 one-transcript genes in the gtf file
There are 767088 exons in the gtf file
There are 6048 one-exon transcripts in the gtf file
There are 1946 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.577078
Average number of exons per transcript is 6.905788
Average number of exons per transcript excluding one-exon tx is 7.245861
Average number of gene per geneGroup is 6.911025
statistic: 0.016553878784179688

read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 0
NOT_NH_1: 172096074
NOT_EXPECTED_CIGAR: 2901081
NOT_EXPECTED_READ_LENGTH: 723793349
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 0
total: 898790504
outcomes by BAM written to: tmp_output/2024-04-04-10_09_16_617642_read_outcomes_by_bam.txt

novel: 228.15024852752686
The splicing graph and candidate read have been saved into tmp_output/2024-04-04-10_09_16_617642_*.rmats
save: 0.004090309143066406
loadsg: 0.01230311393737793

==========
Done processing each gene from dictionary to compile AS events
Found 17384 exon skipping events
Found 663 exon MX events
Found 7694 alt SS events
There are 4910 alt 3 SS events and 2784 alt 5 SS events.
Found 3729 RI events
==========

ase: 0.7745225429534912
count: 0.23452377319335938
Processing count files.
Done processing count files.

@EricKutschera
Copy link
Contributor

About 80% of the reads were filtered for NOT_EXPECTED_READ_LENGTH and no reads were used (USED: 0)

read outcome totals across all BAMs
USED: 0
NOT_PAIRED: 0
NOT_NH_1: 172096074
NOT_EXPECTED_CIGAR: 2901081
NOT_EXPECTED_READ_LENGTH: 723793349
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 0
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0
CLIPPED: 0
total: 898790504

From the earlier post it looks like you ran with --readLength 50. If you know the correct read length for your data then you can change the --readLength value. You can use --variable-read-length to disable the read length filter

@danli349
Copy link
Author

danli349 commented Apr 8, 2024

@EricKutschera

I changed to

--readLength 50 \
--variable-read-length \

Then the summary file is:

image
is it correct?

I am wondering are the splices in the summary file group1 versus group2 or all the splices in both groups?
How can I get the relative enriched splices in group1 versus group2?

Thanks a lot

@EricKutschera
Copy link
Contributor

The summary file looks fine

The output is filtered to only events with at least 1 supporting read for each sample group and at least 1 supporting read for each isoform unless --statoff was used: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rmats.py#L336

These posts have some discussion about whether a splicing event can be in only one of the groups or enriched in a group:
#232
Xinglab/rmats2sashimiplot#68

@danli349
Copy link
Author

danli349 commented Apr 10, 2024

@EricKutschera
Thanks, could you please let me know how to calculate and scatter plot the PSI of 2 samples?

@EricKutschera
Copy link
Contributor

The PSI values are reported in the output files under the columns: IncLevel1 and IncLevel2

IncLevel1: Inclusion level for sample 1. Replicates are comma separated. Calculated from normalized counts

This post describes how the PSI is calculated: #349 (comment)

@zyh4482
Copy link

zyh4482 commented May 22, 2024

May I ask a question with the same issue here? In such case, does STAR aligner parameter --chimSegmentMin 2 --outFilterMismatchNmax 3 affect the result? I'm using BAM as input.

@EricKutschera
Copy link
Contributor

If fastq files are used as input to rMATS (--s1, --s2) then they will be aligned using --chimSegmentMin 2 and --outFilterMismatchNmax 3: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rmats.py#L66

Using BAM files as input (--b1, --b2) doesn't use that code. Based on https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
--chimSegmentMin 2 allows STAR to output chimeric alignments. The default value in STAR is 0 which disables chimeric alignments
--outFilterMismatchNmax 3 filters out alignments with more than 3 mismatches. The default value in STAR is 10

Since those parameters determine which alignments are filtered out they could affect the result

@zyh4482
Copy link

zyh4482 commented May 22, 2024

Thanks for your prompt reply! I found the empty summary is actually caused by setting inappropriate readlength.

Sorry for the above ambiguous questions. What I really want to know is does this parameter affect the result if I use fastq as input? Because usually if you determine to detect chimeric reads, a more common way is setting longer window, like >10. For instance, STAR-fusion recommended using 12 for --chimSegmentMin.

Here, in the default setting regarding to rMATS script, I think maybe chimeric junction will be too many by setting a minimun of 2? I'm not sure if rMATS will also use chimeric junction output from STAR. I'm just curious about the reason why choosing such a small threshold and what the consequence for AS detection under such condition could be. How does rMATS use these chimeric reads?

Thank you.

@EricKutschera
Copy link
Contributor

I think the main reason for --chimSegementMin 2 is to keep as many alignments as possible. rMATS will look at each line in the sam/bam file separately and in particular rMATS doesn't match up the multiple segments of a chimeric alignment. rMATS could use the individual segments of a chimeric alignment to count toward some junctions or exons in an event

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants