ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA (1)' #295

Closed
musaqa opened this issue Nov 1, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@musaqa

musaqa commented Nov 1, 2023

Description of the bug

I am trying to run the smRNAseq pipeline but keep getting an error while mapping reads against the rRNA FASTA file during contaminant filtering.

Command used and terminal output

nextflow run nf-core/smrnaseq \
-r 2.2.3 \
--input /home/musaqa/mina/samplesheet.csv \
--protocol custom \
--outdir nf1 \
--multiqc_title nf1_results \
--trim_fastq false \
--clip_r1 0 \
--three_prime_clip_r1 0 \
--mirtrace_species 'dme' \
--fasta /home/musaqa/mina/Drosophila_melanogaster.BDGP6.32.dna.toplevel.fa \
--mirna_gtf /home/musaqa/mina/dme.gff3 \
--mature /home/musaqa/mina/mature.fa \
--hairpin /home/musaqa/mina/hairpin.fa \
--filter_contamination true \
--rrna /home/musaqa/mina/dmerRNA.fa \
--trna /home/musaqa/mina/dmel-all-tRNA-r6.54.fasta \
--max_memory 10.GB \
--max_cpus 8 \
-profile docker

[nf-core/smrnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA (1)'

Caused by:
  Process `NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA (1)` terminated with an error exit status (255)

Command executed:

  INDEX=`find -L ./ -name "*.3.ebwt" | sed 's/.3.ebwt//'`
  bowtie2 \
      --threads 6 \
      --very-sensitive-local \
      -k 1 \
      -x $INDEX \
      --un CD_SPZ_REP2.rRNA.filter.unmapped.contaminant.fastq \
      CD_SPZ_REP2.fastp.fastq.gz \
      [] \
      -S CD_SPZ_REP2.filter.contaminant.sam > CD_SPZ_REP2.contaminant_bowtie.log 2>&1

  # extracting number of reads from bowtie logs
  awk -v type=rRNA 'BEGIN{tot=0} {if(NR==4 || NR == 5){tot += $1}} END {print "\""type"\": "tot }' CD_SPZ_REP2.contaminant_bowtie.log | tr -d , > filtered.CD_SPZ_REP2_rRNA.stats

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA":
      bowtie2: $(echo $(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*$//' | tr -d '')
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Work dir:
  /home/musaqa/mina/work/a1/50c4f2840f6790b2959f6ee34e8942

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

nextflow.log

System information

Nextflow version: version 23.10.0
Hardware: Desktop
Executor: local
Container engine: docker
Version of nf-core/smrnaseq: v2.2.3
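
For anyone debugging the same failure: "Command output" is empty in the report above because the script redirects both stdout and stderr of bowtie2 into CD_SPZ_REP2.contaminant_bowtie.log, so the actual bowtie2 message should be in that file inside the work dir. A minimal sketch for dumping the relevant files, assuming the usual hidden files Nextflow writes for each task (.command.sh, .command.err, .command.out, .exitcode). The helper name show_task_logs is made up for illustration.

```shell
# Hypothetical helper: print the files Nextflow leaves in a task work directory.
show_task_logs() {
    workdir=$1
    for f in .command.sh .command.err .command.out .exitcode; do
        if [ -f "$workdir/$f" ]; then
            printf '==> %s <==\n' "$f"
            cat "$workdir/$f"
        fi
    done
    # The MAP_RRNA script sends bowtie2's stdout and stderr to this log,
    # so the real error text is expected here rather than in .command.out.
    for f in "$workdir"/*.contaminant_bowtie.log; do
        [ -f "$f" ] && { printf '==> %s <==\n' "${f##*/}"; cat "$f"; }
    done
    return 0
}
```

With the work dir from the report, this would be called as: show_task_logs /home/musaqa/mina/work/a1/50c4f2840f6790b2959f6ee34e8942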

@musaqa musaqa added the bug Something isn't working label Nov 1, 2023
@musaqa
Author

musaqa commented Nov 1, 2023

To add to the original post: after I removed contamination filtering entirely (when I first removed just the rRNA, the tRNA step failed in the same way), I got this error:

[nf-core/smrnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_SEQUENCES (CD_SPZ_REP2_seqcluster)'

Caused by:
Process `NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_SEQUENCES (CD_SPZ_REP2_seqcluster)` terminated with an error exit status (1)

Command executed:

seqcluster collapse -f CD_SPZ_REP2.fastp.fastq.gz -m 1 --min_size 15 -o collapsed
gzip collapsed/*_trimmed.fastq
mkdir final
mv collapsed/*.fastq.gz final/.

cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_SEQUENCES":
    seqcluster: $(echo $(seqcluster --version 2>&1) | sed 's/^.*seqcluster //')
END_VERSIONS

Command exit status:
1

Command output:
Probably this will fail, you need bcbio-nextgen for many installation functions.
['collapse', '-f', 'CD_SPZ_REP2.fastp.fastq.gz', '-m', '1', '--min_size', '15', '-o', 'collapsed']

Command error:
Unable to find image 'quay.io/biocontainers/seqcluster:1.2.8--pyh5e36f6f_0' locally
1.2.8--pyh5e36f6f_0: Pulling from biocontainers/seqcluster
c1a16a04cedd: Already exists
4ca545ee6d5d: Already exists
b26c965d2ab1: Pulling fs layer
b26c965d2ab1: Verifying Checksum
b26c965d2ab1: Download complete
b26c965d2ab1: Pull complete
Digest: sha256:0583d48b6753377f1aebc04d6a150db308aa5097b414dc79859259a26a8129b6
Status: Downloaded newer image for quay.io/biocontainers/seqcluster:1.2.8--pyh5e36f6f_0
INFO Run collapse
INFO Find UMI tags in read names, collapsing by UMI.
Traceback (most recent call last):
File "/usr/local/bin/seqcluster", line 33, in <module>
sys.exit(load_entry_point('seqcluster==1.2.8', 'console_scripts', 'seqcluster')())
File "/usr/local/lib/python3.9/site-packages/seqcluster/command_line.py", line 49, in main
collapse_fastq(kwargs["args"])
File "/usr/local/lib/python3.9/site-packages/seqcluster/collapse.py", line 18, in collapse_fastq
Probably this will fail, you need bcbio-nextgen for many installation functions.
['collapse', '-f', 'CD_SPZ_REP2.fastp.fastq.gz', '-m', '1', '--min_size', '15', '-o', 'collapsed']
seqs = collapse(umi_fn)
File "/usr/local/lib/python3.9/site-packages/seqcluster/libs/fastq.py", line 22, in collapse
return collapse_umi(in_file)
File "/usr/local/lib/python3.9/site-packages/seqcluster/libs/fastq.py", line 49, in collapse_umi
umis = m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

Work dir:
/home/musaqa/mina/work/7e/4a3660a7d4ccb2fa9373c61a82231d

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details

The command was the same as before, just without:
--filter_contamination true
--rrna /home/musaqa/mina/dmerRNA.fa
--trna /home/musaqa/mina/dmel-all-tRNA-r6.54.fasta

@lpantano
Contributor

lpantano commented Nov 9, 2023

Do you think you could share that file (CD_SPZ_REP2.fastp.fastq.gz) with me? Is it failing for all the samples or only that one?

@musaqa
Author

musaqa commented Nov 9, 2023

Hi, yes, I will share the file with you. Looking at it, though, I found that because I ran the command a few times I now have several files with the same name. Some of them are ~300 MB and look normal, and some are much smaller. The same is true for the other samples. I also think the error is reported for a different file each time the pipeline is run. I will delete all the temporary files, run the command only once, and share the file that is mentioned in the error.

@musaqa
Author

musaqa commented Nov 9, 2023

OK, so I reran the command; it turns out the other files are just links. The command now fails on a different sample:
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/smrnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_SEQUENCES (CD_SPZ_REP1_seqcluster)'

Caused by:
Process `NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_SEQUENCES (CD_SPZ_REP1_seqcluster)` terminated with an error exit status (1)

Command executed:

seqcluster collapse -f CD_SPZ_REP1.fastp.fastq.gz -m 1 --min_size 15 -o collapsed
gzip collapsed/*_trimmed.fastq
mkdir final
mv collapsed/*.fastq.gz final/.

cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_SEQUENCES":
    seqcluster: $(echo $(seqcluster --version 2>&1) | sed 's/^.*seqcluster //')
END_VERSIONS

Command exit status:
1

Command output:
Probably this will fail, you need bcbio-nextgen for many installation functions.
['collapse', '-f', 'CD_SPZ_REP1.fastp.fastq.gz', '-m', '1', '--min_size', '15', '-o', 'collapsed']

Command error:
INFO Run collapse
INFO Find UMI tags in read names, collapsing by UMI.
Probably this will fail, you need bcbio-nextgen for many installation functions.
['collapse', '-f', 'CD_SPZ_REP1.fastp.fastq.gz', '-m', '1', '--min_size', '15', '-o', 'collapsed']
Traceback (most recent call last):
File "/usr/local/bin/seqcluster", line 33, in <module>
sys.exit(load_entry_point('seqcluster==1.2.8', 'console_scripts', 'seqcluster')())
File "/usr/local/lib/python3.9/site-packages/seqcluster/command_line.py", line 49, in main
collapse_fastq(kwargs["args"])
File "/usr/local/lib/python3.9/site-packages/seqcluster/collapse.py", line 18, in collapse_fastq
seqs = collapse(umi_fn)
File "/usr/local/lib/python3.9/site-packages/seqcluster/libs/fastq.py", line 22, in collapse
return collapse_umi(in_file)
File "/usr/local/lib/python3.9/site-packages/seqcluster/libs/fastq.py", line 49, in collapse_umi
umis = m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

Work dir:
/home/musaqa/mina/work/45/bb9a9a1e05955d38f86591cd069a62

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

This is the Google Drive link to the file CD_SPZ_REP1.fastp.fastq.gz, which is the one now mentioned in the error log:
https://drive.google.com/file/d/1O2nCHNcujR0_IQTl-Masm5bXCKP-Bqyb/view?usp=sharing

@musaqa musaqa closed this as completed Nov 16, 2023
@musaqa musaqa reopened this Nov 16, 2023
@musaqa
Author

musaqa commented Nov 16, 2023

Can the problem be the fact that the data has Phred64 quality scores?
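
One way to probe that idea (a hypothesis only, nothing in this thread confirms it): the log line "Find UMI tags in read names, collapsing by UMI." suggests seqcluster decided the reads carry UMI tags, and the traceback then shows a pattern match returning nothing. In Phred+64 the characters U, M and I are ordinary quality values (Q21, Q13, Q9) and @ is Q0, so a quality line can begin with @ and contain the text UMI, which a line-based check could mistake for a read name. The sketch below counts both cases. It assumes a well-formed four-line-per-record gzipped FASTQ, and count_umi_like_lines is a made-up helper name.

```shell
# Hypothetical helper: count FASTQ lines that could confuse a line-based UMI check.
# Assumes exactly four lines per record (no wrapped sequences).
count_umi_like_lines() {
    gzip -dc "$1" | awk '
        NR % 4 == 1 && /UMI/         { header++ }   # read names that mention UMI
        NR % 4 == 0 && /^@/ && /UMI/ { quality++ }  # quality strings that look like a UMI read name
        END { printf "headers_with_UMI=%d quality_lines_like_UMI_header=%d\n", header, quality }
    '
}
```

If the first count is zero and the second is not, the UMI detection is probably being triggered by quality strings rather than by real UMI tags, and converting the qualities to Phred+33 before running the pipeline would be worth trying.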

@Daniel-Moreira-bio

The original CONTAMINANT_FILTER error occurs because the script looks for a bowtie1 index (*.ebwt), whereas the module smrnaseq/modules/local/bowtie_contaminants.nf builds bowtie2 indexes (*.bt2).

I changed line 25 of smrnaseq/modules/local/bowtie_map_contaminants.nf
from:
INDEX=`find -L ./ -name "*.3.ebwt" | sed 's/.3.ebwt//'`
to:
INDEX=`find -L ./ -name "*.3.bt2" | sed 's/.3.bt2//'`

I also deleted line 33 (${args} \), because it is an empty variable that conflicts with writing the output.

Hope that helps.
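
The index lookup described in this comment can be tried outside the pipeline. A minimal sketch, assuming a directory that holds a small (non-"large") bowtie2 index, i.e. files ending in .1.bt2, .2.bt2, .3.bt2 and so on; find_bt2_prefix is a made-up helper name. It uses $(...) style quoting freely because it is plain shell. Inside a Nextflow script block the dollar signs would, as far as I know, need escaping, which is presumably why the module uses backticks.

```shell
# Hypothetical helper: derive the bowtie2 index prefix from the files in a directory.
find_bt2_prefix() {
    # -L follows symlinks, which matters because Nextflow stages inputs as links.
    # head -n 1 guards against more than one index being present.
    find -L "${1:-.}" -name "*.3.bt2" | head -n 1 | sed 's/\.3\.bt2$//'
}
```

An empty result means no bowtie2 index was found, in which case -x would receive nothing and bowtie2 would be expected to fail.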

@musaqa
Author

musaqa commented Dec 11, 2023

Thank you for looking into my problem. However, after editing the bowtie_map_contaminants.nf file as you suggested, I get a new error:
Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA (1)'

Caused by:
Process `NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA (1)` terminated with an error exit status (127)

Command executed:

INDEX=find -L ./ -name "*.3.bt2" | sed 's/.3.bt2//'
bowtie2 \
    --threads 6 \
    --very-sensitive-local \
    -k 1 \
    -x $INDEX \
    --un CD_SPZ_REP2.rRNA.filter.unmapped.contaminant.fastq \
    CD_SPZ_REP2.fastp.fastq.gz \
    -S CD_SPZ_REP2.filter.contaminant.sam > CD_SPZ_REP2.contaminant_bowtie.log 2>&1

# extracting number of reads from bowtie logs
awk -v type=rRNA 'BEGIN{tot=0} {if(NR==4 || NR == 5){tot += $1}} END {print "\""type"\": "tot }' CD_SPZ_REP2.contaminant_bowtie.log | tr -d , > filtered.CD_SPZ_REP2_rRNA.stats

cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA":
    bowtie2: $(echo $(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*$//' | tr -d '')
END_VERSIONS

Command exit status:
127

Command output:
(empty)

Command error:
.command.sh: line 2: -L: command not found

Work dir:
/home/musaqa/mina/work/a7/8d36f8e4860dd37c3205418edfca4a

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

It seems that the shell now treats -L as a command rather than as an option of find?
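
That reading matches standard shell parsing: without backticks or $(...), INDEX=find is treated as a temporary variable assignment for the command that follows, and the next word, -L, is taken as the command name. A small sketch that reproduces the exit status 127 from the report; demo_missing_substitution is a made-up name, and the exact wording of the error message varies between shells.

```shell
# Reproduce the failure: NAME=value followed by more words runs those words
# as a command with NAME set in its environment.
demo_missing_substitution() {
    status=0
    # Run in a child shell so the failure does not abort the caller.
    bash -c 'INDEX=find -L ./ -name "*.3.bt2"' 2>&1 || status=$?
    echo "exit status: $status"
}
```

Wrapping the whole find pipeline in backticks, as in the original module line, makes the shell run it first and assign its output to INDEX.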

@musaqa
Author

musaqa commented Dec 11, 2023

Upon inspection I saw that there were quotation marks missing, so line 25 should be:
INDEX=find -L ./ -name "3.bt2" | sed 's/.3.bt2//'
After correcting this I get this error:

-[nf-core/smrnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA (1)'

Caused by:
Process `NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA (1)` terminated with an error exit status (255)

Command executed:

INDEX=find -L ./ -name "3.bt2" | sed 's/.3.bt2//'
bowtie2 \
    --threads 6 \
    --very-sensitive-local \
    -k 1 \
    -x $INDEX \
    --un CD_SPZ_REP2.rRNA.filter.unmapped.contaminant.fastq \
    CD_SPZ_REP2.fastp.fastq.gz \
    -S CD_SPZ_REP2.filter.contaminant.sam > CD_SPZ_REP2.contaminant_bowtie.log 2>&1

# extracting number of reads from bowtie logs
awk -v type=rRNA 'BEGIN{tot=0} {if(NR==4 || NR == 5){tot += $1}} END {print "\""type"\": "tot }' CD_SPZ_REP2.contaminant_bowtie.log | tr -d , > filtered.CD_SPZ_REP2_rRNA.stats

cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:SMRNASEQ:CONTAMINANT_FILTER:MAP_RRNA":
    bowtie2: $(echo $(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*$//' | tr -d '')
END_VERSIONS

Command exit status:
255

Command output:
(empty)

Work dir:
/home/musaqa/mina/work/1a/3e691364a86d5d44406614ed0ee60c

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

@Daniel-Moreira-bio

There is also a backtick missing, try this:
INDEX=`find -L ./ -name "*.3.bt2" | sed "s/\\.3.bt2\$//"`

In fact, there are some other modifications that can be made. I followed pull request #294 and fixed the contamination_filter step.
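
A note on why the sed expression changed: in the .nf file the doubled backslash and \$ are, as far as I understand Nextflow string escaping, the escaped forms of \. and $, so the shell ends up running s/\.3.bt2$//. The sketch below compares that with the earlier unescaped pattern on a contrived index name, chosen only to expose the difference.

```shell
# Contrived index name; a real index would rarely trigger the loose match.
name="./x3abt2_idx.3.bt2"

# Unescaped dots match any character, so the first (wrong) match is removed.
loose=$(printf '%s\n' "$name" | sed 's/.3.bt2//')

# An escaped leading dot plus the $ anchor only strips the real suffix.
strict=$(printf '%s\n' "$name" | sed 's/\.3.bt2$//')

printf 'loose:  %s\nstrict: %s\n' "$loose" "$strict"
# loose:  ./_idx.3.bt2
# strict: ./x3abt2_idx
```

For typical index names both patterns give the same prefix; the anchored form is simply the safer one.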

@apeltzer
Member

See #303 for hopefully permanent fixes now
