Workflow outputs (second preview) #30

Open
wants to merge 13 commits into
base: master
7 changes: 3 additions & 4 deletions bin/fastqc.sh
@@ -1,6 +1,5 @@
#!/usr/bin/env bash
sample_id="$1"
reads="$2"
reads="$1"

mkdir fastqc_${sample_id}_logs
fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
mkdir fastqc
fastqc -o fastqc -f fastq -q ${reads}
55 changes: 42 additions & 13 deletions main.nf
@@ -4,36 +4,65 @@
* Proof of concept of a RNAseq pipeline implemented with Nextflow
*/

nextflow.preview.output = true

/*
* Default pipeline parameters. They can be overridden on the command line, e.g.
* given `params.foo` specify on the run command line `--foo some_value`.
*/

params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq"
params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.outdir = "results"
params.multiqc = "$baseDir/multiqc"


// import modules
/*
* import modules
*/
include { RNASEQ } from './modules/rnaseq'
include { MULTIQC } from './modules/multiqc'

/*
* main script flow
*/
workflow {
main:
log.info """\
R N A S E Q - N F P I P E L I N E
===================================
transcriptome: ${params.transcriptome}
reads : ${params.reads}
outdir : ${params.outdir}
""".stripIndent()

log.info """\
R N A S E Q - N F P I P E L I N E
===================================
transcriptome: ${params.transcriptome}
reads : ${params.reads}
outdir : ${params.outdir}
"""

read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true, flat: true )
RNASEQ( params.transcriptome, read_pairs_ch )
MULTIQC( RNASEQ.out, params.multiqc )

samples_ch = RNASEQ.out.quant
| join(RNASEQ.out.fastqc)
Comment on lines +40 to +41
Member Author


This change is the important bit: we join the metadata, fastqc logs, and quant results for each sample into a single channel, and publish that channel.

The target's `path` directive is then used to control the output directory structure.
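
For readers unfamiliar with `join`, here is a minimal standalone sketch of the per-sample join described above. The two channels and their string values are hypothetical stand-ins for `RNASEQ.out.quant` and `RNASEQ.out.fastqc` (which carry real paths in the pipeline); only the tuple shape is meant to match.

```nextflow
workflow {
    // Hypothetical stand-ins for RNASEQ.out.quant and RNASEQ.out.fastqc:
    // each item is a tuple whose first element is the sample id.
    quant_ch  = channel.of( ['sampleA', 'quant_A'],  ['sampleB', 'quant_B'] )
    fastqc_ch = channel.of( ['sampleB', 'fastqc_B'], ['sampleA', 'fastqc_A'] )

    // By default, join matches items on the first tuple element, so each
    // emitted item has the shape [id, quant, fastqc], whatever order the
    // two inputs arrived in.
    quant_ch
        .join(fastqc_ch)
        .view()
}
```

Each joined item has three elements, which is why the `path` and `mapper` closures in the `output` block take three parameters (`id`, `quant`, `fastqc`).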


multiqc_ch = RNASEQ.out.quant
| concat(RNASEQ.out.fastqc)
| map { _id, file -> file }
| collect
MULTIQC( multiqc_ch, params.multiqc )

publish:
samples_ch >> 'samples'
MULTIQC.out >> 'summary'
}

output {
samples {
path { id, _quant, _fastqc -> "${workflow.outputDir}/${id}" }
Member Author


This is a bug in the workflow output DSL: it does not resolve the dynamic path against the base output directory, so the closure prefixes `${workflow.outputDir}` explicitly.

index {
path 'index.json'
mapper { id, quant, fastqc ->
[id: id, quant: quant, fastqc: fastqc]
}
}
}

summary {
path '.'
}
}
10 changes: 4 additions & 6 deletions modules/fastqc/main.nf
@@ -1,18 +1,16 @@
params.outdir = 'results'

process FASTQC {
tag "FASTQC on $sample_id"
tag "$sample_id"
conda 'bioconda::fastqc=0.12.1'
publishDir params.outdir, mode:'copy'

input:
tuple val(sample_id), path(reads)
tuple val(sample_id), path(fastq_1), path(fastq_2)

output:
path "fastqc_${sample_id}_logs", emit: logs
tuple val(sample_id), path('fastqc')

script:
"""
fastqc.sh "$sample_id" "$reads"
fastqc.sh "$fastq_1 $fastq_2"
"""
}
2 changes: 0 additions & 2 deletions modules/multiqc/main.nf
@@ -1,8 +1,6 @@
params.outdir = 'results'

process MULTIQC {
conda 'bioconda::multiqc=1.25'
publishDir params.outdir, mode:'copy'

input:
path '*'
10 changes: 5 additions & 5 deletions modules/quant/main.nf
@@ -1,17 +1,17 @@

process QUANT {
tag "$pair_id"
tag "$sample_id"
conda 'bioconda::salmon=1.10.3'

input:
path index
tuple val(pair_id), path(reads)
path index
tuple val(sample_id), path(fastq_1), path(fastq_2)

output:
path pair_id
tuple val(sample_id), path('quant')

script:
"""
salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
salmon quant --threads $task.cpus --libType=U -i $index -1 ${fastq_1} -2 ${fastq_2} -o quant
"""
}
11 changes: 6 additions & 5 deletions modules/rnaseq.nf
@@ -8,12 +8,13 @@ workflow RNASEQ {
take:
transcriptome
read_pairs_ch
main:

main:
INDEX(transcriptome)
FASTQC(read_pairs_ch)
QUANT(INDEX.out, read_pairs_ch)

emit:
QUANT.out | concat(FASTQC.out) | collect
}
emit:
quant = QUANT.out
fastqc = FASTQC.out
}