[Feature Addition: Salmon] #221

Merged: 166 commits, merged on Jul 8, 2019
Changes from 93 commits
Commits
8fb4041
Addhtseq to environment requirements
olgabot May 29, 2019
71bcd9f
Add HTSeq to changelog
olgabot May 29, 2019
2727392
small change to trigger build
olgabot May 29, 2019
74c004c
Add v1.4dev section and move change there
olgabot May 29, 2019
8369cc4
Merge branch 'dev' into patch-1
apeltzer May 29, 2019
9d44294
Merge pull request #209 from olgabot/patch-1
apeltzer May 29, 2019
eccbec3
Add salmon to requirements
olgabot Jun 3, 2019
eeb85c6
Update changelog with salmon=0.14.0
olgabot Jun 3, 2019
2a3e6b4
Add salmon index
olgabot Jun 3, 2019
5dd9a14
Add transcriptome to test config
olgabot Jun 3, 2019
10d0d6d
fa --> fasta
olgabot Jun 3, 2019
6252606
Less character width to specify transcriptome in summary
olgabot Jun 3, 2019
c1a4dfb
Add salmon quant
olgabot Jun 3, 2019
6aa3689
add raw_salmon channel
olgabot Jun 3, 2019
2d6c85b
Add salmon index to changelog
olgabot Jun 3, 2019
1eeea23
Add salmon to usage
olgabot Jun 3, 2019
362304a
Output salmon index to reference_transcriptome folder
olgabot Jun 3, 2019
110fc45
write out more docs for salmon index
olgabot Jun 3, 2019
23e5821
Add cpus for salmon quant
olgabot Jun 3, 2019
da96b5a
Add gtf for gene summarization
olgabot Jun 3, 2019
ad0791a
use BOTH salmon_quant and salmon_index configs
olgabot Jun 4, 2019
f344819
use same cpu, memory for both salmon quant and index
olgabot Jun 3, 2019
5b5424d
semicolon, not comma -_-;;;
olgabot Jun 3, 2019
5ab4b4d
Add merging of salmon gene and transcript counts
olgabot Jun 3, 2019
be6eafc
Don't need salmon quant twice
olgabot Jun 4, 2019
eff09b2
Add merging of transcript id to gene name mapping
olgabot Jun 4, 2019
24edf8c
Merge pull request #210 from czbiohub/olgabot/add-salmon
lpantano Jun 5, 2019
faf2b32
Move salmon to after featurecounts to use featurecounts output
olgabot Jun 5, 2019
49b694a
Add scraping software versions
olgabot Jun 5, 2019
4267ba9
Separate transcript and gene count merging
olgabot Jun 5, 2019
68dac0c
Output sample name with trimmed reads
olgabot Jun 5, 2019
29c44cf
try some different logic to get salmon quant to run enough times
olgabot Jun 5, 2019
d08fb65
fix more salmon merging commands
olgabot Jun 5, 2019
639d95c
try to get salmon quant to work on each file individually
olgabot Jun 5, 2019
c209e29
Add changes from @kerimoff
olgabot Jun 5, 2019
493c1ac
force matplotlib to be 3.0.3
lpantano Jun 5, 2019
e765bae
Update CHANGELOG.md
lpantano Jun 5, 2019
f49819e
Get salmon merging to work
olgabot Jun 5, 2019
392135a
salmon_index --> makeSalmonIndex
olgabot Jun 5, 2019
81fbef5
Add escaped backslashes
olgabot Jun 5, 2019
7c05cad
add r-base 3.5.1 to speed up docker build
lpantano Jun 5, 2019
01a0b2a
Update CHANGELOG.md
lpantano Jun 5, 2019
7992d2f
Merge pull request #214 from nf-core/lpantano-multiqc-patch-1
lpantano Jun 5, 2019
8930e29
Add salmon to requirements
olgabot Jun 3, 2019
130a993
Update changelog with salmon=0.14.0
olgabot Jun 3, 2019
5545cc0
force matplotlib to be 3.0.3
lpantano Jun 5, 2019
0f0e75d
Update CHANGELOG.md
lpantano Jun 5, 2019
bff7e2f
add r-base 3.5.1 to speed up docker build
lpantano Jun 5, 2019
f4dfe03
Update CHANGELOG.md
lpantano Jun 5, 2019
8b21877
Make label for salmon to simplify cpu/memory config
olgabot Jun 6, 2019
2c28093
Add salmon to multiqc
olgabot Jun 6, 2019
b9fbdbf
Merge pull request #212 from czbiohub/olgabot/salmon-quant
apeltzer Jun 7, 2019
c11df52
Adding in Lorena and Olga
apeltzer Jun 7, 2019
9dc8bea
Rephrased Changelog
apeltzer Jun 7, 2019
cc4a0ff
Adjust the summary
apeltzer Jun 7, 2019
546bc7f
Adding Salmon to MultiQC logs
apeltzer Jun 7, 2019
87185ba
Should have the salmon module now
apeltzer Jun 7, 2019
0414c5e
Fix summary transcriptome statement
apeltzer Jun 8, 2019
819e21f
Use condition instead of duplicating code :)
apeltzer Jun 8, 2019
bcaa19d
Fix parentheses
apeltzer Jun 8, 2019
be0fba8
MultiQC logging working now
apeltzer Jun 8, 2019
912abd7
Merge pull request #218 from apeltzer/salmon
apeltzer Jun 8, 2019
edd5b9d
Merge branch 'dev' into salmon
apeltzer Jun 8, 2019
dc98760
Update docs/output.md
apeltzer Jun 9, 2019
df3aaa2
Update main.nf
apeltzer Jun 9, 2019
0aef01e
Update main.nf
apeltzer Jun 9, 2019
bc9ce62
Update main.nf
apeltzer Jun 9, 2019
d84f72e
Update main.nf
apeltzer Jun 9, 2019
607fa76
Update main.nf
apeltzer Jun 9, 2019
54dbe03
Update main.nf
apeltzer Jun 9, 2019
6a2356d
Update main.nf
apeltzer Jun 9, 2019
cdca78f
Mini change to add aligner to help
apeltzer Jun 9, 2019
6642c15
Adding transcriptome option to parameters.settings.json
apeltzer Jun 9, 2019
45e5874
Merge pull request #222 from apeltzer/salmon
apeltzer Jun 9, 2019
3cf8fbb
Merge branch 'dev' into salmon
apeltzer Jun 9, 2019
162ec29
First pass update of relevant files
drpatelh Jun 10, 2019
d795658
Fix lint errors and warnings
drpatelh Jun 10, 2019
4523b87
Reorder parameters
drpatelh Jun 10, 2019
66a586c
Rename Salmon processes
drpatelh Jun 10, 2019
d563371
Use correct strandedness
drpatelh Jun 10, 2019
c891083
Reorder validate inputs
drpatelh Jun 10, 2019
c234717
Merge pull request #223 from drpatelh/salmon
drpatelh Jun 10, 2019
a7536c1
Bug fixes
drpatelh Jun 10, 2019
3f664e2
Merge pull request #224 from drpatelh/salmon
drpatelh Jun 10, 2019
fbad8b6
Add empty salmon_multiqc_logs channel
olgabot Jun 11, 2019
92d35ea
Fix merge conflicts
drpatelh Jun 12, 2019
05e4eaf
Major overhaul of Salmon requirements
drpatelh Jun 13, 2019
a77dd45
Update CHANGELOG
drpatelh Jun 13, 2019
039f771
Merge pull request #232 from drpatelh/salmon
drpatelh Jun 13, 2019
6ff4577
Add tximport function
lpantano Jun 13, 2019
ec4915a
add docs for tximport
lpantano Jun 13, 2019
fd6ab71
update readme and changelog
lpantano Jun 13, 2019
926a950
Merge pull request #233 from nf-core/lpantano-tximport
drpatelh Jun 13, 2019
c65197d
fix variable and correct version to parse gtf
lpantano Jun 15, 2019
bfac38c
Merge pull request #234 from nf-core/lpantano-salmon-patch
drpatelh Jun 16, 2019
4a0a3d7
Close outstanding issues and amend salmon merge
drpatelh Jun 18, 2019
38316eb
Update CHANGELOG
drpatelh Jun 18, 2019
5918ddd
Merge pull request #236 from drpatelh/salmon
drpatelh Jun 18, 2019
7f74c7a
Remove subsamp_filesize_thresh parameter
drpatelh Jun 18, 2019
a18d29f
Merge pull request #237 from drpatelh/salmon
drpatelh Jun 18, 2019
827a895
Don't extract transcripts if transcript_fasta is provided
olgabot Jun 21, 2019
3aa02d4
Read transcript_fasta from config genome
olgabot Jun 21, 2019
7d44b79
fix typo in 'pseudo_aligner'
olgabot Jun 21, 2019
fbc2f2f
fix typo in object name
lpantano Jun 21, 2019
6a349b2
Fix logic for both star index provided + salmon for fasta
olgabot Jun 21, 2019
bceadda
--transcriptome --> --transcript_fasta
olgabot Jun 21, 2019
caaaacb
Merge pull request #239 from pilm-bioinformatics/lpantano-salmon-txim…
apeltzer Jun 22, 2019
a1bbbe8
Merged my dev with upstream dev
apeltzer Jun 23, 2019
7effd36
Add in ReadGroups for QualiMap compatibility
apeltzer Jun 23, 2019
d55179e
Fix typo
apeltzer Jun 23, 2019
a741755
Fix seqCenter
apeltzer Jun 23, 2019
b0b7be4
HISAT2 seq_center
apeltzer Jun 23, 2019
674285e
Revert "Read transcript_fasta from config genome"
olgabot Jun 25, 2019
4b5a495
Change logic to deal with both salmon + alignment and fasta references
olgabot Jun 25, 2019
8fb4a9d
Add --gencode flag to salmon index'
olgabot Jun 25, 2019
df5d1c8
--transcriptome --> --transcript_fasta
olgabot Jun 25, 2019
44b7686
Only transfer quant.sf files for salon_merge"
olgabot Jun 25, 2019
a1de008
Add separate step to clean featurecounts output to minimize memory ne…
olgabot Jun 25, 2019
e60c8c2
Make salmon_merge also mid_memory
olgabot Jun 25, 2019
d4e2933
missing 'into'
olgabot Jun 25, 2019
aaab9ad
Get clean_featureCounts to work with test data
olgabot Jun 25, 2019
66d9274
Use all quant files into salmon_merge -- Reverts 44b7686656815f8221e9…
olgabot Jun 25, 2019
ae68f31
remove git cruft
olgabot Jun 25, 2019
21bf9d8
fix mismatch between tx2gene and quant.sf
lpantano Jun 25, 2019
02d3ce9
Use paste to merge everything
olgabot Jun 25, 2019
f66abb0
Use params.gencode to decide on --gencode flag
olgabot Jun 26, 2019
a70c42a
use evaluated $gencode parameter
olgabot Jun 26, 2019
3ab840f
add default value for gencode
olgabot Jun 26, 2019
a322ebd
Set params.fc_group_features_type = 'gene_type' if gencode
olgabot Jun 26, 2019
1357c9c
Add note about --gencode for usage"
olgabot Jun 26, 2019
29369b8
Add note about --gencode for changelog
olgabot Jun 26, 2019
5b2de6f
no "markdups" in filename
olgabot Jun 27, 2019
51e84b0
Update docs/usage.md
olgabot Jun 27, 2019
a61e16f
Use @drpatelh's description
olgabot Jun 27, 2019
6f269a5
Apply suggestions from code review
olgabot Jun 27, 2019
b544095
Remove reference to PR for changelog
olgabot Jun 27, 2019
bbd8ba1
evaluate params.fc_group_features_type within featureCounts process
olgabot Jun 27, 2019
7bc52ec
Use unix-fu to merge featurecounts
olgabot Jun 27, 2019
f2a135e
Wrap biotype variable in braces
olgabot Jun 27, 2019
3ff8ed7
use 'bash' for syntax of fasta/gtf to fix markdownlint
olgabot Jun 27, 2019
8c4ad54
Remove first line of featurecounts files
olgabot Jun 27, 2019
6d34512
Evaluate gene biotype earlier and print in summary
olgabot Jun 27, 2019
d8242ec
Remove csvtk from requirements
olgabot Jun 27, 2019
a3c30ad
add a note about redirection
olgabot Jun 27, 2019
ac2b2a2
Use @drpatelh's wording
olgabot Jun 27, 2019
11276af
remove random fenced code
olgabot Jun 27, 2019
beab23f
Use 7th column for gene namec
olgabot Jun 27, 2019
32a37fd
Shorten name of biotype field in summary for brevity
olgabot Jun 27, 2019
7d84ba2
use 0'th item not 1th
olgabot Jun 27, 2019
17dd8fe
Add sample name to output
olgabot Jun 27, 2019
0028728
use tximport for each sample and then merge individually
olgabot Jun 27, 2019
f1f3b8a
properly merge gene counts and tpm files
olgabot Jun 27, 2019
2be0b27
actually use the gene counts to merge ..
olgabot Jun 28, 2019
70bb37c
Add transcript_id and gene_id to salmon output csv
olgabot Jun 28, 2019
3bef844
Add --gencode flag to usage and summary output
olgabot Jun 28, 2019
41c58f8
Don't need salmon RDS files
olgabot Jun 28, 2019
5c13cc0
Add gtf_qualimap
olgabot Jul 1, 2019
3c72549
Merge pull request #242 from czbiohub/olgabot/salmon-gencode
apeltzer Jul 2, 2019
034ee2a
Merge pull request #243 from czbiohub/olgabot/featurecounts-merge-memory
apeltzer Jul 7, 2019
848241a
Merge pull request #244 from pilm-bioinformatics/lpantano-salmon-txim…
apeltzer Jul 7, 2019
d1b6d34
Remove trailing slash
apeltzer Jul 7, 2019
68767e6
Merge branch 'dev' into salmon
apeltzer Jul 7, 2019
ad7e743
Merge pull request #241 from apeltzer/salmon-readgroups
apeltzer Jul 8, 2019
7e073bd
Merged
apeltzer Jul 8, 2019
2bc9a51
Merge branch 'dev' into salmon
apeltzer Jul 8, 2019
2ab698d
Merged changes
apeltzer Jul 8, 2019
6 changes: 4 additions & 2 deletions .travis.yml
@@ -38,7 +38,9 @@ script:
- nf-core lint ${TRAVIS_BUILD_DIR}
# Lint the documentation
- markdownlint ${TRAVIS_BUILD_DIR} -c ${TRAVIS_BUILD_DIR}/.github/markdownlint.yml
# Run, build reference genome with STAR
# Run with STAR
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker
# Run, build reference genome with HISAT2
# Run with HISAT2
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --aligner hisat2
# Run with STAR and Salmon
- nextflow run ${TRAVIS_BUILD_DIR} -profile test,docker --pseudo_aligner salmon
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -4,7 +4,13 @@

### Pipeline updates

* Added tximport to merge salmon output
* Added Salmon as a supplementary method to STAR and HiSAT2
* Added `--pseudo_aligner`, `--transcript_fasta` and `--salmon_index` parameters
* Add `Citation` and `Quick Start` section to `README.md`
* Integrate changes in `nf-core/tools v1.6` template
* Add tximport and summarizedexperiment dependency [#171](https://github.com/nf-core/rnaseq/issues/171)
* Change all boolean parameters from snake_case to camelCase, and all value parameters from camelCase to snake_case
* Fix missing output of the `multiqc_plots` folder [#200](https://github.com/nf-core/rnaseq/issues/200)
* Add Qualimap dependency [#202](https://github.com/nf-core/rnaseq/issues/202)
* Obtain edgeR + dupRadar version information [#198](https://github.com/nf-core/rnaseq/issues/198) and [#112](https://github.com/nf-core/rnaseq/issues/112)
@@ -22,6 +28,7 @@
* qualimap 2.2.2b -> 2.2.2c
* trim-galore 0.6.1 -> 0.6.2
* gffread 0.9.12 -> 0.11.4
* Force matplotlib=3.0.3
* Added Salmon 0.14.0
* Added RSEM 1.3.2
* Added tximport 1.0.3
@@ -60,6 +67,8 @@
* deeptools 3.2.0 -> 3.2.1
* trim-galore 0.5.0 -> 0.6.1
* qualimap 2.2.2b
* matplotlib 3.0.3
* r-base 3.5.1

## [Version 1.2](https://github.com/nf-core/rnaseq/releases/tag/1.2) - 2018-12-12

36 changes: 33 additions & 3 deletions README.md
@@ -7,15 +7,34 @@
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/)
[![Docker](https://img.shields.io/docker/automated/nfcore/rnaseq.svg)](https://hub.docker.com/r/nfcore/rnaseq/)


### Introduction

**nf-core/rnaseq** is a bioinformatics analysis pipeline used for RNA sequencing data.

The workflow processes raw data from FastQ inputs ([FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), [Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)), aligns the reads ([STAR](https://github.com/alexdobin/STAR) or [HiSAT2](https://ccb.jhu.edu/software/hisat2/index.shtml)), generates gene counts ([featureCounts](http://bioinf.wehi.edu.au/featureCounts/), [StringTie](https://ccb.jhu.edu/software/stringtie/)) and performs extensive quality-control on the results ([RSeQC](http://rseqc.sourceforge.net/), [Qualimap](http://qualimap.bioinfo.cipf.es/), [dupRadar](https://bioconductor.org/packages/release/bioc/html/dupRadar.html), [Preseq](http://smithlabresearch.org/software/preseq/), [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html), [MultiQC](http://multiqc.info/)). See the [output documentation](docs/output.md) for more details of the results.
The workflow processes raw data from FastQ inputs ([FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), [Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)), aligns the reads ([STAR](https://github.com/alexdobin/STAR) or [HiSAT2](https://ccb.jhu.edu/software/hisat2/index.shtml)), generates counts relative to genes ([featureCounts](http://bioinf.wehi.edu.au/featureCounts/), [StringTie](https://ccb.jhu.edu/software/stringtie/)) or transcripts ([Salmon](https://combine-lab.github.io/salmon/), [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html)) and performs extensive quality-control on the results ([RSeQC](http://rseqc.sourceforge.net/), [Qualimap](http://qualimap.bioinfo.cipf.es/), [dupRadar](https://bioconductor.org/packages/release/bioc/html/dupRadar.html), [Preseq](http://smithlabresearch.org/software/preseq/), [edgeR](https://bioconductor.org/packages/release/bioc/html/edgeR.html), [MultiQC](http://multiqc.info/)). See the [output documentation](docs/output.md) for more details of the results.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.

## Quick Start

i. Install [`nextflow`](https://nf-co.re/usage/installation)

ii. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)

iii. Download the pipeline and test it on a minimal dataset with a single command

```bash
nextflow run nf-core/rnaseq -profile test,<docker/singularity/conda>
```

iv. Start running your own analysis!

```bash
nextflow run nf-core/rnaseq -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --genome GRCh37
```

See [usage docs](docs/usage.md) for all of the available options when running the pipeline.

### Documentation
The nf-core/rnaseq pipeline comes with documentation about the pipeline, found in the `docs/` directory:

@@ -37,4 +56,15 @@ Many thanks to other who have helped out along the way too, including (but not l
[@orzechoj](https://github.com/orzechoj),
[@apeltzer](https://github.com/apeltzer),
[@colindaven](https://github.com/colindaven),
[@jburos](https://github.com/jburos).
[@lpantano](https://github.com/lpantano),
[@olgabot](https://github.com/olgabot),
[@jburos](https://github.com/jburos),
[@drpatelh](https://github.com/drpatelh).
Review comment (contributor): "Wow, more than doubled the existing contributor base!"


## Citation

<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi. -->
If you use nf-core/rnaseq for your analysis, please cite it using the following doi: [10.5281/zenodo.1400710](https://doi.org/10.5281/zenodo.1400710)

You can cite the `nf-core` pre-print as follows:
Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1).
1 change: 1 addition & 0 deletions assets/multiqc_config.yaml
@@ -3,6 +3,7 @@ extra_fn_clean_exts:
- _R2
- .hisat
- '.sorted.markDups'
- '.sorted'

report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/rnaseq" target="_blank">nf-core/rnaseq</a>
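MultiQC uses `extra_fn_clean_exts` to trim suffixes from file names when it derives sample names, so adding `'.sorted'` lets BAM files named `*.sorted.*` (without `markDups`) collapse onto the same sample as the other tools' outputs. The Python sketch below approximates MultiQC's default "truncate" cleaning (cut the name at the first occurrence of each listed string). It is an illustration rather than MultiQC's own code, `clean_sample_name` is a hypothetical helper, and only the entries visible in this hunk are used.

```python
def clean_sample_name(filename, clean_exts):
    """Approximate MultiQC's default 'truncate' cleaning of sample names."""
    name = filename
    for ext in clean_exts:
        # Keep only the text before the first occurrence of the extension
        name = name.split(ext, 1)[0]
    return name


# Entries visible in this hunk of assets/multiqc_config.yaml (the list starts earlier)
EXTS = ["_R2", ".hisat", ".sorted.markDups", ".sorted"]

if __name__ == "__main__":
    print(clean_sample_name("SRR123.sorted.markDups.bam", EXTS))
    print(clean_sample_name("SRR123.sorted.bam", EXTS))
```

Both file names reduce to `SRR123`, which is why keeping `'.sorted.markDups'` alongside the new `'.sorted'` entry is harmless.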
69 changes: 69 additions & 0 deletions bin/parse_gtf.py
@@ -0,0 +1,69 @@
#!/usr/bin/env python
from __future__ import print_function
from collections import OrderedDict, defaultdict, Counter
import logging
import argparse
import glob
import os

# Create a logger
logging.basicConfig(format='%(name)s - %(asctime)s %(levelname)s: %(message)s')
logger = logging.getLogger(__file__)
logger.setLevel(logging.INFO)


def read_top_transcript(salmon):
    """Collect roughly the first 100 transcript IDs from one quant.sf file."""
    txs = set()
    # Use the first quant.sf found; index [1] fails when there is only one sample
    fn = glob.glob(os.path.join(salmon, "*", "quant.sf"))[0]
    with open(fn) as inh:
        for line in inh:
            if line.startswith("Name"):
                # Skip the quant.sf header line
                continue
            txs.add(line.split()[0])
            if len(txs) > 100:
                break
    logger.info("Transcripts found in FASTA: %s" % txs)
    return txs


def tx2gene(gtf, salmon, gene_id, extra, out):
    txs = read_top_transcript(salmon)
    # Count which GTF attribute key holds values matching Salmon transcript IDs
    votes = Counter()
    # gene -> list of attribute dicts, one per GTF record
    gene_dict = defaultdict(list)
    with open(gtf) as inh:
        for line in inh:
            if line.startswith("#"):
                continue
            cols = line.split("\t")
            if len(cols) < 9:
                # Skip blank or malformed lines
                continue
            attr_dict = OrderedDict()
            for gff_item in cols[8].split(";"):
                item_pair = gff_item.strip().split(" ")
                if len(item_pair) > 1:
                    value = item_pair[1].strip().replace("\"", "")
                    if value in txs:
                        votes[item_pair[0].strip()] += 1

                    attr_dict[item_pair[0].strip()] = value
            # Records without the requested gene attribute cannot be mapped
            if gene_id in attr_dict:
                gene_dict[attr_dict[gene_id]].append(attr_dict)

    if not votes:
        logger.warning("No attribute in GTF matching transcripts")
        return None

    txid = votes.most_common(1)[0][0]
    logger.info("Attribute found to be transcript: %s" % txid)
    seen = set()
    with open(out, 'w') as outh:
        # Write one row per transcript, not one row per gene
        for gene, records in gene_dict.items():
            for attrs in records:
                # Gene-level records carry no transcript ID; exons repeat it
                if txid not in attrs or attrs[txid] in seen:
                    continue
                seen.add(attrs[txid])
                print("%s,%s,%s" % (attrs[txid], gene, attrs.get(extra, "")), file=outh)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="""Get tx to gene names for tximport""")
    parser.add_argument("--gtf", type=str, help="GTF file")
    parser.add_argument("--salmon", type=str, help="output of salmon")
    parser.add_argument("--id", type=str, help="gene id in the gtf file")
    parser.add_argument("--extra", type=str, help="extra id in the gtf file")
    parser.add_argument("-o", "--output", dest='output', default='tx2gene.csv', type=str, help="file with output")

    args = parser.parse_args()
    tx2gene(args.gtf, args.salmon, args.id, args.extra, args.output)
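`tx2gene()` does not assume which GTF attribute holds the transcript IDs that Salmon reports: it counts, for every attribute key, how many values match IDs read from a `quant.sf` file and picks the key with the most matches. The standalone sketch below reproduces only that voting step on an in-memory GTF; the GTF lines and the `guess_transcript_attribute` helper are made up for illustration.

```python
from collections import Counter


def parse_attributes(column):
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'."""
    attrs = {}
    for item in column.split(";"):
        pair = item.strip().split(" ")
        if len(pair) > 1:
            attrs[pair[0].strip()] = pair[1].strip().replace('"', "")
    return attrs


def guess_transcript_attribute(gtf_lines, salmon_ids):
    """Return the attribute key whose values most often match Salmon IDs, or None."""
    votes = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        for key, value in parse_attributes(cols[8]).items():
            if value in salmon_ids:
                votes[key] += 1
    return votes.most_common(1)[0][0] if votes else None


# Two toy exon records of one gene (hypothetical data)
GTF = [
    '1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "A";',
    '1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "A";',
]

if __name__ == "__main__":
    print(guess_transcript_attribute(GTF, {"T1", "T2"}))
```

Voting makes the script tolerant of annotations where the transcript key is named differently, at the cost of relying on a sample of roughly 100 IDs from a single `quant.sf`.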
2 changes: 2 additions & 0 deletions bin/scrape_software_versions.py
@@ -14,6 +14,7 @@
'Picard MarkDuplicates': ['v_markduplicates.txt', r"([\d\.]+)-SNAPSHOT"],
'Samtools': ['v_samtools.txt', r"samtools (\S+)"],
'featureCounts': ['v_featurecounts.txt', r"featureCounts v(\S+)"],
'Salmon': ['v_salmon.txt', r"salmon (\S+)"],
'deepTools': ['v_deeptools.txt', r"bamCoverage (\S+)"],
'StringTie': ['v_stringtie.txt', r"(\S+)"],
'Preseq': ['v_preseq.txt', r"Version: (\S+)"],
@@ -34,6 +35,7 @@
results['Picard MarkDuplicates'] = '<span style="color:#999999;\">N/A</span>'
results['Samtools'] = '<span style="color:#999999;\">N/A</span>'
results['featureCounts'] = '<span style="color:#999999;\">N/A</span>'
results['Salmon'] = '<span style="color:#999999;\">N/A</span>'
results['StringTie'] = '<span style="color:#999999;\">N/A</span>'
results['Preseq'] = '<span style="color:#999999;\">N/A</span>'
results['deepTools'] = '<span style="color:#999999;\">N/A</span>'
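Each entry in the version-scraping dictionary pairs a version file with a regular expression whose first capture group is the version string. The sketch below applies the new `Salmon` pattern, assuming `v_salmon.txt` holds `salmon --version` output of the form `salmon 0.14.0`; `scrape_version` is a hypothetical helper, not the script's own function.

```python
import re

# Pattern copied from the 'Salmon' entry added in this hunk
SALMON_PATTERN = r"salmon (\S+)"


def scrape_version(text, pattern):
    """Return the first capture group of the first match in text, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(scrape_version("salmon 0.14.0\n", SALMON_PATTERN))
```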
73 changes: 73 additions & 0 deletions bin/tximport.r
@@ -0,0 +1,73 @@
#!/usr/bin/env Rscript

args = commandArgs(trailingOnly=TRUE)
if (length(args) < 2) {
    stop("Usage: tximport.r <coldata> <salmon_out>", call.=FALSE)
}

path = args[2]
coldata = args[1]

# Optional transcript-to-gene mapping written by parse_gtf.py
tx2gene = "tx2gene.csv"
rowdata = NULL
if (!file.exists(tx2gene) || file.info(tx2gene)$size == 0) {
    tx2gene = NULL
} else {
    rowdata = read.csv(tx2gene, header = FALSE)
    colnames(rowdata) = c("tx", "gene_id", "gene_name")
    tx2gene = rowdata[, 1:2]
}

# One quant.sf per sample; the sample name is its parent directory
fns = list.files(path, pattern = "quant.sf", recursive = TRUE, full.names = TRUE)
names = basename(dirname(fns))
names(fns) = names

coldata = list.files(coldata, full.names = TRUE)
if (length(coldata) == 0) {
    coldata = "NULL"
}
if (file.exists(coldata[1])) {
    coldata = read.csv(coldata[1])
    coldata = coldata[match(names, coldata[, 1]), ]
    coldata = cbind(files = fns, coldata)
} else {
    message("ColData not available: ", coldata)
    coldata = data.frame(files = fns, names = names)
}
rownames(coldata) = names

library(SummarizedExperiment)
library(tximport)

# Transcript-level import (txOut = TRUE keeps one row per transcript)
txi = tximport(fns, type = "salmon", txOut = TRUE)
if (is.null(rowdata)) {
    rowdata = data.frame(tx = rownames(txi[["counts"]]))
} else {
    rowdata = rowdata[match(rownames(txi[["counts"]]), rowdata[["tx"]]), ]
}
se = SummarizedExperiment(assays = list(counts = txi[["counts"]],
                                        abundance = txi[["abundance"]],
                                        length = txi[["length"]]),
                          colData = DataFrame(coldata),
                          rowData = rowdata)

# Gene-level summary, only when a tx2gene mapping is available
if (!is.null(tx2gene)) {
    gi = summarizeToGene(txi, tx2gene = tx2gene)
    growdata = unique(rowdata[, 2:3])
    growdata = growdata[match(rownames(gi[["counts"]]), growdata[["gene_id"]]), ]
    gse = SummarizedExperiment(assays = list(counts = gi[["counts"]],
                                             abundance = gi[["abundance"]],
                                             length = gi[["length"]]),
                               colData = DataFrame(coldata),
                               rowData = growdata)
    saveRDS(gse, file = "gse.rds")
    # Write the gene-level (gse) assays, not the transcript-level (se) ones
    write.csv(assays(gse)[["abundance"]], "merged_salmon_gene_tpm.csv")
    write.csv(assays(gse)[["counts"]], "merged_salmon_gene_reads.csv")
}

saveRDS(se, file = "se.rds")
write.csv(assays(se)[["abundance"]], "merged_salmon_tx_tpm.csv")
write.csv(assays(se)[["counts"]], "merged_salmon_tx_reads.csv")

# Print citation and session info to standard out
citation("tximport")
sessionInfo()
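For the gene-level files, `summarizeToGene()` collapses transcript rows onto genes using the `tx2gene` table: counts and abundances (TPM) are summed per gene, and the gene length is the abundance-weighted mean of the transcript lengths. The Python sketch below shows that aggregation for a single sample. It is a simplification (tximport itself handles multiple samples and substitutes an average length for unexpressed genes), and the toy values are invented.

```python
from collections import defaultdict


def summarize_to_gene(tx_counts, tx_tpm, tx_length, tx2gene):
    """Collapse per-transcript values onto genes for one sample.

    Counts and TPM are summed; length is the TPM-weighted mean transcript length.
    Transcripts missing from the quantification are skipped.
    """
    counts = defaultdict(float)
    tpm = defaultdict(float)
    weighted_len = defaultdict(float)
    for tx, gene in tx2gene.items():
        if tx not in tx_counts:
            continue
        counts[gene] += tx_counts[tx]
        tpm[gene] += tx_tpm[tx]
        weighted_len[gene] += tx_tpm[tx] * tx_length[tx]
    length = {}
    for gene in counts:
        # Simplification: report 0.0 when no transcript of the gene is expressed
        length[gene] = weighted_len[gene] / tpm[gene] if tpm[gene] > 0 else 0.0
    return dict(counts), dict(tpm), length


if __name__ == "__main__":
    mapping = {"T1": "G1", "T2": "G1", "T3": "G2"}
    counts, tpm, length = summarize_to_gene(
        {"T1": 10.0, "T2": 30.0, "T3": 5.0},
        {"T1": 100.0, "T2": 300.0, "T3": 50.0},
        {"T1": 1000.0, "T2": 2000.0, "T3": 500.0},
        mapping,
    )
    print(counts["G1"], tpm["G1"], length["G1"])  # 40.0 400.0 1750.0
```

Weighting by abundance means the gene length reflects the isoforms that are actually expressed in that sample, which is what downstream length-offset methods expect.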
35 changes: 20 additions & 15 deletions conf/base.config
@@ -20,20 +20,6 @@ process {
maxErrors = '-1'

// Process-specific resource requirements
withName: trim_galore {
time = { check_max( 8.h * task.attempt, 'time' ) }
}
withName:markDuplicates {
// Actually the -Xmx value should be kept lower,
// and is set through the markdup_java_options
cpus = { check_max( 8, 'cpus' ) }
memory = { check_max( 8.GB * task.attempt, 'memory' ) }
}
withName: makeHISATindex {
cpus = { check_max( 10, 'cpus' ) }
memory = { check_max( 200.GB * task.attempt, 'memory' ) }
time = { check_max( 5.h * task.attempt, 'time' ) }
}
withLabel: low_memory {
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
}
@@ -46,7 +32,26 @@ process {
memory = { check_max( 80.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
}
withName: "multiqc|get_software_versions" {

withName: makeHISATindex {
cpus = { check_max( 10, 'cpus' ) }
memory = { check_max( 200.GB * task.attempt, 'memory' ) }
time = { check_max( 5.h * task.attempt, 'time' ) }
}
withName: trim_galore {
time = { check_max( 8.h * task.attempt, 'time' ) }
}
withName: markDuplicates {
// Actually the -Xmx value should be kept lower,
// and is set through the markdup_java_options
cpus = { check_max( 8, 'cpus' ) }
memory = { check_max( 8.GB * task.attempt, 'memory' ) }
}
withLabel: salmon {
cpus = { check_max( 8, 'cpus' ) }
memory = { check_max( 16.GB * task.attempt, 'memory' ) }
}
withName: 'multiqc|get_software_versions' {
memory = { check_max( 2.GB * task.attempt, 'memory' ) }
cache = false
}
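The new `salmon` label requests 8 CPUs and `16.GB * task.attempt` of memory, so each retry asks for more memory while `check_max()` keeps the request within the pipeline's ceiling. `check_max` is defined outside this diff (in nf-core pipelines of this era it typically lives in `nextflow.config` and compares against `params.max_memory`, `params.max_cpus` and `params.max_time`). The Python sketch below only models that capping behaviour; the helper names and the 64 GB ceiling are arbitrary examples.

```python
def check_max(requested, maximum):
    """Cap a resource request at the configured maximum (simplified model)."""
    return min(requested, maximum)


def salmon_memory_gb(attempt, max_memory_gb):
    # Mirrors `check_max( 16.GB * task.attempt, 'memory' )` from the salmon label
    return check_max(16 * attempt, max_memory_gb)


if __name__ == "__main__":
    for attempt in (1, 2, 3, 4, 5):
        print(attempt, salmon_memory_gb(attempt, max_memory_gb=64))
```

Using a shared label rather than per-process `withName` blocks lets the Salmon index and quantification steps pick up the same resources from a single place.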
1 change: 1 addition & 0 deletions conf/test.config
@@ -27,4 +27,5 @@ params {
// Genome references
fasta = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fa'
gtf = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf'
transcriptome = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/transcriptome.fasta'
}