Merge pull request #361 from nf-core/contig-separation
Contig separation
jasmezz authored Apr 17, 2024
2 parents 6d1f069 + d830fc0 commit 3b25ab8
Showing 31 changed files with 1,154 additions and 462 deletions.
12 changes: 6 additions & 6 deletions CITATIONS.md
@@ -34,10 +34,6 @@

> Schwengers, O., Jelonek, L., Dieckmann, M. A., Beyvers, S., Blom, J., & Goesmann, A. (2021). Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microbial Genomics, 7(11). [DOI: 10.1099/mgen.0.000685](https://doi.org/10.1099/mgen.0.000685)
-- [bioawk](https://github.com/lh3/bioawk)
-
-> Li, H. (2023). bioawk: BWK awk modified for biological data. Github. Retrieved July 12, 2023, from https://github.com/lh3/bioawk
- [comBGC](https://github.com/nf-core/funcscan)

> Frangenberg, J., Fellows Yates, J. A., Ibrahim, A., Perelo, L., & Beber, M. E. (2023). nf-core/funcscan: 1.0.0 - German Rollmops - 2023-02-15. https://doi.org/10.5281/zenodo.7643100
@@ -74,6 +70,10 @@

> Santos-Júnior, C. D., Pan, S., Zhao, X. M., & Coelho, L. P. (2020). Macrel: antimicrobial peptide screening in genomes and metagenomes. PeerJ, 8, e10555. [DOI: 10.7717/peerj.10555](https://doi.org/10.7717/peerj.10555)
+- [MMseqs2](https://doi.org/10.1093/bioinformatics/btab184)
+
+> Mirdita, M., Steinegger, M., Breitwieser, F., Söding, J., Levy Karin, E. (2021). Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics, 37(18),3029–3031. [DOI: 10.1093/bioinformatics/btab184](https://doi.org/10.1093/bioinformatics/btab184)
- [Prodigal](https://doi.org/10.1186/1471-2105-11-119)

> Hyatt, D., Chen, G. L., Locascio, P. F., Land, M. L., Larimer, F. W., & Hauser, L. J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC bioinformatics, 11, 119. [DOI: 10.1186/1471-2105-11-119](https://doi.org/10.1186/1471-2105-11-119)
@@ -90,9 +90,9 @@

> Alcock, B. P., Huynh, W., Chalil, R., Smith, K. W., Raphenya, A. R., Wlodarski, M. A., Edalatmand, A., Petkau, A., Syed, S. A., Tsang, K. K., Baker, S. J. C., Dave, M., McCarthy, M. C., Mukiri, K. M., Nasir, J. A., Golbon, B., Imtiaz, H., Jiang, X., Kaur, K., Kwong, M., Liang, Z. C., Niu, K. C., Shan, P., Yang, J. Y. J., Gray, K. L., Hoad, G. R., Jia, B., Bhando, T., Carfrae, L. A., Farha, M. A., French, S., Gordzevich, R., Rachwalski, K., Tu, M. M., Bordeleau, E., Dooley, D., Griffiths, E., Zubyk, H. L., Brown, E. D., Maguire, F., Beiko, R. G., Hsiao, W. W. L., Brinkman F. S. L., Van Domselaar, G., McArthur, A. G. (2023). CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic acids research, 51(D1):D690-D699. [DOI: 10.1093/nar/gkac920](https://doi.org/10.1093/nar/gkac920)
-- [MMseqs2](https://doi.org/10.1093bioinformatics/btab184)
+- [SeqKit](https://bioinf.shenwei.me/seqkit/)

-> Mirdita, M., Steinegger, M., Breitwieser, F., Söding, J., Levy Karin, E. (2021). Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics, 37(18),3029–3031. [DOI: 10.1093/bioinformatics/btab184](https://doi.org/10.1093/bioinformatics/btab184)
+> Shen, W., Sipos, B., & Zhao, L. (2024). SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta, e191. [https://doi.org/10.1002/imt2.191](https://doi.org/10.1002/imt2.191)
## Software packaging/containerisation tools

15 changes: 8 additions & 7 deletions README.md
@@ -30,13 +30,14 @@ The nf-core/funcscan AWS full test dataset are contigs generated by the MGnify s

## Pipeline summary

-1. Taxonomic classification of contigs of **prokaryotic origin** with [`MMseqs2`](https://github.com/soedinglab/MMseqs2)
-2. Annotation of assembled prokaryotic contigs with [`Prodigal`](https://github.com/hyattpd/Prodigal), [`Pyrodigal`](https://github.com/althonos/pyrodigal), [`Prokka`](https://github.com/tseemann/prokka), or [`Bakta`](https://github.com/oschwengers/bakta)
-3. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
-4. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg)
-5. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
-6. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
-7. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)
+1. Quality control of input sequences with [`SeqKit`](https://bioinf.shenwei.me/seqkit/)
+2. Taxonomic classification of contigs of **prokaryotic origin** with [`MMseqs2`](https://github.com/soedinglab/MMseqs2)
+3. Annotation of assembled prokaryotic contigs with [`Prodigal`](https://github.com/hyattpd/Prodigal), [`Pyrodigal`](https://github.com/althonos/pyrodigal), [`Prokka`](https://github.com/tseemann/prokka), or [`Bakta`](https://github.com/oschwengers/bakta)
+4. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
+5. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg)
+6. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
+7. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
+8. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)

![funcscan metro workflow](docs/images/funcscan_metro_workflow.png)

4 changes: 0 additions & 4 deletions conf/base.config
@@ -76,10 +76,6 @@ process {
         cpus = 1
     }

-    withName:BIOAWK {
-        cache = false
-    }
-
     withName: PROKKA {
         memory = { check_max( 8.GB * task.attempt, 'memory' ) }
         cpus = { check_max( 4 * task.attempt, 'cpus' ) }
25 changes: 20 additions & 5 deletions conf/modules.config
@@ -33,15 +33,30 @@ process {
         ]
     }

-    withName: BIOAWK {
-        ext.args = "-c fastx \'{print \">\" \$name ORS length(\$seq)}\'"
+    withName: SEQKIT_SEQ_LONG {
+        ext.prefix = { "${meta.id}_long" }
         publishDir = [
-            path: { "${params.outdir}/" },
+            path: { "${params.outdir}/qc/seqkit/" },
             mode: params.publish_dir_mode,
-            enabled: false,
+            enabled: params.contig_qc_savesplitfastas,
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
-        ext.prefix = { "${meta.id}.fa" }
+        ext.args = [
+            "--min-len ${params.contig_qc_lengththreshold}"
+        ].join(' ').trim()
     }

+    withName: SEQKIT_SEQ_SHORT {
+        ext.prefix = { "${meta.id}_short" }
+        publishDir = [
+            path: { "${params.outdir}/qc/seqkit/" },
+            mode: params.publish_dir_mode,
+            enabled: params.contig_qc_savesplitfastas,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
+        ]
+        ext.args = [
+            "--max-len ${params.contig_qc_lengththreshold - 1}"
+        ].join(' ').trim()
+    }
+
     withName: MMSEQS_DATABASES {
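
Note on the `conf/modules.config` change: the two `ext.args` settings split each input FASTA at `params.contig_qc_lengththreshold`. `SEQKIT_SEQ_LONG` keeps contigs at or above the threshold, and `SEQKIT_SEQ_SHORT` keeps contigs up to the threshold minus one. The following is a minimal Python sketch of that partition, assuming `seqkit seq` treats `--min-len` and `--max-len` as inclusive bounds (which the `- 1` in the short-contig setting implies). The function name `split_by_length`, the toy contigs, and the threshold of 5 are illustrative only and are not part of the pipeline.

```python
from typing import Dict, Tuple


def split_by_length(
    contigs: Dict[str, str], threshold: int
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition contigs into (long, short) sets around a length threshold.

    Hypothetical helper mirroring the two SeqKit filters in the config:
      long  <-> --min-len threshold        (length >= threshold)
      short <-> --max-len threshold - 1    (length <= threshold - 1)
    Both bounds are assumed to be inclusive.
    """
    min_len = threshold
    max_len = threshold - 1
    long_contigs = {n: s for n, s in contigs.items() if len(s) >= min_len}
    short_contigs = {n: s for n, s in contigs.items() if len(s) <= max_len}
    return long_contigs, short_contigs


# Toy input: one contig below, one exactly at, and one above the threshold.
contigs = {
    "short_a": "ACGT",      # length 4
    "edge": "ACGTA",        # length 5, exactly at the threshold
    "long_b": "ACGTACGT",   # length 8
}

long_contigs, short_contigs = split_by_length(contigs, threshold=5)
print(sorted(long_contigs))   # ['edge', 'long_b']
print(sorted(short_contigs))  # ['short_a']
```

Because the two bounds are inclusive and adjacent, every contig lands in exactly one of the two outputs, and a contig whose length equals the threshold is treated as long.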
Binary file modified docs/images/funcscan_metro_workflow.png