Merge pull request #359 from nf-core/card-db
Improve RGI database handling
jasmezz authored Apr 10, 2024
2 parents 229311d + ec3ebab commit 6d1f069
Showing 6 changed files with 63 additions and 15 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#332](https://github.com/nf-core/funcscan/pull/332) & [#327](https://github.com/nf-core/funcscan/pull/327) Merged pipeline template of nf-core/tools version 2.12.1 (by @jfy133, @jasmezz)
- [#338](https://github.com/nf-core/funcscan/pull/338) Set `--meta` parameter to default for Bakta, with singlemode optional. (by @jasmezz)
- [#343](https://github.com/nf-core/funcscan/pull/343) Added contig taxonomic classification using [MMseqs2](https://github.com/soedinglab/MMseqs2/). (by @darcy220606)
- [#358](https://github.com/nf-core/funcscan/pull/358) Improved RGI databases handling, users can supply their own CARD now. (by @jasmezz)

### `Fixed`

16 changes: 15 additions & 1 deletion conf/modules.config
@@ -302,10 +302,24 @@ process {
ext.args = params.arg_fargene_orffinder ? '--orf-finder' : ''
}

withName:UNTAR_CARD {

ext.prefix = "card_database"
publishDir = [
[
path: { "${params.outdir}/databases/rgi" },
mode: params.publish_dir_mode,
enabled: params.save_databases,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
]

}

withName:RGI_CARDANNOTATION {
publishDir = [
[
path: { "${params.outdir}/databases/card" },
path: { "${params.outdir}/databases/rgi" },
mode: params.publish_dir_mode,
enabled: params.save_databases,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
17 changes: 16 additions & 1 deletion docs/usage.md
@@ -220,6 +220,21 @@ with the version number so hAMRonization will correctly display the database ver

> ℹ️ The flag `--save_databases` saves the pipeline-downloaded databases in your results directory. You can then move these to a central cache directory of your choice for re-use in the future.
### RGI

RGI requires the database CARD which can be downloaded by nf-core/funcscan or supplied by the user manually. To download and supply the database yourself, do:

1. Download [CARD](https://card.mcmaster.ca/latest/data)
2. Extract the archive.

You can then supply the path to resulting database directory with:

```bash
--arg_rgi_database '/<path>/<to>/<card>/'
```
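
The two manual steps above (download, then extract) could be sketched roughly as follows. This is a minimal illustration, not part of the pipeline: the helper name `prepare_card`, the archive name `card-data.tar` and the target directory `card_db` are assumptions chosen for the example, and the download itself is left as a comment because it needs network access.

```bash
#!/usr/bin/env bash
# Sketch only: prepare_card and all file names here are illustrative,
# they are not provided by nf-core/funcscan or RGI.

# Extract a downloaded CARD archive into a directory and print the
# matching pipeline flag.
prepare_card() {
    local archive="$1"   # path to the downloaded CARD archive
    local dest="$2"      # directory that will hold the extracted database
    mkdir -p "$dest" || return 1
    # When reading from a file, GNU tar and bsdtar detect the compression format themselves.
    tar -xf "$archive" -C "$dest" || return 1
    # Print the flag with an absolute path so it works from any launch directory.
    printf -- "--arg_rgi_database '%s/'\n" "$(cd "$dest" && pwd)"
}

# Example (requires network access, so it is left commented out):
#   wget -O card-data.tar https://card.mcmaster.ca/latest/data
#   prepare_card card-data.tar card_db
```

The printed flag can then be appended to the usual `nextflow run nf-core/funcscan` command line.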

> ℹ️ The flag `--save_databases` saves the pipeline-downloaded databases in your results directory. You can then move these to a central cache directory of your choice for re-use in the future.
### antiSMASH

antiSMASH requires several databases for the detection of potential biosynthetic gene cluster (BGC) sequences (ClusterBlast, MIBiG, Pfam, Resfams, TIGRFAMs).
@@ -243,7 +258,7 @@ Note that the names of the supplied folders must differ from each other (e.g. `a

> ℹ️ The flag `--save_databases` saves the pipeline-downloaded databases in your results directory. You can then move these to a central cache directory of your choice for re-use in the future.
> ℹ️ If installing with conda, the installation directory will be `lib/python3.8/site-packages/antismash` from the base directory of your conda install or conda environment directory.
> ℹ️ If installing with conda, the installation directory will be `lib/python3.10/site-packages/antismash` from the base directory of your conda install or conda environment directory.
### DeepBGC

1 change: 1 addition & 0 deletions nextflow.config
@@ -113,6 +113,7 @@ params {
arg_fargene_orffinder = false

arg_skip_rgi = false
arg_rgi_database = null
arg_rgi_savejson = false
arg_rgi_savetmpfiles = false
arg_rgi_alignmenttool = 'BLAST'
22 changes: 14 additions & 8 deletions nextflow_schema.json
@@ -847,6 +847,12 @@
"description": "Skip RGI during the ARG-screening.",
"fa_icon": "fas fa-ban"
},
"arg_rgi_database": {
"type": "string",
"description": "Path to user-defined local CARD database.",
"fa_icon": "fas fa-layer-group",
"help_text": "You can pre-download the CARD database to your machine and pass the path of it to this parameter.\n\nSee the pipeline [documentation](https://nf-co.re/funcscan/usage#rgi) for details on how to download this.\n\n> Modifies tool parameter(s):\n> - RGI_CARDANNOTATION: `--input`"
},
"arg_rgi_savejson": {
"type": "boolean",
"description": "Save RGI output .json file.",
@@ -865,43 +871,43 @@
"type": "string",
"default": "BLAST",
"description": "Specify the alignment tool to be used.",
"help_text": "Specifies the alignment tool to be used. By default RGI runs BLAST and this is also set as default in the nf-core/funcscan pipeline. Using this flag the user can activate the alignment by DIAMOND again.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI: `--alignment_tool`",
"help_text": "Specifies the alignment tool to be used. By default RGI runs BLAST and this is also set as default in the nf-core/funcscan pipeline. Using this flag the user can activate the alignment by DIAMOND again.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI_MAIN: `--alignment_tool`",
"enum": ["BLAST", "DIAMOND"],
"fa_icon": "fas fa-align-justify"
},
"arg_rgi_includeloose": {
"type": "boolean",
"description": "Include all of loose, strict and perfect hits (i.e. >=95% identity) found by RGI.",
"help_text": "When activated it includes 'Loose' hits (a.k.a. Discovery) in addition to strict and perfect hits. All 'Loose' matches of 95% identity or better are automatically listed as 'Strict', regardless of alignment length (RGI v. <6.0.0). This behaviour can be overrun by using the --include_nudge flag. The 'Loose' algorithm works outside of the detection model cut-offs to provide detection of new, emergent threats and more distant homologs of AMR genes, but will also catalog homologous sequences and spurious partial matches that may not have a role in AMR.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI: `--include_loose`",
"help_text": "When activated it includes 'Loose' hits (a.k.a. Discovery) in addition to strict and perfect hits. All 'Loose' matches of 95% identity or better are automatically listed as 'Strict', regardless of alignment length (RGI v. <6.0.0). This behaviour can be overrun by using the --include_nudge flag. The 'Loose' algorithm works outside of the detection model cut-offs to provide detection of new, emergent threats and more distant homologs of AMR genes, but will also catalog homologous sequences and spurious partial matches that may not have a role in AMR.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI_MAIN: `--include_loose`",
"fa_icon": "far fa-hand-scissors",
"default": false
},
"arg_rgi_includenudge": {
"type": "boolean",
"description": "Suppresses the default behaviour of RGI with `--arg_rgi_includeloose`.",
"help_text": "This flag suppresses the default behaviour of RGI with `--include_loose`, which lists all 'Loose' matches of >= 95% identity as 'Strict', regardless of alignment length. With this strict and perfect labels are added. This is discontinued in future versions of RGI.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI: `--include_nudge`",
"help_text": "This flag suppresses the default behaviour of RGI with `--include_loose`, which lists all 'Loose' matches of >= 95% identity as 'Strict', regardless of alignment length. With this strict and perfect labels are added. This is discontinued in future versions of RGI.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI_MAIN: `--include_nudge`",
"fa_icon": "fas fa-hand-scissors",
"default": false
},
"arg_rgi_lowquality": {
"type": "boolean",
"description": "Include screening of low quality contigs for partial genes.",
"help_text": "This flag should be used only when the contigs are of poor quality (e.g. short) to predict partial genes.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI: `--low_quality`",
"help_text": "This flag should be used only when the contigs are of poor quality (e.g. short) to predict partial genes.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI_MAIN: `--low_quality`",
"fa_icon": "fas fa-angle-double-down",
"default": false
},
"arg_rgi_data": {
"type": "string",
"default": "NA",
"description": "Specify a more specific data-type of input (e.g. plasmid, chromosome)",
"help_text": "This flag is used to specify the data type used as input to RGI. By default this is set as 'NA', which makes no assumptions on input data.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI: `--data`",
"help_text": "This flag is used to specify the data type used as input to RGI. By default this is set as 'NA', which makes no assumptions on input data.\n\nFor more information check RGI [documentation](https://github.com/arpcard/rgi).\n\n> Modifies tool parameter(s):\n> - RGI_MAIN: `--data`",
"enum": ["NA", "wgs", "plasmid", "chromosome"],
"fa_icon": "fas fa-database"
},
"arg_rgi_split_prodigal_jobs": {
"type": "boolean",
"description": "Run multiple prodigal jobs simultaneously for contigs in a fasta file.",
"help_text": "Modifies tool parameter:\n> - RGI: `--split_prodigal_jobs`",
"help_text": "Modifies tool parameter:\n> - RGI_MAIN: `--split_prodigal_jobs`",
"fa_icon": "fas fa-angle-double-down",
"default": true
}
@@ -976,7 +982,7 @@
"type": "string",
"description": "Path to user-defined local antiSMASH database.",
"fa_icon": "fas fa-layer-group",
"help_text": "It is recommend to pre-download the antiSMASH databases to your machine and pass the path of it to this parameter, as this can take a long time to download - particularly when running lots of pipeline runs. \n\nSee the pipeline [documentation](https://nf-co.re/funcscan/usage#antismash) for details on how to download this. If running with docker or singularity, please also check `--bgc_antismash_installationdirectory` for important information."
"help_text": "It is recommend to pre-download the antiSMASH databases to your machine and pass the path of it to this parameter, as this can take a long time to download - particularly when running lots of pipeline runs. \n\nSee the pipeline [documentation](https://nf-co.re/funcscan/usage#antismash-1) for details on how to download this. If running with docker or singularity, please also check `--bgc_antismash_installationdirectory` for important information."
},
"bgc_antismash_installationdirectory": {
"type": "string",
@@ -1046,7 +1052,7 @@
}
},
"fa_icon": "fas fa-tools",
"help_text": "The antibiotics and Secondary Metabolite Analysis SHell (antiSMASH) carries out a genome-wide screening, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes. \n\nDocumentation: https://antismash.secondarymetabolites.org/#!/about"
"help_text": "The antibiotics and Secondary Metabolite Analysis SHell (antiSMASH) carries out a genome-wide screening, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes. \n\nDocumentation: https://antismash.secondarymetabolites.org/#!/about"
},
"bgc_deepbgc": {
"title": "BGC: deepBGC",
21 changes: 16 additions & 5 deletions subworkflows/local/arg.nf
@@ -16,7 +16,7 @@ include { HAMRONIZATION_FARGENE } from '../../modules/nf-core/hamro
include { HAMRONIZATION_SUMMARIZE } from '../../modules/nf-core/hamronization/summarize/main'
include { RGI_CARDANNOTATION } from '../../modules/nf-core/rgi/cardannotation/main'
include { RGI_MAIN } from '../../modules/nf-core/rgi/main/main'
include { UNTAR } from '../../modules/nf-core/untar/main'
include { UNTAR as UNTAR_CARD } from '../../modules/nf-core/untar/main'
include { TABIX_BGZIP as ARG_TABIX_BGZIP } from '../../modules/nf-core/tabix/bgzip/main'
include { MERGE_TAXONOMY_HAMRONIZATION } from '../../modules/local/merge_taxonomy_hamronization'

@@ -85,10 +85,21 @@ workflow ARG {
// RGI run
if ( !params.arg_skip_rgi ) {

// Download and prepare CARD
UNTAR ( [ [], file('https://card.mcmaster.ca/latest/data', checkIfExists: true).copyTo(params.outdir + '/databases/card/data.tar.gz') ] )
ch_versions = ch_versions.mix( UNTAR.out.versions )
RGI_CARDANNOTATION ( UNTAR.out.untar.map{ it[1] } )
if ( !params.arg_rgi_database ) {

// Download and untar CARD
UNTAR_CARD ( [ [], file('https://card.mcmaster.ca/latest/data', checkIfExists: true) ] )
ch_versions = ch_versions.mix( UNTAR_CARD.out.versions )
rgi_database = UNTAR_CARD.out.untar.map{ it[1] }

} else {

// Use user-supplied database
rgi_database = params.arg_rgi_database

}

RGI_CARDANNOTATION ( rgi_database )
ch_versions = ch_versions.mix( RGI_CARDANNOTATION.out.versions )

RGI_MAIN ( contigs, RGI_CARDANNOTATION.out.db, [] )