Apply suggestions from code review, update release date
jasmezz committed Feb 28, 2025
1 parent 8112054 commit bfa4377
Showing 11 changed files with 67 additions and 62 deletions.
7 changes: 2 additions & 5 deletions .github/workflows/ci.yml
@@ -1,9 +1,6 @@
# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors
name: nf-core CI
on:
-push:
-branches:
-- "dev"
pull_request:
branches:
- "dev"
@@ -16,7 +13,7 @@ env:
NXF_ANSI_LOG: false
NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity
NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity
-NFTEST_VER: "0.9.0"
+NFTEST_VER: "0.9.2"

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
@@ -54,7 +51,7 @@ jobs:
- "singularity"
test_name:
- "test"
-- "test_nothing"
+- "test_minimal"
- "test_bakta"
- "test_prokka"
- "test_bgc_pyrodigal"
18 changes: 9 additions & 9 deletions .github/workflows/download_pipeline.yml
@@ -12,14 +12,6 @@ on:
required: true
default: "dev"
-pull_request:
-types:
-- opened
-- edited
-- synchronize
-branches:
-- main
-- master
pull_request_target:
branches:
- main
- master
@@ -120,6 +112,7 @@ jobs:
echo "IMAGE_COUNT_AFTER=$image_count" >> "$GITHUB_OUTPUT"
- name: Compare container image counts
+id: count_comparison
run: |
if [ "${{ steps.count_initial.outputs.IMAGE_COUNT_INITIAL }}" -ne "${{ steps.count_afterwards.outputs.IMAGE_COUNT_AFTER }}" ]; then
initial_count=${{ steps.count_initial.outputs.IMAGE_COUNT_INITIAL }}
@@ -131,4 +124,11 @@ jobs:
exit 1
else
echo "The pipeline can be downloaded successfully!"
-fi
+fi{% endraw %}
+- name: Upload Nextflow logfile for debugging purposes
+uses: actions/upload-artifact@v4
+with:
+name: nextflow_logfile.txt
+path: .nextflow.log*
+include-hidden-files: true{% endif %}
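The count-comparison step in this workflow boils down to a plain shell `if`. A minimal standalone sketch follows, with the GitHub Actions `${{ steps.… }}` expressions replaced by ordinary variables (the values here are made-up stand-ins, not real workflow outputs):

```shell
#!/usr/bin/env sh
# Stand-ins for the workflow's step outputs; in CI these come from
# ${{ steps.count_initial.outputs.IMAGE_COUNT_INITIAL }} and
# ${{ steps.count_afterwards.outputs.IMAGE_COUNT_AFTER }}.
initial_count=12
final_count=12

if [ "$initial_count" -ne "$final_count" ]; then
    # A mismatch means the download pulled a different set of images.
    echo "ERROR: container image count changed ($initial_count -> $final_count)" >&2
    exit 1
else
    echo "The pipeline can be downloaded successfully!"
fi
```

The real step additionally prints both counts before failing; the sketch keeps only the comparison itself.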
3 changes: 0 additions & 3 deletions .github/workflows/linting.yml
@@ -3,9 +3,6 @@ name: nf-core linting
# It runs the `nf-core pipelines lint` and markdown lint tests to ensure
# that the code meets the nf-core guidelines.
on:
-push:
-branches:
-- dev
pull_request:
release:
types: [published]
9 changes: 3 additions & 6 deletions CHANGELOG.md
@@ -3,18 +3,15 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## v2.1.0 - [2025-02-14]
+## v2.1.0 - [2025-02-28]

### `Added`

-- [#421](https://github.com/nf-core/funcscan/pull/421) Updated to nf-core template 3.0.2. (by @jfy133)
+- [#421](https://github.com/nf-core/funcscan/pull/421), [#429](https://github.com/nf-core/funcscan/pull/429), [#433](https://github.com/nf-core/funcscan/pull/433), [#438](https://github.com/nf-core/funcscan/pull/438), [#441](https://github.com/nf-core/funcscan/pull/441) Updated to nf-core template 3.0.2, 3.1.0, 3.1.1, 3.1.2, and 3.2.0. (by @jfy133 and @jasmezz)
- [#427](https://github.com/nf-core/funcscan/pull/427) AMPcombi now can use multiple other databases for classifications. (by @darcy220606)
- [#428](https://github.com/nf-core/funcscan/pull/428) Added InterProScan annotation workflow to the pipeline. The results are coupled to AMPcombi final table. (by @darcy220606)
-- [#429](https://github.com/nf-core/funcscan/pull/429) Updated to nf-core template 3.1.0. (by @jfy133 and @jasmezz)
-- [#433](https://github.com/nf-core/funcscan/pull/433) Updated to nf-core template 3.1.1. (by @jfy133)
- [#431](https://github.com/nf-core/funcscan/pull/431) Updated AMPcombi, Macrel, all MMseqs2 modules, MultiQC, Pyrodigal, and seqkit, added `--taxa_classification_mmseqs_compressed` parameter. (by @jasmezz)
-- [#438](https://github.com/nf-core/funcscan/pull/438) Updated to nf-core template 3.1.2. (by @jfy133)
-- [#441](https://github.com/nf-core/funcscan/pull/441) Updated to nf-core template 3.2.0, updated MultiQC. (by @jasmezz and @jfy133)
+- [#441](https://github.com/nf-core/funcscan/pull/441) Updated MultiQC. (by @jasmezz and @jfy133)
- [#440](https://github.com/nf-core/funcscan/pull/440) Updated Bakta and introduced new parameter `--annotation_bakta_hmms`. (by @jasmezz)

### `Fixed`
6 changes: 3 additions & 3 deletions assets/schema_input.json
@@ -18,21 +18,21 @@
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(fasta|fas|fna|fa)(\\.gz)?$",
-"errorMessage": "Fasta file for reads must be provided, cannot contain spaces and must have extension `.fa.gz`, `.fna.gz` or `.fasta.gz`"
+"errorMessage": "Fasta file for reads must be provided, cannot contain spaces and must have extension `.fa`, `.fa.gz`, `.fna`, `.fna.gz`, `.fasta`, or `.fasta.gz`"
},
"protein": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(faa|fasta)(\\.gz)?$",
-"errorMessage": "Input file for peptide annotations has incorrect file format. File must end in `.fasta` or `.faa`"
+"errorMessage": "Input file for peptide annotations has incorrect file format. File must end in `.fasta`, `.fasta.gz`, `.faa`, or `.faa.gz`"
},
"gbk": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(gbk|gbff)(\\.gz)?$",
-"errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gbk.gz` or `.gbff.gz`"
+"errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gbk`, `.gbk.gz`, `.gbff`, or `.gbff.gz`"
}
},
"required": ["sample", "fasta"],
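As a quick sanity check of the schema patterns above, the fasta rule can be exercised with `grep -E` (a sketch: POSIX ERE with `[^[:space:]]` standing in for the schema's `\S`):

```shell
# Approximate the JSON-schema regex ^\S+\.(fasta|fas|fna|fa)(\.gz)?$
# in POSIX ERE; [^[:space:]] substitutes for \S.
pattern='^[^[:space:]]+\.(fasta|fas|fna|fa)(\.gz)?$'

for name in contigs.fa contigs.fna.gz contigs.fasta.gz; do
    echo "$name" | grep -Eq "$pattern" && echo "$name: accepted"
done

# A space in the file name fails the whole-line match,
# matching the behaviour described in the error message:
echo "my contigs.fa" | grep -Eq "$pattern" || echo "names with spaces: rejected"
```

The `gbk` and `protein` patterns work the same way with their own extension lists.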
4 changes: 2 additions & 2 deletions conf/test_nothing.config → conf/test_minimal.config
@@ -7,7 +7,7 @@
Although in this case we turn everything off
Use as follows:
-nextflow run nf-core/funcscan -profile test_nothing,<docker/singularity> --outdir <OUTDIR>
+nextflow run nf-core/funcscan -profile test_minimal,<docker/singularity> --outdir <OUTDIR>
----------------------------------------------------------------------------------------
*/
@@ -21,7 +21,7 @@ process {
}

params {
-config_profile_name = 'Test nothing profile'
+config_profile_name = 'Minimal test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Input data
44 changes: 25 additions & 19 deletions docs/output.md
@@ -18,42 +18,42 @@ The directories listed below will be created in the results directory (specified

```tree
results/
-├── taxonomic_classification/
-| └── mmseqs_createtsv/
-├── annotation/
-| ├── bakta/
-| ├── prodigal/
-| ├── prokka/
-| └── pyrodigal/
-├── protein_annotation/
-| └── interproscan/
├── amp/
| ├── ampir/
| ├── amplify/
| ├── hmmsearch/
| └── macrel/
+├── annotation/
+| ├── bakta/
+| ├── prodigal/
+| ├── prokka/
+| └── pyrodigal/
├── arg/
| ├── abricate/
| ├── amrfinderplus/
+| ├── argnorm/
| ├── deeparg/
| ├── fargene/
-| ├── rgi/
| ├── hamronization/
-| └── argnorm/
+| └── rgi/
├── bgc/
| ├── antismash/
| ├── deepbgc/
| ├── gecco/
| └── hmmsearch/
+├── databases/
+├── multiqc/
+├── pipeline_info/
+├── protein_annotation/
+| └── interproscan/
├── qc/
| └── seqkit/
├── reports/
| ├── ampcombi/
| ├── combgc/
| └── hamronization_summarize/
-├── databases/
-├── multiqc/
-└── pipeline_info/
+└── taxonomic_classification/
+└── mmseqs_createtsv/
work/
```

@@ -250,7 +250,7 @@ Output Summaries:

- `ampir/`
- `<samplename>.ampir.faa`: predicted AMP sequences in FAA format
-- `<samplename>.ampir.tsv`: predicted AMP metadata in TSV format, contains contig name, sequence and probability score
+- `<samplename>.ampir.tsv`: predicted AMP metadata in TSV format; contains contig name, sequence and probability score.

</details>

@@ -262,7 +262,7 @@
<summary>Output files</summary>

- `amplify/`
-- `*_results.tsv`: table of contig amino-acid sequences with prediction result (AMP or non-AMP) and information on sequence length, charge, probability score, AMPlify log-scaled score)
+- `*_results.tsv`: table of contig amino-acid sequences with prediction result (AMP or non-AMP) and information on sequence length, charge, probability score, AMPlify log-scaled score

</details>

@@ -490,7 +490,7 @@ Note that filtered FASTA is only used for BGC workflow for run-time optimisation
- `<sample>/*_mmseqs_matches.txt*`: alignment file generated by MMseqs2 for each sample

:::info
-In some cases when the AMP and the taxonomic classification subworkflows are turned on, it can happen that only summary files per sample are created in the output folder with **no** `Ampcombi_summary.tsv` and `Ampcombi_summary_cluster.tsv` files with no taxonomic classifications merged. This can occur if some AMP prediction parameters are 'too strict' or only one AMP tool is run, which can lead to no AMP hits found in any of the samples or in only one sample. Look out for the warning `[nf-core/funcscan] AMPCOMBI2: 0/1 file passed. Skipping AMPCOMBI2_COMPLETE, AMPCOMBI2_CLUSTER, and TAXONOMY MERGING steps.` in the stdout or `.nextflow.log` file. In that case we recommend to lower the AMP prediction thresholds and run more than one AMP prediction tool.
+In some cases when the AMP and the taxonomic classification subworkflows are turned on, it can happen that only summary files per sample are created in the output folder, with **no** `Ampcombi_summary.tsv` and `Ampcombi_summary_cluster.tsv` files and no taxonomic classifications merged.
+This can occur if some AMP prediction parameters are 'too strict' or only one AMP tool is run, which can lead to no AMP hits found in any of the samples or in only one sample.
+Look out for the warning `[nf-core/funcscan] AMPCOMBI2: 0/1 file passed. Skipping AMPCOMBI2_COMPLETE, AMPCOMBI2_CLUSTER, and TAXONOMY MERGING steps.` in the stdout or `.nextflow.log` file.
+In that case we recommend lowering the AMP prediction thresholds and running more than one AMP prediction tool.
:::

<summary>AMP summary table header descriptions using DRAMP as reference database</summary>
@@ -538,7 +541,10 @@

</details>

-[AMPcombi](https://github.com/Darcy220606/AMPcombi) summarizes the results of **antimicrobial peptide (AMP)** prediction tools (ampir, AMPlify, Macrel, and other non-nf-core supported tools) into a single table and aligns the hits against a reference AMP database for functional, structural and taxonomic classification using [MMseqs2](https://github.com/soedinglab/MMseqs2). It further assigns the physiochemical properties (e.g. hydrophobicity, molecular weight) using the [Biopython toolkit](https://github.com/biopython/biopython) and clusters the resulting AMP hits from all samples using [MMseqs2](https://github.com/soedinglab/MMseqs2). To further filter the recovered AMPs using the presence of signaling peptides, the output file `Ampcombi_summary_cluster.tsv` or `ampcombi_complete_summary_taxonomy.tsv.gz` can be used downstream as detailed [here](https://ampcombi.readthedocs.io/en/main/usage.html#signal-peptide). The final tables generated may also be visualized and explored using an interactive [user interface](https://ampcombi.readthedocs.io/en/main/visualization.html).
+[AMPcombi](https://github.com/Darcy220606/AMPcombi) summarizes the results of **antimicrobial peptide (AMP)** prediction tools (ampir, AMPlify, Macrel, and other non-nf-core supported tools) into a single table and aligns the hits against a reference AMP database for functional, structural and taxonomic classification using [MMseqs2](https://github.com/soedinglab/MMseqs2).
+It further assigns the physicochemical properties (e.g. hydrophobicity, molecular weight) using the [Biopython toolkit](https://github.com/biopython/biopython) and clusters the resulting AMP hits from all samples using [MMseqs2](https://github.com/soedinglab/MMseqs2).
+To further filter the recovered AMPs using the presence of signaling peptides, the output file `Ampcombi_summary_cluster.tsv` or `ampcombi_complete_summary_taxonomy.tsv.gz` can be used downstream as detailed [here](https://ampcombi.readthedocs.io/en/main/usage.html#signal-peptide).
+The final tables generated may also be visualized and explored using an interactive [user interface](https://ampcombi.readthedocs.io/en/main/visualization.html).

<img src="https://raw.githubusercontent.com/Darcy220606/AMPcombi/main/docs/ampcombi_interface_screenshot2.png" alt="AMPcombi interface" width="650" height="300">

@@ -675,7 +681,7 @@ argNorm takes the outputs of the [hAMRonization](#hamronization) tool of [ABRica

[MultiQC](http://multiqc.info) is used in nf-core/funcscan to report the versions of all software used in the given pipeline run, and provides a suggested methods text. This allows for reproducible analysis and transparency in method reporting in publications.

-Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.
+Results generated by MultiQC collate pipeline QC from supported tools. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.

### Pipeline information

19 changes: 12 additions & 7 deletions docs/usage.md
@@ -134,11 +134,12 @@ MMseqs2 is currently the only taxonomic classification tool used in the pipeline

### InterProScan

-[InterProScan](https://github.com/ebi-pf-team/interproscan) is currently the only protein annotation tool that gives a snapshot of the protein families and domains for each coding region.
+[InterProScan](https://github.com/ebi-pf-team/interproscan) is currently the only protein annotation tool in this pipeline that gives a snapshot of the protein families and domains for each coding region.

-The protein annotation workflow is activated with the flag `--run_protein_annotation`. InterProScan is used as the only protein annotation tool at the moment and the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0) version 5.72-103.0 is downloaded and prepared to screen the input sequences against it.
+The protein annotation workflow is activated with the flag `--run_protein_annotation`.
+InterProScan is used as the only protein annotation tool at the moment and the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0) version 5.72-103.0 is downloaded and prepared to screen the input sequences against it.

-Since the database download is huge (5.5GB) and might take quite some time, you can skip the automatic database download on each run by manually downloading and extracting the files of any [InterPro version](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/) beforehand and providing the resulting directory path to `--protein_annotation_interproscan_db <path/to/interprodatabase>`.
+Since the database download is huge (5.5GB) and might take quite some time, you can skip the automatic database download (see section [Databases and reference files](usage/#interproscan-1) for details).

:::info
By default, the set of databases used by InterProScan is `PANTHER,ProSiteProfiles,ProSitePatterns,Pfam`. Adding other applications to the list does not guarantee that the results will be integrated correctly within `AMPcombi`.
@@ -245,7 +246,7 @@ This can then be passed to the pipeline with:
The contents of the directory should have files such as `*.fasta` and `*.tsv` in the top level; a fasta file and the corresponding table with structural, functional and (if reported) taxonomic classifications. AMPcombi will then generate the corresponding `mmseqs2` directory, in which all binary files are prepared for downstream alignment of the recovered AMPs with [MMseqs2](https://github.com/soedinglab/MMseqs2). These can also be provided by the user by setting up an MMseqs2-compatible database using `mmseqs createdb *.fasta` in a directory called `mmseqs2`. An example file structure for [DRAMP](http://dramp.cpu-bioinfor.org/) used as the reference database:
-```bash
+```tree
amp_DRAMP_database/
├── general_amps_2024_11_13.fasta
├── general_amps_2024_11_13.txt
Expand All @@ -260,7 +261,7 @@ amp_DRAMP_database/
└── ref_DB.source
```
-:::note{.fa-whale}
+:::note
For both [DRAMP](http://dramp.cpu-bioinfor.org/) and [APD](https://aps.unmc.edu/), AMPcombi removes entries that contain any non-amino acid residues by default.
:::
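The custom-database layout described above can also be scripted. A hedged sketch follows — the file names and sequences are placeholders (not real DRAMP entries), and the actual `mmseqs createdb` call is left commented out since it requires MMseqs2 to be installed:

```shell
# Lay out a custom AMPcombi reference database directory.
# All names below are illustrative placeholders.
mkdir -p amp_ref_database/mmseqs2

# Top level: a fasta file plus its corresponding classification table.
printf '>AMP_test\nGIGKFLHSAKKFGKAFVGEIMNS\n' > amp_ref_database/ref_db.fasta
printf 'id\tactivity\nAMP_test\tantibacterial\n' > amp_ref_database/ref_db.tsv

# The MMseqs2 binary files would then be generated with (not run here):
#   mmseqs createdb amp_ref_database/ref_db.fasta amp_ref_database/mmseqs2/ref_DB

ls amp_ref_database
```

If the `mmseqs2` directory is left empty, AMPcombi prepares the binary files itself, as noted above.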
@@ -511,7 +512,7 @@ deepbgc download
You can then indicate the path to the database folder in the pipeline with `--bgc_deepbgc_db <path>/<to>/<deepbgc_db>/`.
The contents of the database directory should include directories such as `common`, `0.1.0` in the top level:

-```console
+```tree
deepbgc_db/
├── common
└── <version-num>[0.1.0]
@@ -523,7 +524,11 @@

### InterProScan

-[InterProScan](https://github.com/ebi-pf-team/interproscan) is used to provide more information about the proteins annotated on the contigs. By default, turning on this subworkflow with `--run_protein_annotation` will download and unzip the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/) version 5.72-103.0. The database can be saved in the output directory `<output_directory>/databases/interproscan/` if the `--save_db` is turned on. Note: the huge database download (5.5GB) can take up to 4 hours depending on the bandwidth.
+[InterProScan](https://github.com/ebi-pf-team/interproscan) is used to provide more information about the proteins annotated on the contigs. By default, turning on this subworkflow with `--run_protein_annotation` will download and unzip the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/) version 5.72-103.0. The database can be saved in the output directory `<output_directory>/databases/interproscan/` if the `--save_db` is turned on.
+
+:::note
+The huge database download (5.5GB) can take up to 4 hours depending on the bandwidth.
+:::

A local version of the database can be supplied to the pipeline by passing the InterProScan database directory to `--protein_annotation_interproscan_db <path/to/downloaded-untarred-interproscan_db-dir/>`. The directory can be created by running (e.g. for database version 5.72-103.0):

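The manual database preparation mentioned above can be sketched as a short script. Note the archive file name below is an assumption inferred from the FTP layout — check the directory listing before relying on it:

```shell
# Hedged sketch of fetching an InterProScan data release manually.
version="5.72-103.0"
base="http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/${version}"
archive="interproscan-${version}-64-bit.tar.gz"   # assumed file name
echo "Would fetch: ${base}/${archive}"

# Uncomment to actually download (~5.5 GB, can take hours) and unpack:
#   wget "${base}/${archive}"
#   tar -xzf "${archive}"
# Then point the pipeline at the extracted directory:
#   --protein_annotation_interproscan_db "./interproscan-${version}"
```

The download itself is left commented out so the sketch can be inspected without triggering the multi-gigabyte transfer.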
2 changes: 1 addition & 1 deletion nextflow.config
@@ -526,7 +526,7 @@ manifest {
description = """Pipeline for screening for functional components of assembled contigs"""
mainScript = 'main.nf'
defaultBranch = 'master'
-nextflowVersion = '!>=24.04.2'
+nextflowVersion = '!>=24.10.2'
version = '2.1.0'
doi = '10.5281/zenodo.7643099'
}
