Apply suggestions from code review, update release date
jasmezz committed Feb 28, 2025
1 parent 8112054 commit bfa4377
Showing 11 changed files with 67 additions and 62 deletions.
7 changes: 2 additions & 5 deletions .github/workflows/ci.yml
@@ -1,9 +1,6 @@
# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors
name: nf-core CI
on:
-push:
-branches:
-- "dev"
pull_request:
branches:
- "dev"
@@ -16,7 +13,7 @@ env:
NXF_ANSI_LOG: false
NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity
NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity
-NFTEST_VER: "0.9.0"
+NFTEST_VER: "0.9.2"

concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
@@ -54,7 +51,7 @@ jobs:
- "singularity"
test_name:
- "test"
-- "test_nothing"
+- "test_minimal"
- "test_bakta"
- "test_prokka"
- "test_bgc_pyrodigal"
18 changes: 9 additions & 9 deletions .github/workflows/download_pipeline.yml
@@ -12,14 +12,6 @@ on:
required: true
default: "dev"
-pull_request:
-types:
-- opened
-- edited
-- synchronize
-branches:
-- main
-- master
pull_request_target:
branches:
- main
- master
@@ -120,6 +112,7 @@ jobs:
echo "IMAGE_COUNT_AFTER=$image_count" >> "$GITHUB_OUTPUT"
- name: Compare container image counts
+id: count_comparison
run: |
if [ "${{ steps.count_initial.outputs.IMAGE_COUNT_INITIAL }}" -ne "${{ steps.count_afterwards.outputs.IMAGE_COUNT_AFTER }}" ]; then
initial_count=${{ steps.count_initial.outputs.IMAGE_COUNT_INITIAL }}
@@ -131,4 +124,11 @@ jobs:
exit 1
else
echo "The pipeline can be downloaded successfully!"
-fi
+fi{% endraw %}
+- name: Upload Nextflow logfile for debugging purposes
+uses: actions/upload-artifact@v4
+with:
+name: nextflow_logfile.txt
+path: .nextflow.log*
+include-hidden-files: true{% endif %}
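The count-comparison step in this workflow boils down to a plain shell `if`. A minimal standalone sketch follows, with the GitHub Actions `${{ steps.… }}` expressions replaced by ordinary variables (the values here are made-up stand-ins, not real workflow outputs):

```shell
#!/usr/bin/env sh
# Stand-ins for the workflow's step outputs; in CI these come from
# ${{ steps.count_initial.outputs.IMAGE_COUNT_INITIAL }} and
# ${{ steps.count_afterwards.outputs.IMAGE_COUNT_AFTER }}.
initial_count=12
final_count=12

if [ "$initial_count" -ne "$final_count" ]; then
    # A mismatch means the download pulled a different set of images.
    echo "ERROR: container image count changed ($initial_count -> $final_count)" >&2
    exit 1
else
    echo "The pipeline can be downloaded successfully!"
fi
```

The real step additionally prints both counts before failing; the sketch keeps only the comparison itself.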
3 changes: 0 additions & 3 deletions .github/workflows/linting.yml
@@ -3,9 +3,6 @@ name: nf-core linting
# It runs the `nf-core pipelines lint` and markdown lint tests to ensure
# that the code meets the nf-core guidelines.
on:
-push:
-branches:
-- dev
pull_request:
release:
types: [published]
9 changes: 3 additions & 6 deletions CHANGELOG.md
@@ -3,18 +3,15 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## v2.1.0 - [2025-02-14]
+## v2.1.0 - [2025-02-28]

### `Added`

-- [#421](https://github.com/nf-core/funcscan/pull/421) Updated to nf-core template 3.0.2. (by @jfy133)
+- [#421](https://github.com/nf-core/funcscan/pull/421), [#429](https://github.com/nf-core/funcscan/pull/429), [#433](https://github.com/nf-core/funcscan/pull/433), [#438](https://github.com/nf-core/funcscan/pull/438), [#441](https://github.com/nf-core/funcscan/pull/441) Updated to nf-core template 3.0.2, 3.1.0, 3.1.1, 3.1.2, and 3.2.0. (by @jfy133 and @jasmezz)
- [#427](https://github.com/nf-core/funcscan/pull/427) AMPcombi now can use multiple other databases for classifications. (by @darcy220606)
- [#428](https://github.com/nf-core/funcscan/pull/428) Added InterProScan annotation workflow to the pipeline. The results are coupled to AMPcombi final table. (by @darcy220606)
-- [#429](https://github.com/nf-core/funcscan/pull/429) Updated to nf-core template 3.1.0. (by @jfy133 and @jasmezz)
-- [#433](https://github.com/nf-core/funcscan/pull/433) Updated to nf-core template 3.1.1. (by @jfy133)
- [#431](https://github.com/nf-core/funcscan/pull/431) Updated AMPcombi, Macrel, all MMseqs2 modules, MultiQC, Pyrodigal, and seqkit, added `--taxa_classification_mmseqs_compressed` parameter. (by @jasmezz)
-- [#438](https://github.com/nf-core/funcscan/pull/438) Updated to nf-core template 3.1.2. (by @jfy133)
-- [#441](https://github.com/nf-core/funcscan/pull/441) Updated to nf-core template 3.2.0, updated MultiQC. (by @jasmezz and @jfy133)
+- [#441](https://github.com/nf-core/funcscan/pull/441) Updated MultiQC. (by @jasmezz and @jfy133)
- [#440](https://github.com/nf-core/funcscan/pull/440) Updated Bakta and introduced new parameter `--annotation_bakta_hmms`. (by @jasmezz)

### `Fixed`
6 changes: 3 additions & 3 deletions assets/schema_input.json
@@ -18,21 +18,21 @@
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(fasta|fas|fna|fa)(\\.gz)?$",
-"errorMessage": "Fasta file for reads must be provided, cannot contain spaces and must have extension `.fa.gz`, `.fna.gz` or `.fasta.gz`"
+"errorMessage": "Fasta file for reads must be provided, cannot contain spaces and must have extension `.fa`, `.fa.gz`, `.fna`, `.fna.gz`, `.fasta`, or `.fasta.gz`"
},
"protein": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(faa|fasta)(\\.gz)?$",
-"errorMessage": "Input file for peptide annotations has incorrect file format. File must end in `.fasta` or `.faa`"
+"errorMessage": "Input file for peptide annotations has incorrect file format. File must end in `.fasta`, `.fasta.gz`, `.faa`, or `.faa.gz`"
},
"gbk": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(gbk|gbff)(\\.gz)?$",
-"errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gbk.gz` or `.gbff.gz`"
+"errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gbk`, `.gbk.gz`, `.gbff`, or `.gbff.gz`"
}
},
"required": ["sample", "fasta"],
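As a quick sanity check of the schema patterns above, the fasta rule can be exercised with `grep -E` (a sketch: POSIX ERE with `[^[:space:]]` standing in for the schema's `\S`):

```shell
# Approximate the JSON-schema regex ^\S+\.(fasta|fas|fna|fa)(\.gz)?$
# in POSIX ERE; [^[:space:]] substitutes for \S.
pattern='^[^[:space:]]+\.(fasta|fas|fna|fa)(\.gz)?$'

for name in contigs.fa contigs.fna.gz contigs.fasta.gz; do
    echo "$name" | grep -Eq "$pattern" && echo "$name: accepted"
done

# A space in the file name fails the whole-line match,
# matching the behaviour described in the error message:
echo "my contigs.fa" | grep -Eq "$pattern" || echo "names with spaces: rejected"
```

The `gbk` and `protein` patterns work the same way with their own extension lists.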
4 changes: 2 additions & 2 deletions conf/test_nothing.config → conf/test_minimal.config
@@ -7,7 +7,7 @@
Although in this case we turn everything off
Use as follows:
-nextflow run nf-core/funcscan -profile test_nothing,<docker/singularity> --outdir <OUTDIR>
+nextflow run nf-core/funcscan -profile test_minimal,<docker/singularity> --outdir <OUTDIR>
----------------------------------------------------------------------------------------
*/
@@ -21,7 +21,7 @@ process {
}

params {
-config_profile_name = 'Test nothing profile'
+config_profile_name = 'Minimal test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Input data
44 changes: 25 additions & 19 deletions docs/output.md
@@ -18,42 +18,42 @@ The directories listed below will be created in the results directory (specified

```tree
results/
-├── taxonomic_classification/
-| └── mmseqs_createtsv/
-├── annotation/
-| ├── bakta/
-| ├── prodigal/
-| ├── prokka/
-| └── pyrodigal/
-├── protein_annotation/
-| └── interproscan/
├── amp/
| ├── ampir/
| ├── amplify/
| ├── hmmsearch/
| └── macrel/
+├── annotation/
+| ├── bakta/
+| ├── prodigal/
+| ├── prokka/
+| └── pyrodigal/
├── arg/
| ├── abricate/
| ├── amrfinderplus/
+| ├── argnorm/
| ├── deeparg/
| ├── fargene/
-| ├── rgi/
| ├── hamronization/
-| └── argnorm/
+| └── rgi/
├── bgc/
| ├── antismash/
| ├── deepbgc/
| ├── gecco/
| └── hmmsearch/
+├── databases/
+├── multiqc/
+├── pipeline_info/
+├── protein_annotation/
+| └── interproscan/
├── qc/
| └── seqkit/
├── reports/
| ├── ampcombi/
| ├── combgc/
| └── hamronization_summarize/
-├── databases/
-├── multiqc/
-└── pipeline_info/
+└── taxonomic_classification/
+└── mmseqs_createtsv/
work/
```

@@ -250,7 +250,7 @@ Output Summaries:

- `ampir/`
- `<samplename>.ampir.faa`: predicted AMP sequences in FAA format
-- `<samplename>.ampir.tsv`: predicted AMP metadata in TSV format, contains contig name, sequence and probability score
+- `<samplename>.ampir.tsv`: predicted AMP metadata in TSV format; contains contig name, sequence and probability score.

</details>

@@ -262,7 +262,7 @@
<summary>Output files</summary>

- `amplify/`
-- `*_results.tsv`: table of contig amino-acid sequences with prediction result (AMP or non-AMP) and information on sequence length, charge, probability score, AMPlify log-scaled score)
+- `*_results.tsv`: table of contig amino-acid sequences with prediction result (AMP or non-AMP) and information on sequence length, charge, probability score, AMPlify log-scaled score

</details>

@@ -490,7 +490,7 @@ Note that filtered FASTA is only used for BGC workflow for run-time optimisation
- `<sample>/*_mmseqs_matches.txt*`: alignment file generated by MMseqs2 for each sample

:::info
-In some cases when the AMP and the taxonomic classification subworkflows are turned on, it can happen that only summary files per sample are created in the output folder with **no** `Ampcombi_summary.tsv` and `Ampcombi_summary_cluster.tsv` files with no taxonomic classifications merged. This can occur if some AMP prediction parameters are 'too strict' or only one AMP tool is run, which can lead to no AMP hits found in any of the samples or in only one sample. Look out for the warning `[nf-core/funcscan] AMPCOMBI2: 0/1 file passed. Skipping AMPCOMBI2_COMPLETE, AMPCOMBI2_CLUSTER, and TAXONOMY MERGING steps.` in the stdout or `.nextflow.log` file. In that case we recommend to lower the AMP prediction thresholds and run more than one AMP prediction tool.
+In some cases when the AMP and the taxonomic classification subworkflows are turned on, it can happen that only summary files per sample are created in the output folder, with **no** `Ampcombi_summary.tsv` and `Ampcombi_summary_cluster.tsv` files and no taxonomic classifications merged.
+This can occur if some AMP prediction parameters are 'too strict' or only one AMP tool is run, which can lead to no AMP hits found in any of the samples or in only one sample.
+Look out for the warning `[nf-core/funcscan] AMPCOMBI2: 0/1 file passed. Skipping AMPCOMBI2_COMPLETE, AMPCOMBI2_CLUSTER, and TAXONOMY MERGING steps.` in the stdout or `.nextflow.log` file.
+In that case we recommend lowering the AMP prediction thresholds and running more than one AMP prediction tool.
:::

<summary>AMP summary table header descriptions using DRAMP as reference database</summary>
@@ -538,7 +541,10 @@

</details>

-[AMPcombi](https://github.com/Darcy220606/AMPcombi) summarizes the results of **antimicrobial peptide (AMP)** prediction tools (ampir, AMPlify, Macrel, and other non-nf-core supported tools) into a single table and aligns the hits against a reference AMP database for functional, structural and taxonomic classification using [MMseqs2](https://github.com/soedinglab/MMseqs2). It further assigns the physiochemical properties (e.g. hydrophobicity, molecular weight) using the [Biopython toolkit](https://github.com/biopython/biopython) and clusters the resulting AMP hits from all samples using [MMseqs2](https://github.com/soedinglab/MMseqs2). To further filter the recovered AMPs using the presence of signaling peptides, the output file `Ampcombi_summary_cluster.tsv` or `ampcombi_complete_summary_taxonomy.tsv.gz` can be used downstream as detailed [here](https://ampcombi.readthedocs.io/en/main/usage.html#signal-peptide). The final tables generated may also be visualized and explored using an interactive [user interface](https://ampcombi.readthedocs.io/en/main/visualization.html).
+[AMPcombi](https://github.com/Darcy220606/AMPcombi) summarizes the results of **antimicrobial peptide (AMP)** prediction tools (ampir, AMPlify, Macrel, and other non-nf-core supported tools) into a single table and aligns the hits against a reference AMP database for functional, structural and taxonomic classification using [MMseqs2](https://github.com/soedinglab/MMseqs2).
+It further assigns the physicochemical properties (e.g. hydrophobicity, molecular weight) using the [Biopython toolkit](https://github.com/biopython/biopython) and clusters the resulting AMP hits from all samples using [MMseqs2](https://github.com/soedinglab/MMseqs2).
+To further filter the recovered AMPs using the presence of signaling peptides, the output file `Ampcombi_summary_cluster.tsv` or `ampcombi_complete_summary_taxonomy.tsv.gz` can be used downstream as detailed [here](https://ampcombi.readthedocs.io/en/main/usage.html#signal-peptide).
+The final tables generated may also be visualized and explored using an interactive [user interface](https://ampcombi.readthedocs.io/en/main/visualization.html).

<img src="https://raw.githubusercontent.com/Darcy220606/AMPcombi/main/docs/ampcombi_interface_screenshot2.png" alt="AMPcombi interface" width="650" height="300">

@@ -675,7 +681,7 @@ argNorm takes the outputs of the [hAMRonization](#hamronization) tool of [ABRica

[MultiQC](http://multiqc.info) is used in nf-core/funcscan to report the versions of all software used in the given pipeline run, and provides a suggested methods text. This allows for reproducible analysis and transparency in method reporting in publications.

-Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.
+Results generated by MultiQC collate pipeline QC from supported tools. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.

### Pipeline information

19 changes: 12 additions & 7 deletions docs/usage.md
@@ -134,11 +134,12 @@ MMseqs2 is currently the only taxonomic classification tool used in the pipeline

### InterProScan

-[InterProScan](https://github.com/ebi-pf-team/interproscan) is currently the only protein annotation tool that gives a snapshot of the protein families and domains for each coding region.
+[InterProScan](https://github.com/ebi-pf-team/interproscan) is currently the only protein annotation tool in this pipeline that gives a snapshot of the protein families and domains for each coding region.

-The protein annotation workflow is activated with the flag `--run_protein_annotation`. InterProScan is used as the only protein annotation tool at the moment and the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0) version 5.72-103.0 is downloaded and prepared to screen the input sequences against it.
+The protein annotation workflow is activated with the flag `--run_protein_annotation`.
+InterProScan is used as the only protein annotation tool at the moment and the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0) version 5.72-103.0 is downloaded and prepared to screen the input sequences against it.

-Since the database download is huge (5.5GB) and might take quite some time, you can skip the automatic database download on each run by manually downloading and extracting the files of any [InterPro version](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/) beforehand and providing the resulting directory path to `--protein_annotation_interproscan_db <path/to/interprodatabase>`.
+Since the database download is huge (5.5GB) and might take quite some time, you can skip the automatic database download (see section [Databases and reference files](usage/#interproscan-1) for details).

:::info
By default, the set of databases used by InterProScan is `PANTHER,ProSiteProfiles,ProSitePatterns,Pfam`. Adding other applications to the list does not guarantee that the results will be integrated correctly within `AMPcombi`.
@@ -245,7 +246,7 @@ This can then be passed to the pipeline with:
The contents of the directory should have files such as `*.fasta` and `*.tsv` in the top level; a fasta file and the corresponding table with structural, functional and (if reported) taxonomic classifications. AMPcombi will then generate the corresponding `mmseqs2` directory, in which all binary files are prepared for downstream alignment of the recovered AMPs with [MMseqs2](https://github.com/soedinglab/MMseqs2). These can also be provided by the user by setting up an MMseqs2-compatible database using `mmseqs createdb *.fasta` in a directory called `mmseqs2`. An example file structure for [DRAMP](http://dramp.cpu-bioinfor.org/) used as the reference database:
-```bash
+```tree
amp_DRAMP_database/
├── general_amps_2024_11_13.fasta
├── general_amps_2024_11_13.txt
Expand All @@ -260,7 +261,7 @@ amp_DRAMP_database/
└── ref_DB.source
```
-:::note{.fa-whale}
+:::note
For both [DRAMP](http://dramp.cpu-bioinfor.org/) and [APD](https://aps.unmc.edu/), AMPcombi removes entries that contain any non-amino acid residues by default.
:::
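The custom-database layout described above can also be scripted. A hedged sketch follows — the file names and sequences are placeholders (not real DRAMP entries), and the actual `mmseqs createdb` call is left commented out since it requires MMseqs2 to be installed:

```shell
# Lay out a custom AMPcombi reference database directory.
# All names below are illustrative placeholders.
mkdir -p amp_ref_database/mmseqs2

# Top level: a fasta file plus its corresponding classification table.
printf '>AMP_test\nGIGKFLHSAKKFGKAFVGEIMNS\n' > amp_ref_database/ref_db.fasta
printf 'id\tactivity\nAMP_test\tantibacterial\n' > amp_ref_database/ref_db.tsv

# The MMseqs2 binary files would then be generated with (not run here):
#   mmseqs createdb amp_ref_database/ref_db.fasta amp_ref_database/mmseqs2/ref_DB

ls amp_ref_database
```

If the `mmseqs2` directory is left empty, AMPcombi prepares the binary files itself, as noted above.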
@@ -511,7 +512,7 @@ deepbgc download
You can then indicate the path to the database folder in the pipeline with `--bgc_deepbgc_db <path>/<to>/<deepbgc_db>/`.
The contents of the database directory should include directories such as `common`, `0.1.0` in the top level:

-```console
+```tree
deepbgc_db/
├── common
└── <version-num>[0.1.0]
@@ -523,7 +524,11 @@

### InterProScan

-[InterProScan](https://github.com/ebi-pf-team/interproscan) is used to provide more information about the proteins annotated on the contigs. By default, turning on this subworkflow with `--run_protein_annotation` will download and unzip the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/) version 5.72-103.0. The database can be saved in the output directory `<output_directory>/databases/interproscan/` if the `--save_db` is turned on. Note: the huge database download (5.5GB) can take up to 4 hours depending on the bandwidth.
+[InterProScan](https://github.com/ebi-pf-team/interproscan) is used to provide more information about the proteins annotated on the contigs. By default, turning on this subworkflow with `--run_protein_annotation` will download and unzip the [InterPro database](http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.72-103.0/) version 5.72-103.0. The database can be saved in the output directory `<output_directory>/databases/interproscan/` if the `--save_db` is turned on.
+
+:::note
+The huge database download (5.5GB) can take up to 4 hours depending on the bandwidth.
+:::

A local version of the database can be supplied to the pipeline by passing the InterProScan database directory to `--protein_annotation_interproscan_db <path/to/downloaded-untarred-interproscan_db-dir/>`. The directory can be created by running (e.g. for database version 5.72-103.0):

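The manual database preparation mentioned above can be sketched as a short script. Note the archive file name below is an assumption inferred from the FTP layout — check the directory listing before relying on it:

```shell
# Hedged sketch of fetching an InterProScan data release manually.
version="5.72-103.0"
base="http://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/${version}"
archive="interproscan-${version}-64-bit.tar.gz"   # assumed file name
echo "Would fetch: ${base}/${archive}"

# Uncomment to actually download (~5.5 GB, can take hours) and unpack:
#   wget "${base}/${archive}"
#   tar -xzf "${archive}"
# Then point the pipeline at the extracted directory:
#   --protein_annotation_interproscan_db "./interproscan-${version}"
```

The download itself is left commented out so the sketch can be inspected without triggering the multi-gigabyte transfer.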
2 changes: 1 addition & 1 deletion nextflow.config
@@ -526,7 +526,7 @@ manifest {
description = """Pipeline for screening for functional components of assembled contigs"""
mainScript = 'main.nf'
defaultBranch = 'master'
-nextflowVersion = '!>=24.04.2'
+nextflowVersion = '!>=24.10.2'
version = '2.1.0'
doi = '10.5281/zenodo.7643099'
}
