Merge pull request #268 from nf-core/dev
Release - v1.1.0 - British Beans on Toast
jasmezz authored Apr 27, 2023
2 parents 98a0815 + af57abe commit 1c1c9ae
Showing 55 changed files with 2,450 additions and 480 deletions.
53 changes: 27 additions & 26 deletions .github/workflows/ci.yml
@@ -29,7 +29,7 @@ jobs:
        parameters:
          - "--annotation_tool prodigal"
          - "--annotation_tool prokka"
          ## Warning: we can't test Bakta as it uses more memory than available on GHA CIs
          - "--annotation_tool bakta --annotation_bakta_db_downloadtype light"

    steps:
      - name: Check out pipeline code
@@ -57,6 +57,7 @@ jobs:
        parameters:
          - "--annotation_tool prodigal"
          - "--annotation_tool prokka"
          - "--annotation_tool bakta --annotation_bakta_db_downloadtype light"

    steps:
      - name: Check out pipeline code
@@ -71,31 +72,31 @@ jobs:
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_bgc,docker --outdir ./results ${{ matrix.parameters }}
  ## DEACTIVATE CURRENTLY DUE TO EXTENDED DATABASE SERVER FAILURE
  ## CAN REACTIVATE ONCE WORKING AGAIN
  # test_deeparg:
  #   name: Run pipeline with test data (DeepARG only workflow)
  #   # Only run on push if this is the nf-core dev branch (merged PRs)
  #   if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/funcscan') }}"
  #   runs-on: ubuntu-latest
  #   strategy:
  #     matrix:
  #       NXF_VER:
  #         - "22.10.1"
  #         - "latest-everything"
  #       parameters:
  #         - "--annotation_tool prodigal"
  #         - "--annotation_tool prokka"
  test_deeparg:
    name: Run pipeline with test data (DeepARG only workflow)
    # Only run on push if this is the nf-core dev branch (merged PRs)
    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/funcscan') }}"
    runs-on: ubuntu-latest
    strategy:
      matrix:
        NXF_VER:
          - "22.10.1"
          - "latest-everything"
        parameters:
          - "--annotation_tool bakta --annotation_bakta_db_downloadtype light"
          - "--annotation_tool prodigal"
          - "--annotation_tool prokka"
          - "--annotation_tool pyrodigal"

  #   steps:
  #     - name: Check out pipeline code
  #       uses: actions/checkout@v2
    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v2

  #     - name: Install Nextflow
  #       uses: nf-core/setup-nextflow@v1
  #       with:
  #         version: "${{ matrix.NXF_VER }}"
      - name: Install Nextflow
        uses: nf-core/setup-nextflow@v1
        with:
          version: "${{ matrix.NXF_VER }}"

  #     - name: Run pipeline with test data (DeepARG workflow)
  #       run: |
  #         nextflow run ${GITHUB_WORKSPACE} -profile test_deeparg,docker --outdir ./results ${{ matrix.parameters }}
      - name: Run pipeline with test data (DeepARG workflow)
        run: |
          nextflow run ${GITHUB_WORKSPACE} -profile test_deeparg,docker --outdir ./results ${{ matrix.parameters }}
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,39 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.1.0 - British Beans on Toast - [2023-04-26]

### `Added`

- [#238](https://github.com/nf-core/funcscan/pull/238) Added dedicated DRAMP database downloading step for AMPcombi to prevent parallel downloads when no database provided by user. (by @jfy133)
- [#235](https://github.com/nf-core/funcscan/pull/235) Added parameter `annotation_bakta_db_downloadtype` to be able to switch between downloading either full (33.1 GB) or light (1.3 GB excluding UPS, IPS, PSC, see parameter description) versions of the Bakta database. (by @jasmezz)
- [#249](https://github.com/nf-core/funcscan/pull/249) Added bakta annotation to CI tests. (by @jasmezz)
- [#251](https://github.com/nf-core/funcscan/pull/251) Added annotation tool: Pyrodigal. (by @jasmezz)
- [#252](https://github.com/nf-core/funcscan/pull/252) Added a new parameter `--arg_rgi_savejson` that saves the file `<samplename>.json` in the RGI directory. The default output for RGI is now only `<samplename>.txt`. (by @darcy220606)
- [#253](https://github.com/nf-core/funcscan/pull/253) Updated Prodigal to have compressed output files. (by @jasmezz)
- [#262](https://github.com/nf-core/funcscan/pull/262) Added comBGC function to screen whole directory of antiSMASH output (one subfolder per sample). (by @jasmezz)
- [#263](https://github.com/nf-core/funcscan/pull/263) Removed `AMPlify` from test_full.config. (by @jasmezz)
- [#266](https://github.com/nf-core/funcscan/pull/266) Updated README.md with Pyrodigal. (by @jasmezz)

### `Fixed`

- [#243](https://github.com/nf-core/funcscan/pull/243) Compress the ampcombi_complete_summary.csv in the output directory. (by @louperelo)
- [#237](https://github.com/nf-core/funcscan/pull/237) Reactivate DeepARG automatic database downloading and CI tests as server is now back up. (by @jfy133)
- [#235](https://github.com/nf-core/funcscan/pull/235) Improved annotation speed by switching off pipeline-irrelevant Bakta annotation steps by default. (by @jasmezz)
- [#235](https://github.com/nf-core/funcscan/pull/235) Renamed parameter `annotation_bakta_db` to `annotation_bakta_db_localpath`. (by @jasmezz)
- [#242](https://github.com/nf-core/funcscan/pull/242) Fixed MACREL '.faa' issue that was generated when it was run on its own and upgraded MACREL from version `1.1.0` to `1.2.0` (by @Darcy220606)
- [#248](https://github.com/nf-core/funcscan/pull/248) Applied best-practice `error("message")` to all (sub)workflow files. (by @jasmezz)
- [#254](https://github.com/nf-core/funcscan/pull/254) Further resource optimisation based on feedback from 'real world' datasets. (ongoing, reported by @alexhbnr and @Darcy220606, fix by @jfy133)
- [#266](https://github.com/nf-core/funcscan/pull/266) Fixed wrong process name in base.config. (reported by @Darcy220606, fix by @jasmezz)

### `Dependencies`

| Tool | Previous version | New version |
| ----- | ---------------- | ----------- |
| Bakta | 1.6.1 | 1.7.0 |

### `Deprecated`

## v1.0.1 - [2023-02-27]

### `Added`
12 changes: 8 additions & 4 deletions CITATIONS.md
@@ -12,7 +12,7 @@

- [ABRicate](https://github.com/tseemann/abricate)

> Seemann T. (2020). ABRicate. Github [https://github.com/tseemann/abricate](https://github.com/tseemann/abricate).
> Seemann, T. (2020). ABRicate. Github [https://github.com/tseemann/abricate](https://github.com/tseemann/abricate).
- [AMPir](https://doi.org/10.1093/bioinformatics/btaa653)

@@ -48,15 +48,15 @@
- [GECCO](https://gecco.embl.de)

> Carroll, L.M. , Larralde, M., Fleck, J. S., Ponnudurai, R., Milanese, A., Cappio Barazzone, E. & Zeller, G. (2021). Accurate de novo identification of biosynthetic gene clusters with GECCO. bioRxiv [DOI: 10.1101/2021.05.03.442509](https://doi.org/10.1101/2021.05.03.442509)
> Carroll, L. M., Larralde, M., Fleck, J. S., Ponnudurai, R., Milanese, A., Cappio Barazzone, E. & Zeller, G. (2021). Accurate de novo identification of biosynthetic gene clusters with GECCO. bioRxiv [DOI: 10.1101/2021.05.03.442509](https://doi.org/10.1101/2021.05.03.442509)
- [hAMRonization](https://github.com/pha4ge/hAMRonization)

> Public Health Alliance for Genomic Epidemiology (pha4ge). (2022). Parse multiple Antimicrobial Resistance Analysis Reports into a common data structure. Github. Retrieved October 5, 2022, from [https://github.com/pha4ge/hAMRonization](https://github.com/pha4ge/hAMRonization)
- [AMPcombi](https://github.com/Darcy220606/AMPcombi)

> Anan Ibrahim, & Louisa Perelo. (2023). Darcy220606/AMPcombi. [DOI: 10.5281/zenodo.7639121](https://doi.org/10.5281/zenodo.7639121).
> Ibrahim, A. & Perelo, L. (2023). Darcy220606/AMPcombi. [DOI: 10.5281/zenodo.7639121](https://doi.org/10.5281/zenodo.7639121).
- [HMMER](https://doi.org/10.1371/journal.pcbi.1002195.)

@@ -72,7 +72,11 @@
- [PROKKA](https://doi.org/10.1093/bioinformatics/btu153)

> Seemann T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics (Oxford, England), 30(14), 2068–2069. [DOI: 10.1093/bioinformatics/btu153](https://doi.org/10.1093/bioinformatics/btu153)
> Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics (Oxford, England), 30(14), 2068–2069. [DOI: 10.1093/bioinformatics/btu153](https://doi.org/10.1093/bioinformatics/btu153)
- [Pyrodigal](https://doi.org/10.21105/joss.04296)

> Larralde, M. (2022). Pyrodigal: Python bindings and interface to Prodigal, an efficient method for gene prediction in prokaryotes. Journal of Open Source Software, 7(72), 4296. [DOI: 10.21105/joss.04296](https://doi.org/10.21105/joss.04296)
- [RGI](https://doi.org/10.1093/nar/gkz935)

2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) Jasmin Frangenberg, Anan Ibrahim, James A. Fellows Yates
Copyright (c) Jasmin Frangenberg, Anan Ibrahim, Louisa Perelo, Moritz E. Beber, James A. Fellows Yates

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ The nf-core/funcscan AWS full test dataset are contigs generated by the MGnify s

## Pipeline summary

1. Annotation of assembled prokaryotic contigs with [`Prodigal`](https://github.com/hyattpd/Prodigal), [`Prokka`](https://github.com/tseemann/prokka), or [`Bakta`](https://github.com/oschwengers/bakta)
1. Annotation of assembled prokaryotic contigs with [`Prodigal`](https://github.com/hyattpd/Prodigal), [`Pyrodigal`](https://github.com/althonos/pyrodigal), [`Prokka`](https://github.com/tseemann/prokka), or [`Bakta`](https://github.com/oschwengers/bakta)
2. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
3. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg)
4. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
78 changes: 78 additions & 0 deletions bin/ampcombi_download.py
@@ -0,0 +1,78 @@
#!/usr/bin/env python3

#########################################
# Authors: [Anan Ibrahim](https://github.com/Darcy220606), [Louisa Perelo](https://github.com/louperelo)
# File: amp_database.py
# Source: https://github.com/Darcy220606/AMPcombi/blob/main/ampcombi/amp_database.py
# Source+commit: https://github.com/Darcy220606/AMPcombi/commit/a75bc00c32ecf873a133b18cf01f172ad9cf0d2d/ampcombi/amp_database.py
# Download Date: 2023-03-08, commit: a75bc00c
# This source code is licensed under the MIT license
#########################################

# TITLE: Download the DRAMP database if the input db is empty AND make the database compatible for DIAMOND

import pandas as pd
import requests
import os
from datetime import datetime
import subprocess
from Bio import SeqIO
import tempfile
import shutil


########################################
# FUNCTION: DOWNLOAD DRAMP DATABASE AND CLEAN IT
#########################################
def download_DRAMP(db):
    ## Download the (table) file and store it in a results directory
    url = "http://dramp.cpu-bioinfor.org/downloads/download.php?filename=download_data/DRAMP3.0_new/general_amps.xlsx"
    r = requests.get(url, allow_redirects=True)
    with open(db + "/" + "general_amps.xlsx", "wb") as f:
        f.write(r.content)
    ## Convert the Excel table to a tab-separated file in the DRAMP_db directory, stamped with the download date
    date = datetime.now().strftime("%Y_%m_%d")
    ref_amps = pd.read_excel(db + "/" + r"general_amps.xlsx")
    ref_amps.to_csv(db + "/" + f"general_amps_{date}.tsv", index=None, header=True, sep="\t")
    ## Download the (fasta) file and store it in a results directory
    urlfasta = (
        "http://dramp.cpu-bioinfor.org/downloads/download.php?filename=download_data/DRAMP3.0_new/general_amps.fasta"
    )
    z = requests.get(urlfasta)
    fasta_path = os.path.join(db + "/" + f"general_amps_{date}.fasta")
    with open(fasta_path, "wb") as f:
        f.write(z.content)
    ## Cleaning step to remove ambiguous amino acids from sequences in the database (e.g. zeros and brackets)
    new_fasta = db + "/" + f"general_amps_{date}_clean.fasta"
    seq_record = SeqIO.parse(open(fasta_path), "fasta")
    with open(new_fasta, "w") as f:
        for record in seq_record:
            id, sequence = record.id, str(record.seq)
            letters = [
                "A", "C", "D", "E", "F",
                "G", "H", "I", "K", "L",
                "M", "N", "P", "Q", "R",
                "S", "T", "V", "W", "Y",
            ]
            new = "".join(i for i in sequence if i in letters)
            f.write(">" + id + "\n" + new + "\n")
    return os.remove(fasta_path), os.remove(db + "/" + r"general_amps.xlsx")


download_DRAMP("amp_ref_database")
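The cleaning loop above keeps only the 20 canonical amino-acid letters and drops everything else (digits, brackets, ambiguity codes). A minimal standalone sketch of that filtering step — the function name here is illustrative, not part of AMPcombi:

```python
# Sketch of the DRAMP cleaning step: strip any character that is not one
# of the 20 canonical amino-acid letters from a peptide sequence.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")

def clean_sequence(sequence: str) -> str:
    """Return the sequence with ambiguous symbols (digits, brackets, X/B/Z, ...) removed."""
    return "".join(ch for ch in sequence if ch in CANONICAL)

print(clean_sequence("MKLV0(X)ACDB"))  # → MKLVACD
```

Using a set gives O(1) membership tests, which matters when cleaning the full database rather than a single record.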
64 changes: 56 additions & 8 deletions bin/comBGC.py
@@ -32,7 +32,7 @@
SOFTWARE.
"""

tool_version = "0.5"
tool_version = "0.6.0"
welcome = """\
........................
* comBGC v.{version} *
@@ -61,7 +61,9 @@
these can be:
- antiSMASH: <sample name>.gbk and (optional) knownclusterblast/ directory
- DeepBGC: <sample name>.bgc.tsv
- GECCO: <sample name>.clusters.tsv""",
- GECCO: <sample name>.clusters.tsv
Note: Please provide files from a single sample only. If you would like to
summarize multiple samples, please see the --antismash_multiple_samples flag.""",
)
parser.add_argument(
    "-o",
@@ -73,6 +75,16 @@
    type=str,
    default=".",
)
parser.add_argument(
    "-a",
    "--antismash_multiple_samples",
    metavar="PATH",
    dest="antismash_multiple_samples",
    nargs="?",
    help="""directory of antiSMASH output. Should contain subfolders (one per
sample). Can only be used if --input is not specified.""",
    type=str,
)
parser.add_argument("-vv", "--verbose", help="increase output verbosity", action="store_true")
parser.add_argument("-v", "--version", help="show version number and exit", action="store_true")

@@ -81,6 +93,7 @@

# Assign input arguments to variables
input = args.input
dir_antismash = args.antismash_multiple_samples
outdir = args.outdir
verbose = args.verbose
version = args.version
@@ -111,15 +124,38 @@
    elif path.endswith("knownclusterblast/"):
        input_antismash.append(path)

if input and dir_antismash:
    exit(
        "The flags --input and --antismash_multiple_samples are mutually exclusive.\nPlease use only one of them (or see --help for how to use)."
    )

# Make sure that at least one input argument is given
if not (input_antismash or input_gecco or input_deepbgc):
if not (input_antismash or input_gecco or input_deepbgc or dir_antismash):
    exit("Please specify at least one input file (i.e. output from antismash, deepbgc, or gecco) or see --help")
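The manual mutual-exclusion check above could also be expressed with argparse's built-in mechanism, which rejects conflicting flags before any script logic runs. A minimal sketch — the flag names mirror comBGC's, everything else is illustrative:

```python
import argparse

# Sketch: argparse enforces that --input and --antismash_multiple_samples
# are never given together, mirroring comBGC's manual exit() check.
parser = argparse.ArgumentParser(prog="combgc-sketch")
group = parser.add_mutually_exclusive_group()
group.add_argument("-i", "--input", nargs="*", type=str)
group.add_argument("-a", "--antismash_multiple_samples", metavar="PATH", type=str)

args = parser.parse_args(["--input", "sample.gbk"])
print(args.input, args.antismash_multiple_samples)  # → ['sample.gbk'] None
```

Passing both flags would make `parse_args` print a usage error and exit with status 2, so downstream code never sees an ambiguous combination.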

########################
# ANTISMASH FUNCTIONS
########################


def prepare_multisample_input_antismash(antismash_dir):
    """
    Prepare string of input paths of a given antiSMASH output folder (with sample subdirectories)
    """
    sample_paths = []
    for root, subdirs, files in os.walk(antismash_dir):
        antismash_file = "/".join([root, "index.html"])
        if os.path.exists(antismash_file):
            sample = root.split("/")[-1]
            gbk_path = "/".join([root, sample]) + ".gbk"
            kkb_path = "/".join([root, "knownclusterblast"])
            if os.path.exists(kkb_path):
                sample_paths.append([gbk_path, kkb_path])
            else:
                sample_paths.append([gbk_path])
    return sample_paths
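The walk above treats a subfolder as a sample iff it contains `index.html`, pairing each sample's `.gbk` with its optional `knownclusterblast/` folder. A self-contained sketch of the same pattern, run against a throwaway folder layout fabricated for illustration:

```python
import os
import tempfile

def collect_antismash_samples(antismash_dir):
    # Same pattern as prepare_multisample_input_antismash: a subfolder is a
    # sample iff it contains index.html; knownclusterblast/ is optional.
    sample_paths = []
    for root, subdirs, files in os.walk(antismash_dir):
        if os.path.exists(os.path.join(root, "index.html")):
            sample = os.path.basename(root)
            gbk = os.path.join(root, sample + ".gbk")
            kcb = os.path.join(root, "knownclusterblast")
            sample_paths.append([gbk, kcb] if os.path.exists(kcb) else [gbk])
    return sample_paths

with tempfile.TemporaryDirectory() as tmp:
    # sampleA has a knownclusterblast/ folder, sampleB does not
    os.makedirs(os.path.join(tmp, "sampleA", "knownclusterblast"))
    open(os.path.join(tmp, "sampleA", "index.html"), "w").close()
    os.makedirs(os.path.join(tmp, "sampleB"))
    open(os.path.join(tmp, "sampleB", "index.html"), "w").close()
    paths = collect_antismash_samples(tmp)
    print(sorted(len(p) for p in paths))  # → [1, 2]
```

Note the convention assumed here (as in comBGC) that the `.gbk` file is named after its parent sample directory.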


def parse_knownclusterblast(kcb_file_path):
    """
    Extract MIBiG IDs from knownclusterblast TXT file.
@@ -148,9 +184,6 @@ def antismash_workflow(antismash_paths):
    - Return data frame with aggregated info.
    """

    if verbose:
        print("\nParsing antiSMASH files\n... ", end="")

    antismash_sum_cols = [
        "Sample_ID",
        "Prediction_tool",
@@ -186,6 +219,9 @@

# Aggregate information
Sample_ID = gbk_path.split("/")[-1].split(".gbk")[-2] # Assuming file name equals sample name
if verbose:
print("\nParsing antiSMASH file(s): " + Sample_ID + "\n... ", end="")

with open(gbk_path) as gbk:
for record in SeqIO.parse(gbk, "genbank"): # GBK records are contigs in this case
# Initiate variables per contig
@@ -514,7 +550,13 @@ def gecco_workflow(gecco_paths):
########################

if __name__ == "__main__":
    tools = {"antiSMASH": input_antismash, "deepBGC": input_deepbgc, "GECCO": input_gecco}
    if input_antismash:
        tools = {"antiSMASH": input_antismash, "deepBGC": input_deepbgc, "GECCO": input_gecco}
    elif dir_antismash:
        tools = {"antiSMASH": dir_antismash}
    else:
        tools = {"deepBGC": input_deepbgc, "GECCO": input_gecco}

    tools_provided = {}

    for tool in tools.keys():
@@ -532,7 +574,13 @@ def gecco_workflow(gecco_paths):

    for tool in tools_provided.keys():
        if tool == "antiSMASH":
            summary_antismash = antismash_workflow(input_antismash)
            if dir_antismash:
                antismash_paths = prepare_multisample_input_antismash(dir_antismash)
                for input_antismash in antismash_paths:
                    summary_antismash_temp = antismash_workflow(input_antismash)
                    summary_antismash = pd.concat([summary_antismash, summary_antismash_temp])
            else:
                summary_antismash = antismash_workflow(input_antismash)
        elif tool == "deepBGC":
            summary_deepbgc = deepbgc_workflow(input_deepbgc)
        elif tool == "GECCO":
Expand Down
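The multi-sample branch above grows a single summary table by repeatedly concatenating per-sample data frames. A toy sketch of that accumulation pattern — the column names and sample data here are invented for illustration:

```python
import pandas as pd

# Sketch of the per-sample aggregation: start from an empty frame and
# append each sample's summary, as the antiSMASH multi-sample branch does.
summary = pd.DataFrame(columns=["Sample_ID", "BGC_count"])
per_sample = [
    pd.DataFrame({"Sample_ID": ["sampleA"], "BGC_count": [12]}),
    pd.DataFrame({"Sample_ID": ["sampleB"], "BGC_count": [7]}),
]
for part in per_sample:
    summary = pd.concat([summary, part], ignore_index=True)
print(summary["Sample_ID"].tolist())  # → ['sampleA', 'sampleB']
```

`ignore_index=True` renumbers the rows on each append; without it, every per-sample frame would contribute its own index 0 and the combined table would have duplicate row labels.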