beav: Bacteria/Element Annotation reVamped
beav is a command line tool that streamlines bacterial genome and mobile genetic element annotation. It combines multiple annotation tools, automating the process of running, parsing, and combining the results into a single easy-to-read output. Annotated features include secretion systems, anti-phage defense systems, integrative & conjugative/mobilizable elements, integrons, prophage regions, amino acid biosynthesis pathways, small carbon metabolite catabolism pathways, and biosynthetic gene clusters. Type VI secretion system (T6SS) vgrG operons are automatically identified. Plasmid origin of transfer (oriT) elements are also characterized.
The beav pipeline also includes several tools and databases that enhance the annotation of plant-associated microbes, including phytopathogens and symbionts. Custom bakta databases provide correct gene names and annotations for phytopathogen virulence genes, effectors, and genes important for mutualist symbiosis. Other tools annotate promoter elements such as the pip box, tts box, nod box, tra box, and vir box.
An optional Agrobacterium-specific pipeline identifies the presence of Ti and Ri plasmids and classifies them under the Weisberg et al. 2020 scheme. It also annotates Ti/Ri plasmid elements including T-DNA borders, overdrive, virbox, trabox, and other binding sites, and determines the biovar and genomospecies of the input strain. Virulence and T-DNA genes, including opine synthase and transport/catabolism loci, are also correctly named and annotated.
beav generates a Circos plot annotating important features of the genome, as well as of the pTi/pRi plasmid if the Agrobacterium-specific analysis is run. The Circos script can also be run separately.
Example Circos plot of whole genome annotations automatically generated by beav.
Example Circos plot visualizing oncogenic Ti/Ri plasmids generated by the optional Agrobacterium-specific pipeline.
#download and install beav with conda/mamba
mamba create -n beav beav
conda activate beav
#download all prerequisite databases
beav_db
#run beav
beav --input /path/to/file/test.fna --threads 8 --skip_tiger
The beav pipeline requires a number of programs and databases to be installed. We therefore strongly recommend using conda to install beav and all of its dependencies.
Once the tool is installed, run the beav_db tool to download all necessary databases.
It is recommended to use either conda with libmamba or mamba to install beav, as this greatly reduces the time needed to solve the environment.
instructions for conda:
conda create -n beav
conda install -n beav beav
alternative instructions using mamba:
conda create -n beav
mamba install -n beav beav
or as one combined command:
conda create -n beav beav
or
mamba create -n beav beav
The conda environment can then be activated using:
conda activate beav
Pixi is a newer tool for installing and managing conda packages. To install beav globally, with no need to activate an environment, run the following command:
pixi global install beav
You can then follow the database installation step without running conda activate.
Clone the beav github repository.
git clone https://github.com/weisberglab/beav.git
If installing from source, DBSCAN-SWA, TIGER2, and GapMind (PaperBLAST) need to be installed in the software folder within the beav folder. The BEAV_DIR environment variable must then be set to point to the beav directory.
Prerequisites:
Program | Install location |
---|---|
Bakta | PATH |
IntegronFinder | PATH |
MacSyFinder | PATH |
DefenseFinder | PATH |
TIGER2 | $BEAV_DIR/software |
GapMind (PaperBLAST) | $BEAV_DIR/software |
DBSCAN-SWA | $BEAV_DIR/software |
antiSMASH | PATH |
EMBOSS | PATH |
HMMER | PATH |
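For a source install, it can help to confirm that the programs listed with a PATH install location are actually reachable before running the pipeline. The sketch below uses a hypothetical helper, check_on_path, which is not part of beav. The executable names in the example call are assumptions about how each package names its command-line entry point and may differ between versions.

```shell
# Hypothetical helper, not part of beav: report which commands are missing from PATH.
# Prints one line per missing command and returns non-zero if any are missing.
check_on_path() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions; adjust to your installation):
# check_on_path bakta integron_finder macsyfinder defense-finder antismash hmmsearch
```

The helper only checks that a command can be found; it does not verify versions or databases.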
Databases for each of these programs can then be installed manually. Alternatively, the following can be used to install them automatically.
conda activate beav
beav_db
usage: beav_db [--skip_bakta_db] [--light] [--bakta_db_path DIRECTORY] [--update]
--skip_bakta_db
Skip downloading the Bakta databases
--light
Install the light version of Bakta databases
--bakta_db_path DIRECTORY
Install Bakta databases in nondefault location
--update
Update Bakta databases
NOTE: If you get an error stating "ModuleNotFoundError: No module named 'nrpys'", then you can run the following command (with the beav conda environment activated) to force reinstall it:
python -m pip install --upgrade --force-reinstall nrpys
NOTE: there is currently a bug in the latest DefenseFinder models that causes an error in MacSyFinder when running it. We recommend running beav with --skip_defensefinder until the MacSyFinder bug fix is released in bioconda. Alternatively, copying the patched file to the MacSyFinder python library folder of your conda environment will fix the issue.
Patching instructions
To do so, find the Python version of your conda environment:
python --version
Then download the patched registries.py file:
wget https://raw.githubusercontent.com/gem-pasteur/macsyfinder/27ee21ceb8e7100d9183b084356f791487aca4ad/macsypy/registries.py
Then copy it to the correct folder in your conda env, changing the python version as necessary:
cp registries.py $CONDA_PREFIX/lib/python3.9/site-packages/macsypy/
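The Python version in the destination path can be derived rather than typed by hand. The following is a minimal sketch, assuming the environment follows the usual conda layout ($CONDA_PREFIX/lib/pythonX.Y/site-packages). The helper name macsypy_dir is hypothetical and not part of beav or MacSyFinder.

```shell
# Hypothetical helper: print the macsypy package directory for a conda environment.
# $1 = environment prefix (for example "$CONDA_PREFIX")
# $2 = Python major.minor version; if omitted, the active python interpreter is asked
macsypy_dir() {
    prefix=$1
    pyver=${2:-$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')}
    echo "$prefix/lib/python$pyver/site-packages/macsypy"
}

# Example (run with the beav environment activated, after downloading registries.py):
# cp registries.py "$(macsypy_dir "$CONDA_PREFIX")/"
```

Checking that the printed directory exists before copying avoids placing the file in the wrong location.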
usage: beav [--input INPUT] [--output OUTPUT_DIRECTORY] [--strain STRAIN] [--bakta_arguments BAKTA_ARGUMENTS] [--tiger_arguments TIGER_ARGUMENTS] [--agrobacterium AGROBACTERIUM] [--skip_macsyfinder] [--skip_integronfinder] [--skip_defensefinder] [--skip_tiger] [--skip_gapmind] [--skip_dbscan-swa] [--skip_antismash] [--help] [--threads THREADS] [--gbk] [--continue]
BEAV- Bacterial Element Annotation reVamped
Input/Output:
--input, -i STRAIN.fna
Input file in fasta nucleotide format (Required)
--output DIRECTORY
Output directory (default: current working directory)
--strain STRAIN
Strain name (default: input file prefix)
--bakta_arguments ARGUMENTS
Additional arguments and database options specific to Bakta
--antismash_arguments ARGUMENTS
Additional arguments and database options specific to antiSMASH (Default: "$antismash_args")
--tiger_blast_database DBPATH
Path to a reference genome blast database for TIGER2 ICE analysis (Required unless --skip_tiger is used)
--run_operon_email EMAIL
Annotate predicted operons using the Operon-mapper webserver. Must input an email address for the job
Options:
--agrobacterium
Agrobacterium specific tools that identify biovar/species group, Ti/Ri plasmid, T-DNA borders, virboxes and traboxes
--skip_macsyfinder
Skip detection and annotation of secretion systems
--skip_integronfinder
Skip detection and annotation of integrons
--skip_defensefinder
Skip detection and annotation of anti-phage defense systems
--skip_tiger
Skip detection and annotation of integrative conjugative elements (ICEs)
--skip_gapmind
Skip detection of amino acid biosynthesis and carbon metabolism pathways
--skip_dbscan-swa
Skip detection and annotation of prophage
--skip_antismash
Skip detection and annotation of biosynthetic gene clusters
--continue
Continue running BEAV from any point in the pipeline. Rerun programs that gave an error or were skipped.
--gbk
Use a GenBank file as input
General:
--help, -h
Show BEAV help message
--threads, -t
Number of CPU threads
--antismash_arguments
Additional arguments can be passed to antiSMASH using the --antismash_arguments option. This allows for full usage of antiSMASH and additional databases.
--tiger_blast_database
Required if running TIGER. Users must provide a path to a blast database of reference genomes using the --tiger_blast_database option.
--bakta_arguments
Additional arguments can be passed to bakta using the --bakta_arguments option.
--agrobacterium
The --agrobacterium option activates an additional pipeline to provide Agrobacterium-specific annotation.
--skip_PROGRAM
The skip options allow for specified programs to be skipped if the annotation is not needed or required programs are not installed.
--continue
The continue option will check the output of existing Beav runs and rerun programs that errored or were skipped. This option allows for the pipeline to be used with existing Bakta runs.
--gbk
A GenBank file can be used as the input file when the --gbk option is used.
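The --tiger_blast_database option expects a BLAST database of reference genomes. One common way to build a nucleotide BLAST database from a FASTA file is makeblastdb from NCBI BLAST+. The sketch below assumes BLAST+ is installed and that a database built with these settings is suitable for TIGER2, which this document does not confirm. The helper name make_ref_blast_db is hypothetical.

```shell
# Hypothetical helper: build a nucleotide BLAST database from a FASTA of reference genomes.
# The database is written next to the FASTA file, using the FASTA path as the database name.
# Set DRY_RUN=1 to print the command instead of running it.
make_ref_blast_db() {
    fasta=$1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "makeblastdb -in $fasta -dbtype nucl -out $fasta"
    else
        makeblastdb -in "$fasta" -dbtype nucl -out "$fasta"
    fi
}

# Example (paths are placeholders):
# make_ref_blast_db /path/to/databases/blast/refseq_genomic.fna
```

The same path would then be given to beav with --tiger_blast_database.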
Minimal run
beav --input /path/to/file/test.fna --threads 8 --skip_tiger
Standard run
beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna
Standard run with operon annotation (remote)
beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna --run_operon_email <EMAIL>
Standard run with genbank input
beav --input /path/to/file/test.gbk --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna --gbk
Complex run
beav --input /path/to/file/test.fna --threads 8 --bakta_arguments '--db /path/to/alternative-data-bases/bakta-1.7/' --tiger_blast_database /path/to/databases/blast/allagro.fna --agrobacterium --skip_integronfinder
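When several genomes need annotating, the minimal run above can be wrapped in a loop. This is a sketch only: beav_batch is a hypothetical helper, it uses just the --input, --output, --strain, --threads, and --skip_tiger options documented above, and it assumes that giving each genome its own output directory is the desired layout.

```shell
# Hypothetical wrapper: run beav on every .fna file in a directory,
# writing each genome to its own output folder.
# $1 = input directory, $2 = output directory, $3 = threads (default 8)
# Set DRY_RUN=1 to print the commands instead of running them.
beav_batch() {
    in_dir=$1
    out_dir=$2
    threads=${3:-8}
    for fna in "$in_dir"/*.fna; do
        [ -e "$fna" ] || continue          # skip if the glob matched nothing
        strain=$(basename "$fna" .fna)     # strain name = file name without extension
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "beav --input $fna --output $out_dir/$strain --strain $strain --threads $threads --skip_tiger"
        else
            mkdir -p "$out_dir/$strain"
            beav --input "$fna" --output "$out_dir/$strain" --strain "$strain" \
                --threads "$threads" --skip_tiger \
                || echo "beav failed for $strain" >&2
        fi
    done
}

# Example:
# DRY_RUN=1 beav_batch /path/to/genomes /path/to/results 8
```

A failure on one genome is reported on standard error and the loop moves on to the next file.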
To generate Circos plots from your GenBank file independently of the beav pipeline, make sure the beav conda environment is activated:
conda activate beav
Usage:
beav_circos -i <GenBank_file> [-c <Contig_for_subset_visualization>] [--pTi <Contig_for_oncogenic_visualization>]
Examples:
# Generate a general Circos plot for all contigs
beav_circos -i test.gbk
# Generate a general Circos plot for all contigs and an oncogenic Circos plot for a single contig
beav_circos -i test.gbk --pTi contig_1
# Generate a general Circos plot for all contigs and an oncogenic Circos plot for a set of contigs
beav_circos -i test.gbk --pTi "contig_1 contig_2"
# Generate a general Circos plot for a single contig
beav_circos -i test.gbk -c contig_1
# Generate a general Circos plot for a set of contigs
beav_circos -i test.gbk -c "contig_1 contig_2"
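To find out which contig names are available for -c or --pTi, the record names can be read from the LOCUS lines of the GenBank file. The helper below is a hypothetical sketch, and it assumes that beav_circos identifies contigs by the LOCUS name, which this document does not state explicitly.

```shell
# Hypothetical helper: print the record (LOCUS) names in a GenBank file, one per line.
list_contigs() {
    awk '/^LOCUS[ \t]/ { print $2 }' "$1"
}

# Example:
# list_contigs test.gbk
```

The names it prints can then be passed, space-separated and quoted, as in the examples above.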
Beav can be cited as:
Jung, J. M., Rahman, A., Schiffer, A. M., & Weisberg, A. J. (2024). Beav: a bacterial genome and mobile element annotation pipeline. mSphere, 9(8), e00209-24. https://doi.org/10.1128/msphere.00209-24