beav - a bacterial genome and mobile element annotation pipeline

beav: Bacteria/Element Annotation reVamped

beav is a command line tool that streamlines bacterial genome and mobile genetic element annotation. It combines multiple annotation tools, automating the process of running, parsing, and combining the results into a single easy-to-read output. Annotated features include secretion systems, anti-phage defense systems, integrative & conjugative/mobilizable elements, integrons, prophage regions, amino acid biosynthesis pathways, small carbon metabolite catabolism pathways, and biosynthetic gene clusters. Type VI secretion system (T6SS) vgrG operons are automatically identified. Plasmid origin of transfer (oriT) elements are also characterized.

The beav pipeline also includes several tools and databases that enhance the annotation of plant-associated microbes, including phytopathogens and symbionts. Custom Bakta databases provide correct gene names and annotations for phytopathogen virulence genes, effectors, and genes important for mutualist symbiosis. Other tools annotate promoter elements such as the pip box, tts box, nod box, tra box, and vir box.

An optional Agrobacterium-specific pipeline identifies the presence of Ti and Ri plasmids and classifies them under the Weisberg et al. 2020 scheme. It also annotates Ti/Ri plasmid elements including T-DNA borders, overdrive, virbox, trabox, and other binding sites, and determines the biovar and genomospecies of the input strain. Virulence and T-DNA genes, including opine synthase and transport/catabolism loci, are also correctly named and annotated.

beav generates a Circos plot annotating important features of the genome, as well as of the pTi/pRi plasmid if the Agrobacterium-specific analysis is run. The Circos script can also be run separately.

C58 Genome Circos

Example Circos plot of whole genome annotations automatically generated by beav.

C58 pTi Plasmid Circos

Example Circos plot visualizing oncogenic Ti/Ri plasmids generated by the optional Agrobacterium-specific pipeline.

Quick Start

# download and install beav with conda/mamba
mamba create -n beav beav
conda activate beav
# download all prerequisite databases
beav_db
# run beav
beav --input /path/to/file/test.fna --threads 8 --skip_tiger


The beav pipeline requires a number of programs and databases to be installed, so we strongly recommend using conda to install beav and all of its dependencies.

Once the tool is installed, run the beav_db tool to download all necessary databases.

From conda (Recommended)

It is recommended to use either conda with the libmamba solver or mamba to install beav, as this greatly reduces the time needed to solve the environment.

Instructions for conda:

conda create -n beav
conda install -n beav beav

Alternative instructions using mamba:

conda create -n beav
mamba install -n beav beav

Or as one combined command (use either the conda or the mamba form):

conda create -n beav beav
mamba create -n beav beav

The conda environment can then be activated using:

conda activate beav

Alternative: From pixi

Pixi is a tool for installing and managing conda packages. To install beav globally, with no need to activate an environment, run the following command:

pixi global install beav

You can then follow the database installation step without running conda activate.

Alternative: From source

Clone the beav GitHub repository.

git clone

If installing from source, DBSCAN-SWA, TIGER2, and GapMind (PaperBLAST) must be installed in the software folder within the beav folder, and the BEAV_DIR environment variable must be set to point to the beav directory.


| Program | Install location |
| --- | --- |
| Bakta | PATH |
| IntegronFinder | PATH |
| MacSyFinder | PATH |
| DefenseFinder | PATH |
| TIGER2 | $BEAV_DIR/software |
| GapMind (PaperBLAST) | $BEAV_DIR/software |
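
As a rough sketch of the source-install setup, the snippet below exports BEAV_DIR and lists the contents of the software folder. The clone location `$HOME/beav` is an assumption used only for illustration; substitute the directory where the repository was actually cloned.

```shell
# Hypothetical clone location; keep an existing BEAV_DIR if one is already set.
BEAV_DIR="${BEAV_DIR:-$HOME/beav}"
export BEAV_DIR

# Show what is installed under $BEAV_DIR/software, or warn if the folder is missing.
if [ -d "$BEAV_DIR/software" ]; then
    ls "$BEAV_DIR/software"
else
    echo "Missing $BEAV_DIR/software; install DBSCAN-SWA, TIGER2, and GapMind there." >&2
fi
```

Adding the two BEAV_DIR lines to a shell startup file such as ~/.bashrc keeps the variable set in new sessions.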

Databases for each of these programs can then be installed manually. Alternatively, the following can be used to install them automatically.

Install all databases

conda activate beav
beav_db

Database script optional parameters

usage: beav_db [--skip_bakta_db] [--light] [--bakta_db_path DIRECTORY] [--update]
    --skip_bakta_db
        Skip downloading the Bakta databases
    --light
        Install the light version of Bakta databases
    --bakta_db_path DIRECTORY
        Install Bakta databases in nondefault location
    --update
        Update Bakta databases


NOTE: If you get an error stating "ModuleNotFoundError: No module named 'nrpys'", then you can run the following command (with the beav conda environment activated) to force reinstall it:

python -m pip install --upgrade --force-reinstall nrpys

NOTE: There is currently a bug in the latest DefenseFinder models that causes an error in MacSyFinder at run time. We recommend running beav with --skip_defensefinder until the MacSyFinder bug fix is released on bioconda. Alternatively, copying the patched file into the MacSyFinder Python library folder of your conda environment will fix the issue.

Patching instructions

To apply the patch, first find the Python version of your conda environment:

python --version

Then download the patched file:


Then copy it to the correct folder in your conda env, changing the python version as necessary:

cp $CONDA_PREFIX/lib/python3.9/site-packages/macsypy/
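
Rather than editing the Python version in the path by hand, the site-packages folder can be looked up from the interpreter itself. This is a minimal sketch, assuming the beav environment is active so that `python` resolves to the environment's interpreter; `PYTHON_BIN` and `MACSYPY_DIR` are variable names introduced here for illustration.

```shell
# Use "python" if available, otherwise fall back to "python3".
PYTHON_BIN=$(command -v python || command -v python3)

# Ask the interpreter where pure-Python packages are installed (its site-packages folder).
SITE_PACKAGES=$("$PYTHON_BIN" -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')

# The macsypy package folder is the copy destination for the patched file.
MACSYPY_DIR="$SITE_PACKAGES/macsypy"
echo "Copy the patched file into: $MACSYPY_DIR"
```

The printed folder can then be used as the destination of the cp command in place of the hard-coded python3.9 path.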

usage: beav [--input INPUT] [--output OUTPUT_DIRECTORY] [--strain STRAIN] [--bakta_arguments BAKTA_ARGUMENTS] [--tiger_arguments TIGER_ARGUMENTS] [--agrobacterium AGROBACTERIUM] [--skip_macsyfinder] [--skip_integronfinder] [--skip_defensefinder] [--skip_tiger] [--skip_gapmind] [--skip_dbscan-swa] [--skip_antismash] [--help] [--threads THREADS] [--genbank] [--continue]
    BEAV - Bacterial Element Annotation reVamped
        --input, -i STRAIN.fna
                Input file in fasta nucleotide format (Required)
        --output DIRECTORY
                Output directory (default: current working directory)
        --strain STRAIN
                Strain name (default: input file prefix)
        --bakta_arguments ARGUMENTS
                Additional arguments and database options specific to Bakta
        --antismash_arguments ARGUMENTS
                Additional arguments and database options specific to antiSMASH (Default: "$antismash_args")
        --tiger_blast_database DBPATH
                Path to a reference genome blast database for TIGER2 ICE analysis (Required unless --skip_tiger is used)
        --run_operon_email EMAIL
                Annotate predicted operons using the Operon-mapper webserver. Must input an email address for the job
        --agrobacterium
                Run Agrobacterium-specific tools that identify biovar/species group, Ti/Ri plasmid, T-DNA borders, virboxes and traboxes
        --skip_macsyfinder
                Skip detection and annotation of secretion systems
        --skip_integronfinder
                Skip detection and annotation of integrons
        --skip_defensefinder
                Skip detection and annotation of anti-phage defense systems
        --skip_tiger
                Skip detection and annotation of integrative conjugative elements (ICEs)
        --skip_gapmind
                Skip detection of amino acid biosynthesis and carbon metabolism pathways
        --skip_dbscan-swa
                Skip detection and annotation of prophage
        --skip_antismash
                Skip detection and annotation of biosynthetic gene clusters
        --continue
                Continue running BEAV from any point in the pipeline. Rerun programs that gave an error or were skipped.
        --genbank
                Use a GenBank file as input
        --help, -h
                Show BEAV help message
        --threads, -t
                Number of CPU threads



Additional arguments can be passed to antiSMASH using the --antismash_arguments option. This gives access to the full range of antiSMASH features and additional databases.


A BLAST database of reference genomes is required when running TIGER2. Provide its path using the --tiger_blast_database option, or skip TIGER2 with --skip_tiger.


Additional arguments can be passed to bakta using the --bakta_arguments option.


The --agrobacterium option activates an additional pipeline that provides Agrobacterium-specific annotation.


The skip options allow for specified programs to be skipped if the annotation is not needed or required programs are not installed.


The continue option will check the output of existing Beav runs and rerun programs that errored or were skipped. This option allows for the pipeline to be used with existing Bakta runs.


A GenBank file can be used as the input file when the genbank option is used.


Minimal run

beav --input /path/to/file/test.fna --threads 8 --skip_tiger

Standard run

beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna

Standard run with operon annotation (remote)

beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna --run_operon_email <your_email_address>

Standard run with GenBank input

beav --input /path/to/file/test.gbk --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna --gbk

Complex run

beav --input /path/to/file/test.fna --threads 8 --bakta_arguments '--db /path/to/alternative-data-bases/bakta-1.7/' --tiger_blast_database /path/to/databases/blast/allagro.fna --agrobacterium --skip_integronfinder
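
The run examples above each annotate a single genome. A minimal sketch for annotating every FASTA file in a folder is shown below, using only the --input, --output, --strain, --threads, and --skip_tiger options documented in the usage text. The function name `run_beav_batch`, the `BEAV_BIN` override, and the folder layout are hypothetical and introduced here for illustration only.

```shell
# Hypothetical helper: run beav once per .fna file in a folder.
# Setting BEAV_BIN=echo prints the commands instead of running them.
run_beav_batch() {
    genome_dir=$1        # folder containing *.fna files
    out_root=$2          # one sub-directory per genome is created here
    threads=${3:-8}      # CPU threads per run (default 8)
    for fasta in "$genome_dir"/*.fna; do
        [ -e "$fasta" ] || continue            # no matches: skip the literal pattern
        strain=$(basename "$fasta" .fna)       # strain name taken from the file prefix
        mkdir -p "$out_root/$strain"
        "${BEAV_BIN:-beav}" --input "$fasta" --output "$out_root/$strain" \
            --strain "$strain" --threads "$threads" --skip_tiger
    done
}
```

For example, `run_beav_batch genomes results 8` would process each file in a `genomes` folder in turn. Replacing --skip_tiger with a --tiger_blast_database path would enable the TIGER2 step as in the standard run.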

Standalone Circos plot generation

To generate Circos plots from your GenBank file independently of the beav pipeline, first make sure the beav conda environment is activated:

conda activate beav 


beav_circos -i <GenBank_file> [-c <Contig_for_subset_visualization>] [--pTi <Contig_for_oncogenic_visualization>]


# Generate a general Circos plot for all contigs
beav_circos -i test.gbk

# Generate a general Circos plot for all contigs and an oncogenic Circos plot for a single contig
beav_circos -i test.gbk --pTi contig_1

# Generate a general Circos plot for all contigs and an oncogenic Circos plot for a set of contigs
beav_circos -i test.gbk --pTi "contig_1 contig_2"

# Generate a general Circos plot for a single contig
beav_circos -i test.gbk -c contig_1

# Generate a general Circos plot for a set of contigs
beav_circos -i test.gbk -c "contig_1 contig_2"
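
The -c and --pTi options take contig names, which may not be obvious from the file name alone. As a small sketch, the helper below prints the record name from each LOCUS line of a GenBank file. It assumes that beav_circos identifies contigs by their LOCUS names, which this document does not confirm; `list_contigs` is a hypothetical function name.

```shell
# Print the second whitespace-separated field of every LOCUS line,
# which is the record (contig) name in the GenBank flat-file format.
list_contigs() {
    awk '$1 == "LOCUS" { print $2 }' "$1"
}
```

Running `list_contigs test.gbk` prints one contig name per line, which can then be passed to -c or --pTi.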


Beav can be cited as:

Jung, J. M., Rahman, A., Schiffer, A. M., & Weisberg, A. J. (2024). Beav: a bacterial genome and mobile element annotation pipeline. mSphere, 9(8), e00209-24.