Home
lsaBGC consists of several individual programs that together provide a broad suite of functions for comparative analysis of biosynthetic gene clusters (BGCs) across a single focal lineage or taxon (recommended and tested at the species or genus level). The suite helps users understand the allelic variability observed in BGC genes and mine for novel SNVs within those genes that represent previously unidentified allelic variants.
To learn more about the installation of lsaBGC and its dependencies, please take a look at the Installation wiki page.
What functionalities does lsaBGC offer to users? Learn more about the suite's intended uses and where it should not be used, along with recommendations for other great software for exploring comparative analysis of secondary metabolite genetic architectures, on the Background wiki page!
A detailed walkthrough for using the lsaBGC suite as intended can be found on the third Wiki page.
We found that Corynebacterium kefirresidentii is a common species complex of the skin microbiome whose members harbor several BGCs across their compact genomes. We use the publicly available genomes from the complex as a small and simple test set to demonstrate the exploratory power of lsaBGC. Please have a look at the lsaBGC_Ckefir_Testing_Cases GitHub repo for further details.
Many of the main programs use an object-oriented infrastructure for processing and analysis. More information on this infrastructure can be found on the OOP Framework wiki page.
lsaBGC comprises 8 primary programs:
Program | Description | Input | Output |
---|---|---|---|
lsaBGC-Ready.py | Takes existing antiSMASH results (and optionally BiG-SCAPE results) and creates the inputs necessary to run downstream lsaBGC analyses (reformats BGC GenBank files, groups orthologs, finds genome-wide paralogs, etc.). | | |
lsaBGC-Cluster.py | Takes the comprehensive list of BGCs and clusters them into GCFs using MCL. | | |
lsaBGC-Refiner.py | Refines boundaries of BGCs belonging to a single GCF according to user specifications. | | |
lsaBGC-Expansion.py | Uses an HMM-based approach to quickly find homologous instances of a GCF in draft-quality genomes. | | |
lsaBGC-See.py | Visualizes BGC instances of a GCF across a phylogeny. | | |
lsaBGC-Divergence.py | Determines the 𝜷-RT statistic for assessing BGC divergence relative to genome-wide divergence between isolate pairs. | | |
lsaBGC-PopGene.py | Looks at sequence conservation and performs population genetic analyses for each homolog group found in a GCF. | | |
lsaBGC-DiscoVary.py | Identifies GCF instances in metagenomes and looks for base-resolution novelty within genes from raw sequencing data that is not observed in genomic assemblies for the taxonomy. | | |
Also provided are three workflow/pipeline programs, lsaBGC-AutoProcess.py, lsaBGC-AutoExpansion.py, and lsaBGC-AutoAnalyze.py, which simplify the generation of inputs necessary for the lsaBGC framework and allow for the automatic processing of each GCF post-clustering through standard analysis:
Program | Description | Input | Output |
---|---|---|---|
lsaBGC-AutoProcess.py | Automatically runs Prokka, antiSMASH, and OrthoFinder. | | |
lsaBGC-AutoExpansion.py | Automatically runs lsaBGC-Expansion.py for all GCFs and resolves conflicts (e.g. overlapping BGCs for different GCFs). | | |
lsaBGC-AutoAnalyze.py | Automatically runs lsaBGC-See.py, lsaBGC-PopGene.py, lsaBGC-Divergence.py, and lsaBGC-DiscoVary.py for each GCF. | | |
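As a quick sanity check after installation, the following minimal sketch reports which of the programs listed in the two tables can be found on the current `PATH`. It assumes the programs are installed as executables under exactly the names shown above, which may not hold for every installation method; `find_programs` is a hypothetical helper written for this illustration, not part of lsaBGC.

```python
import shutil

# Program names copied from the two tables on this page.
PRIMARY_PROGRAMS = [
    "lsaBGC-Ready.py",
    "lsaBGC-Cluster.py",
    "lsaBGC-Refiner.py",
    "lsaBGC-Expansion.py",
    "lsaBGC-See.py",
    "lsaBGC-Divergence.py",
    "lsaBGC-PopGene.py",
    "lsaBGC-DiscoVary.py",
]
WORKFLOW_PROGRAMS = [
    "lsaBGC-AutoProcess.py",
    "lsaBGC-AutoExpansion.py",
    "lsaBGC-AutoAnalyze.py",
]


def find_programs(names, which=shutil.which):
    """Map each program name to its resolved path, or None if not on PATH.

    Hypothetical helper: assumes the programs are exposed as executables
    under the names used in the tables above.
    """
    return {name: which(name) for name in names}


if __name__ == "__main__":
    found = find_programs(PRIMARY_PROGRAMS + WORKFLOW_PROGRAMS)
    for name, path in found.items():
        print(f"{name:26s} {path or 'NOT FOUND'}")
```

Any program reported as `NOT FOUND` suggests revisiting the Installation wiki page before continuing.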
Future to-dos include rewriting these workflows in a workflow DSL framework such as Nextflow.
Several additional programs and scripts are included in the lsaBGC suite. Major scripts of potential interest are described here.