Home
lsaBGC consists of several individual programs that together provide a broad suite of functions for comparative analysis of biosynthetic gene clusters (BGCs) across a single focal lineage or taxon (recommended and tested at the species or genus level). The suite helps users understand the allelic variability observed for BGC genes and mine for novel SNVs within such genes that represent previously unidentified allelic variants.
To learn more about the installation of lsaBGC and its dependencies, please take a look at the Installation wiki page.
What functionalities does lsaBGC offer to users? Learn more about the suite's intended uses and where it should not be used, along with recommendations of other great software for exploring and wrangling comparative analyses of secondary metabolite genetic architectures, on the Background wiki page!
Micrococcus luteus is a common member of the skin microbiome and harbors several BGCs across its compact genome. We use the publicly available genomes of M. luteus as a small and simple test set to demonstrate the exploratory power of lsaBGC. Please have a look at the Tutorial wiki page for further details!
Many of the main programs use an object-oriented infrastructure for processing and analysis. More information on this infrastructure can be found on the OOP Framework wiki page.
lsaBGC comprises 7 primary programs:

Program | Description | Input | Output |
---|---|---|---|
lsaBGC-Cluster.py | Takes the comprehensive list of BGCs and clusters them into gene cluster families (GCFs) using MCL. | | |
lsaBGC-Refiner.py | Refines boundaries of BGCs belonging to a single GCF according to user specifications. | | |
lsaBGC-Expansion.py | Constructs HMMs for each homolog group observed in a GCF and finds additional instances in new genomes. | | |
lsaBGC-See.py | For a single GCF, visualizes each BGC across a phylogeny (also modifies the phylogeny if a sample has multiple BGCs in the GCF). | | |
lsaBGC-Divergence.py | Determines the 𝜷-RT statistic for assessing BGC divergence relative to genome-wide divergence between isolate pairs. | | |
lsaBGC-PopGene.py | Looks at sequence conservation and performs population genetic analyses for each homolog group found in a GCF. | | |
lsaBGC-DiscoVary.py | Looks for base-resolution novelty in genes found in a GCF directly from raw sequencing data, allowing for rapid detection without the need for culturing. | | |

Also provided are three important workflow programs, lsaBGC-AutoProcess.py, lsaBGC-AutoExpansion.py, and lsaBGC-AutoAnalyze.py, which simplify the generation of inputs necessary for the lsaBGC framework and allow for the automatic processing of each GCF post-clustering through standard analyses:

Program | Description | Input | Output |
---|---|---|---|
lsaBGC-AutoProcess.py | Automatically runs Prokka, AntiSMASH, and OrthoFinder. | | |
lsaBGC-AutoExpansion.py | Automatically runs Prokka, AntiSMASH, and OrthoFinder. | | |
lsaBGC-AutoAnalyze.py | Automatically runs lsaBGC-See.py, lsaBGC-PopGene.py, lsaBGC-Divergence.py, and lsaBGC-DiscoVary.py for each GCF. | | |

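
To make the per-GCF fan-out of lsaBGC-AutoAnalyze.py concrete, here is a minimal Python sketch that only enumerates which program is paired with which GCF. It assumes the four analyses are visited in the order they are listed in the table, which this page does not confirm. The function name `plan_auto_analyze` and the GCF identifiers are hypothetical, and no command-line flags are shown because this page does not document them.

```python
# Program names are copied from the table above. The ordering is an
# assumption: it simply follows the order in which the table lists them.
PER_GCF_ANALYSES = [
    "lsaBGC-See.py",
    "lsaBGC-PopGene.py",
    "lsaBGC-Divergence.py",
    "lsaBGC-DiscoVary.py",
]


def plan_auto_analyze(gcf_ids):
    """Return (GCF, program) pairs: every analysis is paired with every GCF.

    Hypothetical helper for illustration only; it is not part of lsaBGC and
    does not launch any program.
    """
    return [(gcf, program) for gcf in gcf_ids for program in PER_GCF_ANALYSES]


if __name__ == "__main__":
    # "GCF_1" and "GCF_2" are placeholder identifiers, not real lsaBGC output.
    for gcf, program in plan_auto_analyze(["GCF_1", "GCF_2"]):
        print(f"{gcf}: {program}")
```

The sketch only builds a plan; how lsaBGC-AutoAnalyze.py actually invokes each program is described in the lsaBGC documentation for that program.
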
Future to-dos include rewriting these workflows in a DSL framework such as Nextflow.