04. Generating Required Inputs for lsaBGC
lsaBGC-Process.py is the first program to run in the lsaBGC suite; it creates the required inputs for the rest of the suite. Its implementation also differs from the other programs in that it requires users to specify paths to separate conda environments for the three programs which generate these required inputs: (1) Prokka, (2) antiSMASH, and (3) OrthoFinder (v2). It is actually a workflow, similar to lsaBGC-Automate.py, and both programs can be found in the workflows/ subdirectory of the suite.
All three programs take a while to run, so it is recommended that users process only complete or high-quality genomic assemblies through lsaBGC-Process.py to lay out and identify the major BGCs found in two or more members of a lineage. Additional instances of BGCs belonging to a GCF of interest can later be identified in high throughput across a multitude of draft genomes using lsaBGC-Expansion.py, if desired. To run lsaBGC-Expansion.py, however, you will first need to run the additional (low/medium quality) draft genomes through lsaBGC-Process.py in a special mode, specified by setting the flags -p (run only Prokka) and -q (avoid deep annotation with Prokka), which avoids running antiSMASH and OrthoFinder for each genomic assembly.
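The two modes described above differ only in which flags are passed. The following is a minimal sketch of how the two invocations might be assembled from Python, using only the flags listed in the usage message further down. The helper name `build_process_command` and every path shown are hypothetical placeholders, not part of lsaBGC.

```python
from typing import List, Optional


def build_process_command(
    assembly_listing: str,
    output_directory: str,
    conda_path: str,
    prokka_env: str,
    orthofinder_env: Optional[str] = None,
    antismash_env: Optional[str] = None,
    cores: Optional[int] = None,
    prokka_only: bool = False,
    fast_annotation: bool = False,
) -> List[str]:
    """Assemble an lsaBGC-Process.py argument list (hypothetical helper)."""
    cmd = [
        "lsaBGC-Process.py",
        "-a", assembly_listing,
        "-o", output_directory,
        "-cp", conda_path,
        "-pe", prokka_env,
    ]
    if orthofinder_env is not None:
        cmd += ["-oe", orthofinder_env]
    if antismash_env is not None:
        cmd += ["-ae", antismash_env]
    if cores is not None:
        cmd += ["-c", str(cores)]
    if fast_annotation:
        cmd.append("-q")  # avoid deep annotation with Prokka
    if prokka_only:
        cmd.append("-p")  # run only Prokka, skip antiSMASH and OrthoFinder
    return cmd


# All paths below are placeholders; substitute your own.
full_run = build_process_command(
    "complete_genomes.txt", "process_complete/",
    "/path/to/conda", "/path/to/conda/envs/prokka",
    orthofinder_env="/path/to/conda/envs/orthofinder",
    antismash_env="/path/to/conda/envs/antismash",
    cores=8,
)
draft_run = build_process_command(
    "draft_genomes.txt", "process_drafts/",
    "/path/to/conda", "/path/to/conda/envs/prokka",
    cores=8, prokka_only=True, fast_annotation=True,
)
print(" ".join(full_run))
print(" ".join(draft_run))
```

Either list can be handed to `subprocess.run(cmd, check=True)` once the placeholder paths point at real files and environments.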
A hopefully convenient option for users with access to high-performance computing resources is the dry-run option (-d), which simply creates task files with commands for each of the three major programs and leaves it to the user to parallelize or launch them on the server.
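As an illustration of what "leaving it to the user to parallelize" could look like on a single multi-core node, here is a rough sketch that runs the commands from one task file concurrently. It assumes each task file is plain text with one shell command per line; that format is an assumption and should be checked against the files the dry run actually produces. The function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def read_commands(task_file: str) -> List[str]:
    """Return the non-empty lines of a task file (assumed: one command per line)."""
    with open(task_file) as handle:
        return [line.strip() for line in handle if line.strip()]


def run_command(command: str) -> Tuple[str, int]:
    """Run one shell command and return it with its exit code."""
    completed = subprocess.run(command, shell=True)
    return command, completed.returncode


def run_task_file(task_file: str, max_workers: int = 4) -> List[Tuple[str, int]]:
    """Run all commands in a task file, at most max_workers at a time."""
    commands = read_commands(task_file)
    # Threads are sufficient here because each worker only waits on a subprocess.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


def failed_commands(results: List[Tuple[str, int]]) -> List[str]:
    """List the commands that exited with a non-zero status."""
    return [command for command, code in results if code != 0]
```

Because the three programs depend on one another's outputs, each task file would need to finish before the next one starts. On a cluster, the same commands would more typically be submitted through the scheduler's job-array mechanism than run this way.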
usage: lsaBGC-Process.py [-h] -a ASSEMBLY_LISTING -o OUTPUT_DIRECTORY -cp CONDA_PATH -pe PROKKA_ENV_PATH [-oe ORTHOFINDER_ENV_PATH] [-ae ANTISMASH_ENV_PATH] [-g GENUS] [-c CORES] [-d] [-q] [-p]
Program: lsaBGC-Process.py
Author: Rauf Salamzade
Affiliation: Kalan Lab, UW Madison, Department of Microbiology and Immunology
This program will automatically run or create task files for running Prokka (gene calling and annotation),
antiSMASH (biosynthetic gene cluster annotation), and OrthoFinder (de novo ortholog group construction).
optional arguments:
-h, --help show this help message and exit
-a ASSEMBLY_LISTING, --assembly_listing ASSEMBLY_LISTING
Tab delimited text file. First column is the sample name and the second is the path to its assembly in FASTA format. Please remove troublesome characters in the sample name.
-o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
Prefix for output files.
-cp CONDA_PATH, --conda_path CONDA_PATH
Path to anaconda/miniconda installation directory itself.
-pe PROKKA_ENV_PATH, --prokka_env_path PROKKA_ENV_PATH
Path to conda environment for Prokka.
-oe ORTHOFINDER_ENV_PATH, --orthofinder_env_path ORTHOFINDER_ENV_PATH
Path to conda environment for OrthoFinder. Optional, if not used, locus tags will be 3 characters instead of just 2.
-ae ANTISMASH_ENV_PATH, --antiSMASH_env_path ANTISMASH_ENV_PATH
Path to conda environment for antiSMASH. Database should be automatically configured for antiSMASH loaded by the environment.
-g GENUS, --genus GENUS
The genus under investigation. The lineage of interest could be species, but for this, just use the genus.
-c CORES, --cores CORES
The number of cores to use.
-d, --dry_run Just create task files with commands for running prodigal, antiSMASH, and OrthoFinder. Useful for parallelizing across an HPC.
-q, --fast_annotation
Skip basic/standard annotation in Prokka.
-p, --only_run_prokka
Only run Prokka for gene annotation and Genbank creation. Skip the rest.
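The assembly listing expected by -a is a two-column, tab-delimited file (sample name, then path to the FASTA assembly). A minimal sketch for generating one from a directory of assemblies follows. The accepted file extensions and the rule for what counts as a "troublesome" character (anything other than letters, digits, and underscores) are assumptions made for illustration, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed set of FASTA extensions; adjust to match your files.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def sanitize_sample_name(name: str) -> str:
    """Replace runs of characters other than letters, digits, and underscores."""
    return re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")


def write_assembly_listing(fasta_dir: str, listing_path: str) -> List[Tuple[str, str]]:
    """Write 'sample<TAB>path' lines for every FASTA file found in fasta_dir."""
    rows = []
    for fasta in sorted(Path(fasta_dir).iterdir()):
        if fasta.is_file() and fasta.suffix.lower() in FASTA_SUFFIXES:
            rows.append((sanitize_sample_name(fasta.stem), str(fasta.resolve())))

    # Sanitizing can make two different file names collide; fail early if so.
    names = [name for name, _ in rows]
    if len(set(names)) != len(names):
        raise ValueError("sample names are not unique after sanitizing")

    with open(listing_path, "w") as handle:
        for name, path in rows:
            handle.write(f"{name}\t{path}\n")
    return rows
```

For example, a file named `Strain A (v2).fasta` would be listed under the sample name `Strain_A_v2`.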