
Command line help

Jorge edited this page Jan 8, 2022 · 1 revision

Command-line help

These are the options currently shown when invoking the help command on the command line (python BGCtoolkit.py -h):

usage: BGCtoolkit.py [-h] [-i INPUTFOLDERS [INPUTFOLDERS ...]] [-f FILES [FILES ...]] [--include [INCLUDE [INCLUDE ...]]] [--exclude [EXCLUDE [EXCLUDE ...]]] [-l BGCLIST] [--hmms [HMMS [HMMS ...]]] [--update-domains]
                     [--clear_domains] [-c CPUS] [--merge] [-o OUTPUTFOLDER] [--metadata METADATA] [--svg] [--svgcfg SVGCFG] [-m] [-s STACKED] [--comparative] [--gaps] [--cbt-file [CBT_FILE]]
                     [--cbt-include [CBT_INCLUDE [CBT_INCLUDE ...]]] [--cbt-exclude CBT_EXCLUDE [CBT_EXCLUDE ...]] [--bgc] [--bgccase BGCCASE] [-p PROTEINCASE] [--genbank] [--cbt-fasta]

BGC toolkit v0.1. Tools to facilitate biosynthetic gene cluster handling and visualization.

optional arguments:
  -h, --help            show this help message and exit

Input:
  -i INPUTFOLDERS [INPUTFOLDERS ...], --inputfolders INPUTFOLDERS [INPUTFOLDERS ...]
                        Folder(s) to search (recursively) for .gb and .gbk files. If the file starts with 'scaffold' or 'contig', the parent folder's name will be used in the internal BGC name for that file.
  -f FILES [FILES ...], --files FILES [FILES ...]
                        Input individual files (accepted: .gb, .gbk, .bgc, .bgccase, .fasta, .proteincase).

Filtering:
  Include or exclude BGCs based on their names. Note: for .bgc and .bgccase files, inclusion rules by --include, --exclude and --bgclist will be applied to the internal BGC identifier, not to the name of the file.

  --include [INCLUDE [INCLUDE ...]]
                        Specify string(s) to filter which BGCs will be included. In the case of .gb or .gbk files, the filter is applied to the filename. For data stored as .bgc or .bgccase files, the filter is applied to the
                        BGC(s) identifier. If the option is present but no arguments are given, the filter will be ignored (all BGCs will be included). If the option is not present, the default is to use the strings 'region'
                        and 'cluster'.
  --exclude [EXCLUDE [EXCLUDE ...]]
                        Specify string(s) to filter which BGCs will be rejected. Similar rules are applied as with --include. If the option is not present, the default is to use 'final'.
  -l BGCLIST, --bgclist BGCLIST
                        A tab-separated file containing a list of BGC names (first column) and protein names (second column). The BGC names filter input from --files and --inputfolders. If only the protein column is present, it
                        will be used to filter input from .proteincase or .fasta files. If used with --svg and --stacked, the contents of this file also define the order (and protein to align) in the final SVG figure. Any extra
                        columns or rows starting with the '#' character will be ignored.
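
The three states of --include and --exclude described above (option absent, option present without arguments, option present with strings) can be illustrated with a small sketch. It is not the toolkit's code: the flag names, the zero-or-more argument form and the default strings come from the help text, while the substring matching in the hypothetical `keep_name` helper and the example file names are assumptions.

```python
import argparse

# Stand-in parser for the two filtering options only. nargs="*" matches the
# "[--include [INCLUDE [INCLUDE ...]]]" form shown in the usage line.
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--include", nargs="*", default=["region", "cluster"])
parser.add_argument("--exclude", nargs="*", default=["final"])


def keep_name(name, include, exclude):
    """Return True if `name` passes both filters (assumed: substring matching)."""
    # An empty include list means the include filter is ignored.
    included = not include or any(s in name for s in include)
    # An empty exclude list rejects nothing.
    excluded = any(s in name for s in exclude)
    return included and not excluded


# Made-up file names for illustration.
names = ["NC_003888.3.region001.gbk", "NC_003888.3.final.gbk", "scaffold_12.gbk"]

# Options absent: the defaults ('region'/'cluster' and 'final') apply.
args = parser.parse_args([])
default_kept = [n for n in names if keep_name(n, args.include, args.exclude)]

# --include given without arguments: args.include is an empty list, so only
# the default --exclude filter still applies.
args = parser.parse_args(["--include"])
all_kept = [n for n in names if keep_name(n, args.include, args.exclude)]
```

With the defaults only the 'region' file is kept. With a bare --include the scaffold file is kept as well, and the 'final' file is still rejected by the default --exclude.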

Domain options:
  --hmms [HMMS [HMMS ...]]
                        List of paths to .hmm file(s). This will also enable internal hmm models (use without arguments to only use internal models).
  --update-domains      Use domain prediction on input, even if they were marked as already having domain prediction. If new and old hits overlap, only the best-scoring hit will be kept.
  --clear_domains       Clean all domain data from input before domain prediction.
  -c CPUS, --cpus CPUS  Number of CPUs used for domain prediction. Default: all available.
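
The overlap rule stated for --update-domains (when new and old hits overlap, only the best-scoring hit is kept) could be sketched as below. This is one possible reading, not the toolkit's implementation: the `Hit` tuple, the half-open coordinates, the greedy strategy and all example values are assumptions, and this version also resolves overlaps among hits from the same source.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical domain hit with half-open coordinates [start, end)."""
    name: str
    start: int
    end: int
    score: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two intervals share at least one position."""
    return a.start < b.end and b.start < a.end


def resolve(old: List[Hit], new: List[Hit]) -> List[Hit]:
    """Greedy pass: visit hits from best to worst score and keep a hit only
    if it does not overlap one that was already kept."""
    kept: List[Hit] = []
    for hit in sorted(old + new, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Return the surviving hits in positional order.
    return sorted(kept, key=lambda h: h.start)


# Made-up hits: the old and new KS predictions overlap, the others do not.
old_hits = [Hit("KS_old", 10, 400, 250.0), Hit("AT", 500, 800, 120.0)]
new_hits = [Hit("KS_new", 20, 390, 300.0), Hit("ACP", 900, 970, 40.0)]
merged = resolve(old_hits, new_hits)
```

Here `KS_old` is dropped in favour of the higher-scoring `KS_new`, and the two non-overlapping hits are retained.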

Post-processing options:
  --merge               Try to fix successive domains of the same type that have been split. This could be useful for phylogenetic analysis but will not fix incorrect gene predictions. Use with care.

Output:
  Basic output options

  -o OUTPUTFOLDER, --outputfolder OUTPUTFOLDER
                        Base folder where results will be put (default='./output').
  --metadata METADATA   Writes information files at three levels: a summary of the whole collection, a summary per BGC (CBP content) and a summary per core protein (domain organization). Argument is the basename of these files
                        (no extension). Activated by default if --bgccase is used.

SVG options:
  --svg                 Toggle to enable SVG output for each BGC.
  --svgcfg SVGCFG       Configuration file with SVG style. Default: 'SVG_arrow_options.cfg'
  -m, --mirror          Toggle to mirror each BGC figure. Ignored with --stacked or --bgclist.
  -s STACKED, --stacked STACKED
                        If used with --svg, all BGC SVGs will be put in the same figure. The argument of this parameter is the filename (no extension).
  --comparative         If --stacked and --bgclist are used, calculate protein similarity between each pair of BGCs and represent it as connecting bands.
  --gaps                If --stacked is used, toggle this option to leave gaps when a particular BGC or protein is not found in the input data (--bgclist).
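
As a rough illustration of how a --bgclist file can drive the order of a stacked figure, and what --gaps changes, consider the sketch below. The file layout (tab-separated, BGC name first, protein name second, '#' rows and extra columns ignored) follows the help text. The function names, the use of `None` as a gap marker and the example data are assumptions, not the toolkit's internals.

```python
import io
from typing import Iterable, List, Optional, Set, Tuple


def read_bgclist(handle: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (bgc_name, protein_name or None) pairs in file order."""
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and rows starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        bgc = fields[0].strip()
        protein = fields[1].strip() if len(fields) > 1 else ""
        # Columns beyond the second are ignored.
        rows.append((bgc, protein or None))
    return rows


def figure_rows(bgclist, available: Set[str], gaps: bool = False):
    """Order BGC names as listed; with gaps=True a missing BGC leaves None."""
    ordered = []
    for bgc, _protein in bgclist:
        if bgc in available:
            ordered.append(bgc)
        elif gaps:
            ordered.append(None)
    return ordered


# Made-up list file and input collection.
text = "# order for the stacked figure\nBGC_A\tprotA\tsome note\nBGC_B\nBGC_C\tprotC\n"
entries = read_bgclist(io.StringIO(text))
without_gaps = figure_rows(entries, {"BGC_C", "BGC_A"})
with_gaps = figure_rows(entries, {"BGC_C", "BGC_A"}, gaps=True)
```

`BGC_B` is absent from the input, so it is skipped in `without_gaps` and leaves a placeholder in `with_gaps`.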

Organize biosynthetic output:
  (Optional) Use these parameters with the remaining output options to separate data into different sub-folders according to selected Core Biosynthetic Type(s) (CBTs). If any of these options are used, the final biosynthetic
  types used will be those common between a) the types in the dataset, b) the ones in the cbt file and c) the ones in the cbt-include list, minus those from the cbt-exclude option. Currently supported types from antiSMASH:
  'CDPS', 'LAP', 'NAGGN', 'PBDE', 'PKS-like', 'PUFA', 'PpyS-KS', 'RaS-RiPP', 'T1PKS', 'T2PKS', 'T3PKS', 'TfuA-related', 'acyl_amino_acids', 'amglyccycl', 'aminocoumarin', 'arylpolyene', 'bacteriocin', 'betalactone',
  'blactam', 'bottromycin', 'butyrolactone', 'cyanobactin', 'ectoine', 'fatty_acid', 'fungal-RiPP', 'furan', 'fused', 'glycocin', 'halogenated', 'head_to_tail', 'hglE-KS', 'hserlactone', 'indole', 'ladderane',
  'lanthipeptide', 'lassopeptide', 'linaridin', 'lipolanthine', 'melanin', 'microviridin', 'nrps', 'nrps-like', 'nucleoside', 'oligosaccharide', 'other', 'phenazine', 'phosphoglycolipid', 'phosphonate', 'proteusin',
  'resorcinol', 'saccharide', 'sactipeptide', 'siderophore', 'terpene', 'thioamide-NRP', 'thiopeptide', 'transAT-PKS', 'transAT-PKS-like', 'tropodithietic-acid'. Currently supported fungal types (these will need domain
  prediction): 'Carotenoid_synthase', 'DMATS', 'Diterpene_synthase', 'Meroterpenoid_synthase', 'NIS', 'NRPS', 'NRPS-PKS_hybrid', 'NRPS-like', 'PKS-NRPS_hybrid', 'PKS-mmNRPS_hybrid', 'Sesquiterpene_bifunctional_synthase',
  'Sesquiterpene_synthase', 'Squalene_synthase', 'Terpene_other', 'Triterpene_synthase', 'UbiA-type_terpene', 'nrPKS', 'other_PKS', 'rPKS', 't3PKS', 'unknown_PKS'.

  --cbt-file [CBT_FILE]
                        A file with valid core biosynthetic types. If no argument is given, the file 'CBP_output_types.cfg' will be read.
  --cbt-include [CBT_INCLUDE [CBT_INCLUDE ...]]
                        A space-separated list of core biosynthetic types to use when organizing output. If 'all' or no argument is given, all possible CBTs will be used. Additionally for this parameter, it is possible to
                        search for specific domains (e.g. 'all:C' or 'nrPKS:SAT').
  --cbt-exclude CBT_EXCLUDE [CBT_EXCLUDE ...]
                        A space-separated list of all the CBTs to exclude.
  --cbt-fasta           Toggle to output sequences of (specialized metabolite's) Core Biosynthetic Proteins for each CBT defined in the 'CBP_output_types.cfg' file. Only available when cbt* options are enabled.
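
The rule given at the top of this group (types common to the dataset, the cbt file and the cbt-include list, minus the cbt-exclude list) amounts to a few set operations. The sketch below assumes that 'all' or an empty include list places no restriction. `select_cbts` is a hypothetical helper, and it does not handle the domain-search form such as 'all:C'.

```python
def select_cbts(dataset_types, cbt_file_types, cbt_include, cbt_exclude=()):
    """Types shared by dataset, cbt file and cbt-include, minus cbt-exclude."""
    selected = set(dataset_types) & set(cbt_file_types)
    # 'all' or no include arguments: keep every type that got this far.
    if cbt_include and "all" not in cbt_include:
        selected &= set(cbt_include)
    return selected - set(cbt_exclude)


# Made-up dataset and cbt file contents, using type names from the list above.
dataset = {"T1PKS", "nrps", "terpene", "siderophore"}
cbt_file = {"T1PKS", "nrps", "terpene", "lanthipeptide"}

everything_but_terpene = select_cbts(dataset, cbt_file, ["all"], ["terpene"])
only_requested = select_cbts(dataset, cbt_file, ["nrps", "lanthipeptide"])
```

In the second call 'lanthipeptide' is requested but absent from the dataset, so only 'nrps' remains.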

File output:
  These options will trigger metadata output

  --bgc                 Toggle to save binary (.bgc) files of each BGC (from .bgc, .bgccase and GenBank files). Data from fasta input files will not be included.
  --bgccase BGCCASE     Output a single binary file with all BGCs (.bgccase). The argument of this parameter is the filename (no extension). Data from fasta input files will not be included.
  -p PROTEINCASE, --proteincase PROTEINCASE
                        Output .proteincase files for each type of Core Biosynthetic Protein type specified in the file 'CBP_output_types.cfg'. The argument of this parameter is the basename for all .proteincase files.
  --genbank             Toggle to create GenBank files. Currently, only works for gbk input files. Use with 'cbt' options to separate by biosynthetic type.