ARBitR: Assembly Refinement with Barcode-identity-tagged Reads
This page describes ARBitR parameters in detail.
usage: arbitr.py [-h] [-v] [-i INPUT_FASTA] [-s REGION_SIZE] [-m MOLECULE_SIZE]
                 [-F BARCODE_FRACTION] [-f BARCODE_FACTOR] [-q MAPQ]
                 [-Q SHORT_MAPQ] [-B SHORT_BC_QUANT] [-c COVERAGE]
                 [-b BC_QUANTITY] [-o OUTPUT]
                 input_bam

Reads a bam file, creates links between contigs based on linked read
information, and outputs a .gfa.

positional arguments:
  input_bam             Input bam file. Required.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Print version and exit.
  -i INPUT_FASTA, --input_fasta INPUT_FASTA
                        Input fasta file for contig merging. Optional. If not
                        specified, will only output linkage graph in .gfa and
                        .tsv format.
  -s REGION_SIZE, --region_size REGION_SIZE
                        Size of region of contig start and end to collect
                        barcodes from. [20000]
  -m MOLECULE_SIZE, --molecule_size MOLECULE_SIZE
                        Estimated mean molecule size that went into Chromium
                        sequencing. Linked reads spanning a distance larger
                        than this size should be rare. [45000]
  -F BARCODE_FRACTION, --barcode_fraction BARCODE_FRACTION
                        Minimum fraction of shared barcodes to create a link.
                        [0.01]
  -f BARCODE_FACTOR, --barcode_factor BARCODE_FACTOR
                        Factor to determine outliers. [39]
  -q MAPQ, --mapq MAPQ  Mapping quality cutoff value for linkgraph. [60]
  -Q SHORT_MAPQ, --short_mapq SHORT_MAPQ
                        Mapping quality cutoff value for pulling in short
                        contigs. [20]
  -B SHORT_BC_QUANT, --short_bc_quant SHORT_BC_QUANT
                        Minimum number of reads per barcode. [2]
  -c COVERAGE, --coverage COVERAGE
                        Coverage cutoff for trimming contig ends. [20]
  -b BC_QUANTITY, --bc_quantity BC_QUANTITY
                        Minimum number of reads per barcode. [3]
  -o OUTPUT, --output OUTPUT
                        Prefix for output files.
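The help text says ARBitR writes its linkage graph in .gfa format. A quick way to gauge how many links a run produced is to count the link records in that file. This is a minimal sketch that assumes the output follows the GFA 1 convention of one tab-separated record per line, with S lines for segments (contigs) and L lines for links. The function names are hypothetical helpers, not part of ARBitR.

```python
from collections import Counter
from typing import Iterable


def count_gfa_records(lines: Iterable[str]) -> Counter:
    """Tally GFA 1 record types by their first tab-separated field.

    In GFA 1, 'S' lines are segments (contigs), 'L' lines are links between
    segment ends, 'H' is the header, and lines starting with '#' are comments.
    Assumes ARBitR's .gfa output uses these standard record types.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        counts[line.split("\t", 1)[0]] += 1
    return counts


def count_gfa_file(path: str) -> Counter:
    """Read a .gfa file from disk and tally its record types."""
    with open(path) as handle:
        return count_gfa_records(handle)
```

For example, count_gfa_file("myrun.gfa")["L"] gives the number of link lines. Here myrun.gfa is a hypothetical name for the .gfa file written under your --output prefix. A very low count suggests the linking parameters may be too stringent.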
The primary parameters to consider when optimizing the scaffolding performed by ARBitR are the barcode collection parameters.

--region_size determines the size of the regions at contig starts and ends from which barcodes are collected. Draft assemblies created from long reads can allow a longer --region_size than short-read assemblies. The important thing is to use a --region_size that extends past the repetitive contig ends where no good read mappings are found. The length of these repetitive ends depends on the read length used to create the draft assembly.

--molecule_size determines the minimum length of long contigs. Contigs shorter than this value are sorted into the short contigs, which ARBitR attempts to place into contigs during the junction filling step. This parameter needs to be larger than --region_size.

--barcode_fraction is a hard limit on the minimum fraction of shared barcodes required to create a link. It usually does not need to be changed.

--barcode_factor is the factor for identifying pairs of contig ends that share a significantly large fraction of barcodes (outliers). Higher values are more stringent.

--mapq is the minimum mapping quality a read must have for its barcode to be collected. A lower --mapq is less stringent and can be used if ARBitR finds very few links (this can be checked in the .gfa output file).

--bc_quantity is the minimum number of reads in a region that must support a barcode; barcodes with lower support are not used. Higher values are more stringent. The appropriate value depends on the characteristics of the Chromium sequencing data, such as sequencing depth.
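To make the interplay of these parameters concrete, here is a simplified Python model of barcode collection and link calling. It is a sketch under stated assumptions, not ARBitR's implementation:

- Reads are represented by a hypothetical Read record (contig, position, MAPQ, barcode) instead of BAM records.
- A read counts toward a region if its leftmost mapping position falls inside it.
- Contigs of at least --molecule_size are treated as long.
- The shared-barcode fraction is computed as a Jaccard index.
- The --barcode_factor outlier test is not modeled at all.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal stand-in for a barcoded BAM record."""
    contig: str
    pos: int      # 0-based leftmost mapping position
    mapq: int
    barcode: str


def check_params(region_size: int, molecule_size: int) -> None:
    """Enforce the documented rule that --molecule_size exceeds --region_size."""
    if molecule_size <= region_size:
        raise ValueError("--molecule_size must be larger than --region_size")


def classify_contigs(lengths: dict[str, int], molecule_size: int) -> tuple[set[str], set[str]]:
    """Split contigs into long (>= molecule_size) and short (< molecule_size)."""
    long_contigs = {c for c, n in lengths.items() if n >= molecule_size}
    return long_contigs, set(lengths) - long_contigs


def end_barcodes(reads, lengths, region_size, mapq, bc_quantity):
    """Collect barcodes per contig end, keeping those with >= bc_quantity reads.

    Returns {(contig, "start" | "end"): set_of_barcodes}.
    """
    support: Counter = Counter()
    for r in reads:
        if r.mapq < mapq or r.contig not in lengths:
            continue  # low MAPQ, or read on a contig we are not scaffolding
        if r.pos < region_size:
            support[(r.contig, "start", r.barcode)] += 1
        if r.pos >= lengths[r.contig] - region_size:
            support[(r.contig, "end", r.barcode)] += 1
    ends: dict[tuple[str, str], set[str]] = {}
    for (contig, side, barcode), n_reads in support.items():
        if n_reads >= bc_quantity:
            ends.setdefault((contig, side), set()).add(barcode)
    return ends


def shared_fraction(a: set[str], b: set[str]) -> float:
    """Assumed metric: Jaccard index |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_links(ends, barcode_fraction):
    """Pair ends of different contigs whose shared fraction meets the hard limit."""
    links = []
    for (end1, bcs1), (end2, bcs2) in combinations(sorted(ends.items()), 2):
        if end1[0] == end2[0]:
            continue  # ignore the two ends of the same contig
        frac = shared_fraction(bcs1, bcs2)
        if frac >= barcode_fraction:
            links.append((end1, end2, frac))
    return links
```

Raising mapq or bc_quantity in this model shrinks the barcode sets at each end, and raising barcode_fraction drops weakly supported pairs. This mirrors the stringency directions described above. With real data, the Read records could be built from the input BAM, for example with pysam's AlignmentFile, reading barcodes from the BX tag that Long Ranger writes. Both are assumptions about your alignment pipeline.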