Peak Quantifying
BAMscale can quantify peaks in BED format from one or multiple BAM files. The output consists of raw read counts plus FPKM-, library-size-, and TPM-normalized peak scores (one file for each). A tutorial can be found here.
Basic Command (example)
./BAMscale cov --bed <peaks.bed> --bam <SAMPLE1.bam> --bam <SAMPLE2.bam> ... --bam <SAMPLEn.bam>
There are two required parameters: 1) BED file, and 2) one (or multiple) BAM files.
Another parameter of interest is the output location. It can be controlled by specifying an output folder:
--outdir|-o <str> Output directory name (default: '.')
and/or by specifying an output prefix (to shorten the command, you can include the full path in the prefix instead of passing it to "-o"):
--prefix|-n <str> Prefix for output.
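As a minimal sketch of how these options fit together, the Python helper below assembles a `BAMscale cov` command line from a BED file, a list of BAM files, and the two output options. The helper name `build_cov_command` and the file names are hypothetical and serve only as illustration; the flags themselves are the ones listed on this page.

```python
def build_cov_command(bed, bams, outdir=".", prefix=None, executable="./BAMscale"):
    """Assemble a 'BAMscale cov' argument list (hypothetical helper, not part of BAMscale)."""
    cmd = [executable, "cov", "--bed", bed]
    # --bam is repeated once per input BAM file
    for bam in bams:
        cmd += ["--bam", bam]
    # Output directory (BAMscale defaults to '.')
    cmd += ["--outdir", outdir]
    # The prefix is optional
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


# Example file names are placeholders
command = build_cov_command(
    "peaks.bed", ["SAMPLE1.bam", "SAMPLE2.bam"], outdir="results", prefix="chip"
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a system where BAMscale is installed.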
Full Man-Page for BAMscale
Usage: BAMscale cov [OPTIONS] --bed <BEDFILE> --bam <BAM_1> (--bam <BAM_2> ... --bam <BAM_N>)
Output: Coverage tables (un-normalized, library-size normalized, FPKM and TPM)
Required options:
--bed|-b <file> Input BED file
--bam|-i <file> Input BAM file. This can be specified multiple times in case of multiple BAM files
Library options:
--libtype|-l <str> Sequencing type to be used. Can be: single, paired, and auto (default: autodetect)
--frag|-f <flag> Compute coverage using fragments instead of reads (default: no)
--strand|-s <flag> Reads need to have same orientation of peaks (default: unstranded)
--rstrand|-r <flag> Reads need to have reverse orientation of peaks (default: unstranded)
Sequencing coverage computation options:
--seqcov|-e <int> Compute sequencing coverage from BAM file quickly using the index (option '0'),
or count number of reads by parsing entire BAM file (much slower, IO intensive; set to '1')
--blacklist|-c <file> Input file with list of chromosomes to blacklist when computing coverage for normalization
--bedsubtract|-u <int> BED file with regions to subtract when computing coverage for normalization
These coordinates should not overlap so reads are not counted multiple times
Mapping options:
--mapq|-q <int> Minimum (at least) mapping quality (default: 0)
--keepdup|-d <flag> Keep duplicated reads (default: no)
--noproper|-p <flag> Do not filter un-proper alignments (default: filter)
--unmappair|-m <flag> Do not remove reads with unmapped pairs
--minfrag|-g <int> Minimum fragment size for read pairs (default: 0)
--maxfrag|-x <int> Maximum fragment size for read pairs (default: 2000)
--fragfilt|-w <int> Filter reads based on fragment size (default: no)
Output options:
--outdir|-o <str> Output directory name (default: '.')
--prefix|-n <str> Output prefix for file names (default: none)
Performance options:
--threads|-t <int> No. of threads to use (default: 1)
Description of Important Parameters
If some regions (listed in a BED file) have unusually high coverage that might skew your results, exclude them from the normalization with this option:
--bedsubtract <BEDfile>
If you absolutely want to include duplicate alignments in the calculation, use this flag:
--keepdup
If read pairs need to be counted as a single fragment, use this flag:
--frag
To filter on mapping quality (e.g. a minimum of 10):
--mapq 10
An example of a raw read count file (table)
After peak quantification, three additional matrices are written, one for each of the three normalization methods.
FPKM: A standard normalization method. Read counts are divided by the peak length (in kilobases) and by the library size (in millions of reads).
This is the most widely used method, as it scales for both library size and peak length.
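The FPKM calculation described above can be sketched in a few lines of Python. This is an illustration of the formula only, not BAMscale's own code; the function name and the example numbers are made up.

```python
def fpkm(read_count, peak_length_bp, library_size):
    """Reads divided by peak length in kilobases and library size in millions."""
    peak_kb = peak_length_bp / 1_000
    library_millions = library_size / 1_000_000
    return read_count / (peak_kb * library_millions)


# 200 reads in a 500 bp peak, from a library of 20 million reads:
# 200 / (0.5 kb * 20 million) = 20
print(fpkm(200, 500, 20_000_000))
```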
Library size: A simpler method that normalizes to the library size of the smallest sample, without normalizing for peak length.
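One plausible reading of this library-size scaling is sketched below: each sample's counts are multiplied by the ratio of the smallest library size to that sample's library size. The exact formula used by BAMscale is not stated on this page, so treat the scaling factor, the function name, and the example values as assumptions.

```python
def library_size_normalize(raw_counts, library_sizes):
    """Scale every sample down to the smallest library (assumed formula)."""
    smallest = min(library_sizes.values())
    normalized = {}
    for sample, counts in raw_counts.items():
        # Assumption: factor = smallest library size / this sample's library size
        factor = smallest / library_sizes[sample]
        normalized[sample] = [count * factor for count in counts]
    return normalized


# Two hypothetical samples with identical raw counts but different depths
raw = {"A": [100, 50], "B": [100, 50]}
sizes = {"A": 10_000_000, "B": 20_000_000}
print(library_size_normalize(raw, sizes))
```

Sample A is the smallest library, so its counts are unchanged, while the counts of sample B are halved. Peak length plays no role in this normalization.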
TPM: A method very similar to FPKM. First, the reads per kilobase (RPK) are computed for every peak and summed within each sample. The total RPK of each sample is divided by 1 million, giving the "per million" scaling factor for that sample. The RPK of every peak is then divided by its sample's scaling factor.
This normalization (in our opinion) generates more reliable results with DESeq2. A detailed explanation can be found here and a helpful video
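The TPM steps above can be sketched for a single sample as follows. This is a minimal illustration of the arithmetic, assuming at least one peak has a non-zero count; the function name and the numbers are hypothetical.

```python
def tpm(read_counts, peak_lengths_bp):
    """TPM for one sample: RPK per peak divided by (sum of RPK / 1 million)."""
    # Step 1: reads per kilobase (RPK) for every peak
    rpk = [count / (length / 1_000) for count, length in zip(read_counts, peak_lengths_bp)]
    # Step 2: per-million scaling factor of the sample
    scaling_factor = sum(rpk) / 1_000_000
    # Step 3: divide every RPK by the scaling factor
    return [value / scaling_factor for value in rpk]


# Two peaks: 100 reads over 1000 bp and 300 reads over 2000 bp
# RPK = [100, 150], so the values come out at roughly 400000 and 600000
values = tpm([100, 300], [1000, 2000])
print(values, sum(values))
```

Unlike FPKM, the TPM values of a sample always sum to one million, which makes the scores of different samples directly comparable.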