Peak Quantification

BAMscale can quantify peaks in BED format from one or multiple BAM files. The output consists of raw read counts plus FPKM-, library-size- and TPM-normalized peak scores, written to one file each. A tutorial can be found here.

Basic Command (example)

./BAMscale cov --bed <peaks.bed> --bam <SAMPLE1.bam> --bam <SAMPLE2.bam> ... --bam <SAMPLEn.bam>


Command Line Options

There are two required parameters: 1) a BED file, and 2) one or more BAM files.

Another useful set of options controls where the output is written. You can specify an output folder:

--outdir|-o <str>	Output directory name (default: '.')

and/or an output prefix for the file names (the full output path can be included here instead of in "-o", which keeps the command shorter):

--prefix|-n <str>   Prefix for output.

Full Man-Page for BAMscale

Usage: BAMscale cov [OPTIONS] --bed <BEDFILE> --bam <BAM_1> (--bam <BAM_2> ... --bam <BAM_N>)

Output: Coverage tables (un-normalized, library-size normalized, FPKM and TPM)

Required options:
	--bed|-b <file>		Input BED file
	--bam|-i <file>		Input BAM file. This can be specified multiple times in case of multiple BAM files

Library options:
	--libtype|-l <str>	Sequencing type to be used. Can be: single, paired, and auto (default: autodetect)
	--frag|-f <flag>	Compute coverage using fragments instead of reads (default: no)
	--strand|-s <flag>	Reads must have the same orientation as peaks (default: unstranded)
	--rstrand|-r <flag>	Reads must have the reverse orientation of peaks (default: unstranded)

Sequencing coverage computation options:
	--seqcov|-e <int>	Compute sequencing coverage from BAM file quickly using the index (option '0'),
				or count number of reads by parsing entire BAM file (much slower, IO intensive; set to '1')

	--blacklist|-c <file>	Input file with list of chromosomes to blacklist when computing coverage for normalization

	--bedsubtract|-u <file>	BED file with regions to subtract when computing coverage for normalization
				These coordinates should not overlap so reads are not counted multiple times

Mapping options:
	--mapq|-q <int>		Minimum (at least) mapping quality (default: 0)
	--keepdup|-d <flag>	Keep duplicated reads (default: no)
	--noproper|-p <flag>	Do not filter un-proper alignments (default: filter)
	--unmappair|-m <flag>	Do not remove reads with unmapped pairs
	--minfrag|-g <int>	Minimum fragment size for read pairs (default: 0)
	--maxfrag|-x <int>	Maximum fragment size for read pairs (default: 2000)
	--fragfilt|-w <int>	Filter reads based on fragment size (default: no)

Output options:
	--outdir|-o <str>	Output directory name (default: '.')
	--prefix|-n <str>	Output prefix for file names (default: none)

Performance options:
	--threads|-t <int>	No. of threads to use (default: 1)
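The mapping options above determine which alignments are counted. The Python sketch below shows one plausible reading of how those filters combine. It is not BAMscale's own code: the `Alignment` record and the `passes_filters` function are hypothetical, and treating `--fragfilt` as the switch that enables the `--minfrag`/`--maxfrag` check is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified view of one BAM record."""
    mapq: int
    is_duplicate: bool
    is_paired: bool
    is_proper_pair: bool
    mate_is_unmapped: bool
    template_length: int  # signed TLEN field of the BAM record


def passes_filters(aln: Alignment,
                   min_mapq: int = 0,                 # --mapq
                   keep_dup: bool = False,            # --keepdup
                   no_proper: bool = False,           # --noproper
                   keep_unmapped_pair: bool = False,  # --unmappair
                   frag_filter: bool = False,         # --fragfilt (assumed on/off switch)
                   min_frag: int = 0,                 # --minfrag
                   max_frag: int = 2000) -> bool:     # --maxfrag
    """Return True if the alignment would be counted under these settings."""
    if aln.mapq < min_mapq:  # "minimum (at least)" mapping quality
        return False
    if aln.is_duplicate and not keep_dup:
        return False
    if aln.is_paired:
        if not aln.is_proper_pair and not no_proper:
            return False
        if aln.mate_is_unmapped and not keep_unmapped_pair:
            return False
        if frag_filter:
            size = abs(aln.template_length)
            if size < min_frag or size > max_frag:
                return False
    return True


example = Alignment(mapq=5, is_duplicate=False, is_paired=True,
                    is_proper_pair=True, mate_is_unmapped=False,
                    template_length=-250)
print(passes_filters(example))               # True with default settings
print(passes_filters(example, min_mapq=10))  # False: MAPQ 5 is below 10
```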

Description of Important Parameters

If some regions have unusually high coverage that could skew your results, list them in a BED file and exclude them from the coverage used for normalization with:

--bedsubtract <BEDfile>
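The man page notes that the coordinates passed to `--bedsubtract` should not overlap, so that reads are not counted multiple times. If your region file may contain overlaps, you can merge them first. The following is a minimal Python sketch; `merge_bed_intervals` is a hypothetical helper, and it assumes a standard BED file (chromosome, 0-based start, end in the first three columns). `bedtools merge` on a sorted BED file is an established alternative.

```python
from collections import defaultdict


def merge_bed_intervals(lines):
    """Merge overlapping (or adjacent) intervals per chromosome.

    Only the first three BED columns are used; comment, track and
    browser lines are skipped.
    """
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps or touches the current interval
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


regions = ["chr1\t100\t200", "chr1\t150\t300", "chr1\t400\t500", "chr2\t0\t10"]
for chrom, start, end in merge_bed_intervals(regions):
    print(f"{chrom}\t{start}\t{end}")
```

Writing the merged rows to a new file gives a non-overlapping region list to pass to `--bedsubtract`.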

Duplicate alignments are removed by default. To keep them in the calculation, use:

--keepdup

To count each read pair once (as a fragment) instead of as two reads, use:

--frag

To keep only alignments with at least a given mapping quality (e.g. 10), use:

--mapq 10
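When running BAMscale from a pipeline, it can help to assemble the command programmatically from the options described above. The Python sketch below uses only flags listed in the man page; `build_cov_command` is a hypothetical helper, and the file names are placeholders.

```python
import shlex


def build_cov_command(bed, bams, outdir=".", prefix=None, mapq=0,
                      keepdup=False, frag=False, bedsubtract=None,
                      threads=1, executable="./BAMscale"):
    """Assemble a 'BAMscale cov' argument list (hypothetical helper)."""
    if not bams:
        raise ValueError("at least one BAM file is required")
    cmd = [executable, "cov", "--bed", bed]
    for bam in bams:
        cmd += ["--bam", bam]  # --bam is repeated once per input file
    cmd += ["--outdir", outdir]
    if prefix:
        cmd += ["--prefix", prefix]
    if mapq:
        cmd += ["--mapq", str(mapq)]
    if keepdup:
        cmd.append("--keepdup")  # flag, takes no value
    if frag:
        cmd.append("--frag")     # flag, takes no value
    if bedsubtract:
        cmd += ["--bedsubtract", bedsubtract]
    if threads > 1:
        cmd += ["--threads", str(threads)]
    return cmd


cmd = build_cov_command("peaks.bed", ["SAMPLE1.bam", "SAMPLE2.bam"],
                        mapq=10, frag=True)
print(shlex.join(cmd))
# ./BAMscale cov --bed peaks.bed --bam SAMPLE1.bam --bam SAMPLE2.bam --outdir . --mapq 10 --frag
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`; using a list rather than a single string avoids shell-quoting problems with unusual file names.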

Output Examples

An example of a raw read count file (table)

Normalization of Read Counts

After peak quantification, three normalized matrices are written, one for each normalization method.

1) FPKM Normalization

A standard normalization method: read counts are divided by the peak length (in kilobases) and by the library size (in millions of reads).

It is widely used because it corrects for both library size and peak length.
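The FPKM calculation described above can be sketched in Python for a single sample. This is an illustration, not BAMscale's own code; it assumes peak lengths are given in bases and that `library_size` is the read total used for normalization. The `fpkm` function name is hypothetical.

```python
def fpkm(counts, lengths, library_size):
    """FPKM for one sample.

    counts:       raw read counts per peak
    lengths:      peak lengths in bases (end - start)
    library_size: total reads used for normalization (assumed input)
    """
    per_million = library_size / 1e6  # library size in millions of reads
    # divide by peak length in kilobases, then by library size in millions
    return [c / (l / 1e3) / per_million for c, l in zip(counts, lengths)]


print(fpkm([100, 50], [1000, 500], 10_000_000))  # [10.0, 10.0]
```

The two peaks get the same score because the second one is half as long and has half as many reads.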

2) Library-Size Normalization

A simpler method that scales every sample to the library size of the smallest sample, without correcting for peak length.
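One reading of this description is that each sample's counts are multiplied by the ratio of the smallest library size to that sample's library size. A minimal Python sketch under that assumption follows; `library_size_normalize` is a hypothetical name, and the exact library-size definition BAMscale uses may differ.

```python
def library_size_normalize(counts_by_sample, library_sizes):
    """Scale each sample's peak counts to the smallest library.

    counts_by_sample: dict of sample name -> list of raw counts per peak
    library_sizes:    dict of sample name -> library size (same keys)
    """
    smallest = min(library_sizes.values())
    return {
        sample: [c * smallest / library_sizes[sample] for c in counts]
        for sample, counts in counts_by_sample.items()
    }


counts = {"A": [100, 40], "B": [30, 20]}
libs = {"A": 2_000_000, "B": 1_000_000}
print(library_size_normalize(counts, libs))
# {'A': [50.0, 20.0], 'B': [30.0, 20.0]}
```

Sample B has the smallest library, so its counts are unchanged, while sample A's counts are halved.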

3) TPM Normalization

A method very similar to FPKM. First, each peak's read count is divided by its length in kilobases, giving reads per kilobase (RPK). The RPK values of all peaks in a sample are summed and divided by 1 million, giving the "per million" scaling factor for that sample. Finally, the RPK of every peak is divided by the sample's scaling factor.

This normalization (in our opinion) generates more reliable results with DESeq2. A detailed explanation can be found here, along with a helpful video.
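The three TPM steps can be sketched in Python for a single sample. This illustrates the arithmetic only and is not BAMscale's implementation; `tpm` is a hypothetical name and peak lengths are assumed to be in bases.

```python
def tpm(counts, lengths):
    """TPM for one sample.

    counts:  raw read counts per peak
    lengths: peak lengths in bases
    """
    # step 1: reads per kilobase (RPK) for every peak
    rpk = [c / (l / 1e3) for c, l in zip(counts, lengths)]
    total = sum(rpk)
    if total == 0:
        raise ValueError("sample has no reads in any peak")
    # step 2: summed RPK divided by 1 million = per-million scaling factor
    scale = total / 1e6
    # step 3: divide each peak's RPK by the sample's scaling factor
    return [r / scale for r in rpk]


result = tpm([100, 50, 0], [1000, 500, 2000])
print([round(t, 1) for t in result])  # [500000.0, 500000.0, 0.0]
```

Unlike FPKM, the TPM values of every sample sum to the same total (1 million), which makes the proportions directly comparable across samples.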
