This software is designed to identify both simple and complex rearrangements from paired-end sequencing data. Specific information regarding the methodology can be found in the respective publication:
Zhao X, Emery SB, Myers B, Kidd JM, and Mills RE. Resolving complex strucural genomic rearrangements using a randomized approach. Genome Biology 2016 Jun 10;17:126
R: https://www.r-project.org/
python: https://www.python.org/
samtools: http://samtools.sourceforge.net/
Download and Install
git clone https://github.com/mills-lab/svelter.git
cd svelter
python setup.py install --user
export PATH=$PATH:$HOME/.local/bin
Setup working directory:
svelter.py Setup --reference reference.fa --workdir /working/directory/ --support ../Support/hg19/
Run svelter with its default setting:
svelter.py --sample /absolute/path/of/sample.bam --workdir /working/directory/
Exclude.ref.bed
, CN2.ref.bed
and Segdup.ref.bed
are available under Support for some versions of human reference genome.
'Exclude.ref.bed' specifies the genomic regions to be excluded from SV analysis;
CN2.ref.bed
specifies the copy neutral genomic regions where SVs are rarely reported;
Segdup.ref.bed
specifieds predefined segmental duplications in reference genome; will be excluded from analysis;
Customized versions could be used, as long as they are in bed format, collected in the same folder, and named in the same format.For more details, please see Support.
Pre-indexed files of certain reference genomes have been produced and kept under folder /Support/Index-ref. For specific reference, if not pre-indexed files provided, the optional parameter '--ref-index' could be omit and the indexed files would be produced through the setup step.
- reference file should have been indexed by calling samtools first:
samtools faidx ref.fasta
- in the Setup step, reference file should be specified by absolute path
- the pre-indexed files under ./Support/ref-index/ are restored through large file resources on github, which require manual download.
- working directory is required to be writable for temporary files
- with large sample size (eg. >50X whole genome sequencing), it is recommended that the parameter --null-copyneutral-perc 0.01 added to your command, with small ones (eg. < 10x), --null-copyneutral-perc 0.5 is recommended. This parameter decides the number of CN2 regions extracted for building null model.
svelter.py [options] [parameters]
Setup
Clean
NullModel
BPSearch
BPIntegrate
SVPredict
SVIntegrate
--workdir, writable working directory.
--reference, absolute path of reference genome. eg: .../svelter/reference/genome.fa
--support, folder containing all supportive file including: Exclude.bed,CN2.bed,Segdup.bed
--ref-index, folders containin pre-indexed files, if applicable. For certain versions of human genome, the indexed files are availabel from https://github.com/mills-lab/svelter.
--workdir, writable working directory.
--sample, input alignment file in bam format
--null-model, specify which stat model to be fitted on each parameter. if --null-model==C / Complex, negative bimodal distribution will be fitted to insertlenth; else, normal will be used
--null-copyneutral-length, minimum length requirement for --copyneutral regions used to build null model (default: 2000)
--null-copyneutral-perc, percentage of regions from --copyneutral to utilize (default: 0.1)
--null-random-length, specify the length of random regions if --copyneutral parameter not used (default: 5000)
--null-random-num, specify the number of random regions if --copyneutral parameter not used (default: 10000)
--num-iteration, maximum number of iterations per structure will run in SV predicting step
--qc-map-cutoff, the minimum mapping quality required for a breakpoint to be reported (default: 0.0)
--qc-align, minimum alignment quality required for mapped reads in bam file (default: 20)
--qc-split, minimum alighment of clipped parts of reads considered as a soft clip (default: 20)
--qc-structure, minimum quality score of a resolved structure to be considered as PASS and included in the output vcf file
--split-min-len, the minumum length of clip read considered as split; (default:10% of read length)
--prefix, output prefix for vcf and svelter files (default: input.vcf, input.svelter)
--ploidy, limit algorithm to specific zygosity (0:heterozygous only; 1:homozygous only; 2:both; default:2)
svelter.py NullModel --sample /absolute/path/of/sample.bam --workdir /working/directory
Optional Parameters:
--chromosome, name of chromosome to run. should match chromosome name in bam file
--null-model, specify which stat model to be fitted on each parameter. if --null-model==C / Complex, negative bimodal distribution will be fitted to insertlenth; else, normal will be used
--null-copyneutral-length, minimum length requirement for --copyneutral regions used to build null model (default: 2000)
--null-copyneutral-perc, percentage of regions from --copyneutral to utilize (default: 0.1)
--null-random-length, specify the length of random regions if --copyneutral parameter not used (default: 5000)
--null-random-num, specify the number of random regions if --copyneutral parameter not used (default: 10000)
--qc-align, minimum alignment quality required for mapped reads in bam file (default: 20)
--qc-split, minimum alighment of clipped parts of reads considered as a soft clip (default: 20)
--split-min-len, the minumum length of clip read considered as split (default:10% of read length)
svelter.py BPSearch --sample /absolute/path/of/sample.bam --workdir /working/directory
Optional Parameters:
--chromosome, name of chromosome to run. should match chromosome name in bam file
--null-model, specify which stat model to be fitted on each parameter. if --null-model==C / Complex, negative bimodal distribution will be fitted to insertlenth; else, normal will be used
--qc-align, minimum alignment quality required for mapped reads in bam file (default: 20)
--qc-split, minimum alighment of clipped parts of reads considered as a soft clip (default: 20)
--split-min-len, the minumum length of clip read considered as split; (default:10% of read length)
--qc-map-tool, the tool extracts mappability information from a bigWig file,avaliable from: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bigWigSummary
--qc-map-file, .bigWig file used to decide local genomic mappability, avaliable from: ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/encodeDCC/wgEncodeMapability/
--qc-map-cutoff, the minimum mapping quality required for a breakpoint to be reported (default: 0.0)
svelter.py BPIntegrate --sample /absolute/path/of/sample.bam --workdir /working/directory
Optional Parameters:
--chromosome, name of chromosome to run. should match chromosome name in bam file
--batch, specify number of structures in each separate file (if 0, output files will be calssified by chromosomes; default, all BP clustered will be integrated in one txt file)
svelter.py SVPredict --sample sample.bam --workdir /working/directory --bp-file input/file/containing/clustered/bps
Optional Parameters:
--num-iteration, maximum number of iterations per structure will run
--ploidy, limit algorithm to specific zygosity (0:heterozygous only; 1:homozygous only; 2:both; default:2)
--null-model, specify which stat model to be fitted on each parameter. if --null-model==C / Complex, negative bimodal distribution will be fitted to insertlenth; else, normal will be used
--qc-align, minimum alignment quality required for mapped reads in bam file (default: 20)
svelter.py SVIntegrate --workdir /working/directory --prefix output --input-path path/of/output/from/Step4
Optional Parameters:
--qc-structure, minimum quality score of a resolved structure to be considered as PASS and included in the output vcf file