-
Notifications
You must be signed in to change notification settings - Fork 23
rare disease
slivar
provides machinery for finding compound heterozygotes in trios using phase-by-inheritance. It also reports variant-pairs where 1 side is a de novo. It assumes the input has already been filtered to high-quality variants. The steps are:
- use slivar to filter to rare heterozygous (or denovo) variants
- pipe to an effect annotator like bcftools csq, VEP, or snpEff
- pipe to
slivar compound-hets
which uses the annotation from VEP to group variants by gene.
The output is a VCF of only comp-het variants with an extra INFO field of e.g.
slivar_comphet=$sample/$gene/$uniqid/$chrom/$pos/$ref/$alt
which indicates that the current variant is compound-heterozygous in $sample
, with the variant at
$chrom:$posin
$gene`.
Even with a less stringent frequency filter of < 0.005
and no filter on variant function (e.g. allowing synonymous). This can reduce the number of candidate pairs of variants down to about 1 dozen in a standard trio.
The $uniqid
is an integer that uniquely identifies a variant-pair in a sample.
The following commands encapsulate the current best-practices for finding denovos, x-linked denovos, recessives for X and autosome (including compound hets). For exomes, these 2 commands will often result in <20 variants per trio without even considering the variant effect (missense/synonymous/etc). And a majority of those variants are pairs of a compound-het so the number of unique genes is often under < 10. The commands use the javascript functions in a file named my.js those are easily modified to allow more/less stringency or different logic.
## set these variables
vcf=path/to/vcf
ped=path/to/ped
fasta=/path/to/fasta
gff=Homo_sapiens.GRCh37.82.gff.gz # ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
cohort=mycohort
## end variables
slivar expr --vcf $bcf --ped $ped \
--pass-only \
-g /home/brentp/src/slivar/gnomad.hg38.v2.zip \
-g /home/brentp/src/slivar/topmed.hg38.dbsnp.151.zip \
--info "INFO.topmed_af < 0.05" \
--js ../slivar/js/my.js \
--trio "denovo:denovo(kid, mom, dad) && hqrv(variant, INFO, 0.001)" \
--trio "x_denovo:x_denovo(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
--trio "recessive:recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01)" \
--trio "x_recessive:x_recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
--trio "comphet_side:comphet_side(kid, mom, dad) && hqrv(variant, INFO, 0.01) && INFO.gnomad_nhomalt_controls < 10" \
| bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -o vcfs/$cohort.vcf
# compound-hets then groups by gene:
slivar compound-hets -v vcfs/$cohort.vcf -f BCSQ -i 2 \
--sample-field comphet_side --sample-field denovo -p $ped > vcfs/$cohort.ch.vcf
Note that for multi-sample VCFS, -s
arguments to compound-hets
make sure that only the trio(s) that passed the filters in the upstream slivar
command are used to compose the 2 ends of a compound heterozgyote.
The TSV command can be used to get these filtered VCFs into spreadsheets for final evaluation. See its wiki page for more detail