-
Notifications
You must be signed in to change notification settings - Fork 23
rare disease
slivar
provides machinery for finding compound heterozygotes in trios using phase-by-inheritance. It also reports variant-pairs where 1 side is a de novo. It assumes the input has already been filtered to high-quality variants. The steps are:
- use slivar to filter to rare heterozygous (or denovo) variants
- pipe to an effect annotator like bcftools csq, VEP, or snpEff
- pipe to
slivar compound-hets
which uses the annotation from VEP to group variants by gene.
The output is a VCF of only comp-het variants with an extra INFO field of e.g.
slivar_comphet=$sample/$gene/$uniqid/$chrom/$pos/$ref/$alt
which indicates that the current variant is compound-heterozygous in $sample
, with the variant at
$chrom:$posin
$gene`.
Even with a less stringent frequency filter of < 0.005
and no filter on variant function (e.g. allowing synonymous). This can reduce the number of candidate pairs of variants down to about 1 dozen in a standard trio.
The $uniqid
is an integer that uniquely identifies a variant-pair in a sample.
The following commands encapsulate the current best-practices for finding denovos, x-linked denovos, recessives for X and autosome (including compound hets). For exomes, these 2 commands will often result in <20 variants per trio without even considering the variant effect (missense/synonymous/etc). And a majority of those variants are pairs of a compound-het so the number of unique genes is often under < 10. The commands use the javascript functions in a file named my.js those are easily modified to allow more/less stringency or different logic.
## set these variables
vcf=path/to/vcf
ped=path/to/ped
fasta=/path/to/fasta
gff=Homo_sapiens.GRCh37.82.gff.gz # ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
cohort=mycohort
## end variables
slivar expr --vcf $vcf --ped $ped \
--pass-only \
-g /home/brentp/src/slivar/gnomad.hg38.v2.zip \
--js ../slivar/js/my.js \
--trio "denovo:denovo(kid, mom, dad) && hqrv(variant, INFO, 0.001)" \
--trio "x_denovo:x_denovo(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
--trio "recessive:recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01)" \
--trio "x_recessive:x_recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
--trio "comphet_side:comphet_side(kid, mom, dad) && hqrv(variant, INFO, 0.01) && INFO.gnomad_nhomalt_controls < 10" \
| bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -o vcfs/$cohort.vcf
# compound-hets then groups by gene:
slivar compound-hets -v vcfs/$cohort.vcf -f BCSQ -i 2 \
--sample-field comphet_side --sample-field denovo -p $ped > vcfs/$cohort.ch.vcf
Note that for multi-sample VCFS, -s
arguments to compound-hets
make sure that only the trio(s) that passed the filters in the upstream slivar
command are used to compose the 2 ends of a compound heterozgyote.
For hg38, it's wise to add:
-g /home/brentp/src/slivar/topmed.hg38.dbsnp.151.zip \
--info "INFO.topmed_af < 0.05" \
to the first command above. This removes common variants that are found in hg38, but not present in gnomAD (which is lifted from GRCh37. Get the zip file for topmed here
The TSV command can be used to get these filtered VCFs into spreadsheets for final evaluation. See its wiki page for more detail