Skip to content

rare disease

Brent Pedersen edited this page Jun 27, 2019 · 52 revisions

using slivar for rare disease research

compound heterozygotes

slivar provides machinery for finding compound heterozygotes in trios using phase-by-inheritance. It also reports variant-pairs where 1 side is a de novo. It assumes the input has already been filtered to high-quality variants. The steps are:

  1. use slivar to filter to rare heterozygous (or denovo) variants
  2. pipe to an effect annotator like bcftools csq, VEP, or snpEff
  3. pipe to slivar compound-hets which uses the annotation from VEP to group variants by gene.

The output is a VCF of only comp-het variants with an extra INFO field of e.g.

slivar_comphet=$sample/$gene/$uniqid/$chrom/$pos/$ref/$alt 

which indicates that the current variant is compound-heterozygous in $sample, with the variant at$chrom:$posin$gene`.

Even with a less stringent frequency filter of < 0.005 and no filter on variant function (e.g. allowing synonymous). This can reduce the number of candidate pairs of variants down to about 1 dozen in a standard trio. The $uniqid is an integer that uniquely identifies a variant-pair in a sample.

Full analysis for trios with unaffected parents

The following commands encapsulate the current best-practices for finding denovos, x-linked denovos, recessives for X and autosome (including compound hets). For exomes, these 2 commands will often result in <20 variants per trio without even considering the variant effect (missense/synonymous/etc). And a majority of those variants are pairs of a compound-het so the number of unique genes is often under < 10. The commands use the javascript functions in a file named my.js those are easily modified to allow more/less stringency or different logic.

## set these variables
vcf=path/to/vcf
ped=path/to/ped
fasta=/path/to/fasta
gff=Homo_sapiens.GRCh37.82.gff.gz # ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
cohort=mycohort
## end variables
slivar expr --vcf $bcf --ped $ped \
    --pass-only \
    -g /home/brentp/src/slivar/gnomad.hg38.v2.zip \
    -g /home/brentp/src/slivar/topmed.hg38.dbsnp.151.zip \
    --info "INFO.topmed_af < 0.05" \
    --js ../slivar/js/my.js \
    --trio "denovo:denovo(kid, mom, dad) && hqrv(variant, INFO, 0.001)" \
    --trio "x_denovo:x_denovo(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
    --trio "recessive:recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01)" \
    --trio "x_recessive:x_recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
    --trio "comphet_side:comphet_side(kid, mom, dad) && hqrv(variant, INFO, 0.01) && INFO.gnomad_nhomalt_controls < 10" \
    | bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -o vcfs/$cohort.vcf

# compound-hets then groups by gene:
slivar compound-hets -v vcfs/$cohort.vcf -f BCSQ -i 2 \
    --sample-field comphet_side --sample-field denovo -p $ped > vcfs/$cohort.ch.vcf

Note that for multi-sample VCFS, -s arguments to compound-hets make sure that only the trio(s) that passed the filters in the upstream slivar command are used to compose the 2 ends of a compound heterozgyote.

The TSV command can be used to get these filtered VCFs into spreadsheets for final evaluation. See its wiki page for more detail

Clone this wiki locally