Skip to content

rare disease

Brent Pedersen edited this page Jun 27, 2019 · 52 revisions

using slivar for rare disease research

compound heterozygotes

slivar provides machinery for finding compound heterozygotes in trios using phase-by-inheritance. It also reports variant-pairs where 1 side is a de novo. It assumes the input has already been filtered to high-quality variants. The steps are:

  1. use slivar to filter to rare heterozygous (or denovo) variants
  2. pipe to an effect annotator like bcftools csq, VEP, or snpEff
  3. pipe to slivar compound-hets which uses the annotation from VEP to group variants by gene.

The output is a VCF of only comp-het variants with an extra INFO field of e.g.

slivar_comphet=$sample/$gene/$uniqid/$chrom/$pos/$ref/$alt 

which indicates that the current variant is compound-heterozygous in $sample, with the variant at$chrom:$posin$gene`.

Even with a less stringent frequency filter of < 0.005 and no filter on variant function (e.g. allowing synonymous). This can reduce the number of candidate pairs of variants down to about 1 dozen in a standard trio. The $uniqid is an integer that uniquely identifies a variant-pair in a sample.

Full analysis for trios with unaffected parents

The following commands encapsulate the current best-practices for finding denovos, x-linked denovos, recessives for X and autosome (including compound hets). For exomes, these 2 commands will often result in <20 variants per trio without even considering the variant effect (missense/synonymous/etc). And a majority of those variants are pairs of a compound-het so the number of unique genes is often under < 10. The commands use the javascript functions in a file named my.js those are easily modified to allow more/less stringency or different logic.

## set these variables
vcf=path/to/vcf
ped=path/to/ped
fasta=/path/to/fasta
gff=Homo_sapiens.GRCh37.82.gff.gz # ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
cohort=mycohort
## end variables
slivar expr --vcf $vcf --ped $ped \
    --pass-only \
    -g /home/brentp/src/slivar/gnomad.hg38.v2.zip \
    --js ../slivar/js/my.js \
    --trio "denovo:denovo(kid, mom, dad) && hqrv(variant, INFO, 0.001)" \
    --trio "x_denovo:x_denovo(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
    --trio "recessive:recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01)" \
    --trio "x_recessive:x_recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
    --trio "comphet_side:comphet_side(kid, mom, dad) && hqrv(variant, INFO, 0.01) && INFO.gnomad_nhomalt_controls < 10" \
    | bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -o vcfs/$cohort.vcf

# compound-hets then groups by gene:
slivar compound-hets -v vcfs/$cohort.vcf -f BCSQ -i 2 \
    --sample-field comphet_side --sample-field denovo -p $ped > vcfs/$cohort.ch.vcf

Note that for multi-sample VCFS, -s arguments to compound-hets make sure that only the trio(s) that passed the filters in the upstream slivar command are used to compose the 2 ends of a compound heterozgyote.

For hg38, it's wise to add:

    -g /home/brentp/src/slivar/topmed.hg38.dbsnp.151.zip \
    --info "INFO.topmed_af < 0.05" \

to the first command above. This removes common variants that are found in hg38, but not present in gnomAD (which is lifted from GRCh37. Get the zip file for topmed here

The TSV command can be used to get these filtered VCFs into spreadsheets for final evaluation. See its wiki page for more detail

Clone this wiki locally