How variants are chosen for the consensus sequence

Ryan Wick edited this page Jan 5, 2021 · 15 revisions

This page goes into more detail on how Trycycler produces a consensus sequence. Specifically, when faced with multiple different variants of a sequence, how does it choose which one is best?

Breaking the sequence into chunks

Take this hypothetical MSA as an input to Trycycler consensus:

GTAGAAAGGGAGGAGCTTTT-CGCCGCAGTCAACGAA-TAGCGTCTGAAAACGTGTATCATATCTTGCCTCGAAAAGCCGCACT
GTAGAAAGGGAGGAGCTTTTTCGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATCTCTTGCCTCGAAAATCCTCACT
GTAGAAAGGGAGGAGCTTTTTCGCCGCAGTCAAC--ATTAGCGTCTGAAAACGTGTATCATGTCTTGCCTCGAAAATCCTCACT
GTAGAAAGGGAGGAGCTTTTTCGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATCTCTTGCCTCGAAAAGCCGCACT
GTAGAAAGGGAGGAGCTTTT-CGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATGTCTTGCCTCGAAAATCCGCACT

Trycycler first divides the MSA into 'same' chunks (where every sequence is identical) and 'different' chunks (where at least one sequence differs from the others):

GTAGAAAGGGAGGAGCTTTT  -  CGCCGCAGTCAAC  GAA-  TAGCGTCTGAAAACGTGTATCAT  A  TCTTGCCTCGAAAA  GCCG  CACT
GTAGAAAGGGAGGAGCTTTT  T  CGCCGCAGTCAAC  --A-  TAGCGTCTGAAAACGTGTATCAT  C  TCTTGCCTCGAAAA  TCCT  CACT
GTAGAAAGGGAGGAGCTTTT  T  CGCCGCAGTCAAC  --AT  TAGCGTCTGAAAACGTGTATCAT  G  TCTTGCCTCGAAAA  TCCT  CACT
GTAGAAAGGGAGGAGCTTTT  T  CGCCGCAGTCAAC  --A-  TAGCGTCTGAAAACGTGTATCAT  C  TCTTGCCTCGAAAA  GCCG  CACT
GTAGAAAGGGAGGAGCTTTT  -  CGCCGCAGTCAAC  --A-  TAGCGTCTGAAAACGTGTATCAT  G  TCTTGCCTCGAAAA  TCCG  CACT
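
The chunking step above can be illustrated with a short Python sketch. This is not Trycycler's actual code: the function name `split_into_chunks` and the `min_same_len` parameter are hypothetical. The example shows that a short run of identical columns between two variable regions (the shared `A` inside `GAA-`, the `CC` inside `GCCG`) ends up inside a single 'different' chunk, so the sketch assumes that 'same' runs shorter than a threshold are absorbed into their neighbours. The threshold of 3 is a guess chosen only because it reproduces the layout shown here.

```python
from itertools import groupby

# The hypothetical MSA from this page, one string per aligned sequence.
example_msa = [
    "GTAGAAAGGGAGGAGCTTTT-CGCCGCAGTCAACGAA-TAGCGTCTGAAAACGTGTATCATATCTTGCCTCGAAAAGCCGCACT",
    "GTAGAAAGGGAGGAGCTTTTTCGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATCTCTTGCCTCGAAAATCCTCACT",
    "GTAGAAAGGGAGGAGCTTTTTCGCCGCAGTCAAC--ATTAGCGTCTGAAAACGTGTATCATGTCTTGCCTCGAAAATCCTCACT",
    "GTAGAAAGGGAGGAGCTTTTTCGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATCTCTTGCCTCGAAAAGCCGCACT",
    "GTAGAAAGGGAGGAGCTTTT-CGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATGTCTTGCCTCGAAAATCCGCACT",
]


def split_into_chunks(msa, min_same_len=3):
    """Return (kind, start, end) tuples, where kind is 'same' or 'different'.

    min_same_len is an assumed threshold, not a documented Trycycler value.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all MSA rows must have the same length")

    # A column is 'same' when every sequence has the identical character there.
    is_same = [len(set(column)) == 1 for column in zip(*msa)]

    # Collapse consecutive columns of the same kind into runs.
    runs = []
    pos = 0
    for same, group in groupby(is_same):
        length = len(list(group))
        runs.append(["same" if same else "different", pos, pos + length])
        pos += length

    # Absorb short 'same' runs that sit between two 'different' runs.
    merged = []
    for i, (kind, start, end) in enumerate(runs):
        is_interior = 0 < i < len(runs) - 1
        if kind == "same" and is_interior and end - start < min_same_len:
            kind = "different"
        if merged and merged[-1][0] == kind == "different":
            merged[-1][2] = end  # extend the previous 'different' chunk
        else:
            merged.append([kind, start, end])
    return [tuple(chunk) for chunk in merged]


chunks = split_into_chunks(example_msa)

# Print each sequence with its chunks separated by two spaces.
for row in example_msa:
    print("  ".join(row[start:end] for _, start, end in chunks))
```

Classifying whole columns first and merging afterwards keeps the two decisions separate, so the merging rule can be changed without touching the column comparison.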