
How variants are chosen for the consensus sequence

Ryan Wick edited this page Jan 5, 2021 · 15 revisions

This page goes into more detail on how Trycycler produces a consensus sequence. Specifically, when faced with multiple different variants of a sequence, how does it choose which one is best?

Breaking the sequence into chunks

Take this hypothetical multiple sequence alignment (MSA) of five sequences, where '-' marks a gap, as input to Trycycler consensus:

GGAGGAGCTTTT-CGCCGCAGTCAACGAA-TAGCGTCTGAAAACGTGTATCATATCTTGCCTCGAAAAGCCGCACT
GGAGGAGCTTTTTCGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATCTCTTGCCTCGAAAATCCTCACT
GGAGGAGCTTTTTCGCCGCAGTCAAC--ATTAGCGTCTGAAAACGTGTATCATGTCTTGCCTCGAAAATCCTCACT
GGAGGAGCTTTTTCGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATCTCTTGCCTCGAAAAGCCGCACT
GGAGGAGCTTTT-CGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATGTCTTGCCTCGAAAATCCGCACT

Trycycler first divides the MSA into 'same' chunks, where all sequences agree, and 'different' chunks, where at least one sequence differs:

GGAGGAGCTTTT   -   CGCCGCAGTCAAC   GAA-   TAGCGTCTGAAAACGTGTATCAT   A   TCTTGCCTCGAAAA   GCCG   CACT
GGAGGAGCTTTT   T   CGCCGCAGTCAAC   --A-   TAGCGTCTGAAAACGTGTATCAT   C   TCTTGCCTCGAAAA   TCCT   CACT
GGAGGAGCTTTT   T   CGCCGCAGTCAAC   --AT   TAGCGTCTGAAAACGTGTATCAT   G   TCTTGCCTCGAAAA   TCCT   CACT
GGAGGAGCTTTT   T   CGCCGCAGTCAAC   --A-   TAGCGTCTGAAAACGTGTATCAT   C   TCTTGCCTCGAAAA   GCCG   CACT
GGAGGAGCTTTT   -   CGCCGCAGTCAAC   --A-   TAGCGTCTGAAAACGTGTATCAT   G   TCTTGCCTCGAAAA   TCCG   CACT
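The chunking step can be illustrated with a short Python sketch. This is a minimal illustration, not Trycycler's own code. A column counts as 'same' when every sequence has the identical character there, gaps included. In the example above, some 'different' chunks also contain columns where all sequences agree: the shared A in GAA-/--A-/--AT and the shared CC in GCCG/TCCT/TCCG. To reproduce that grouping, the sketch assumes a rule with a made-up parameter, min_same_len. Under this rule, a run of agreeing columns shorter than min_same_len that sits between two differing columns is folded into the surrounding 'different' chunk. Trycycler's actual rule may differ. The function name partition_msa is also hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def partition_msa(msa: List[str], min_same_len: int = 3) -> List[Tuple[str, List[str]]]:
    """Split aligned rows into ('same' or 'different', per-row pieces) chunks.

    min_same_len is a hypothetical parameter (not a Trycycler option): a run of
    agreeing columns shorter than this, flanked by differing columns on both
    sides, is folded into the surrounding 'different' chunk.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all MSA rows must have the same length")

    # A column is 'same' if every row has the identical character (gaps included).
    same = [len({row[i] for row in msa}) == 1 for i in range(length)]

    # Collapse the per-column flags into runs of [is_same, start, end).
    runs = []
    pos = 0
    for is_same, group in groupby(same):
        n = sum(1 for _ in group)
        runs.append([is_same, pos, pos + n])
        pos += n

    # Runs alternate, so every interior 'same' run sits between two 'different' runs.
    for run in runs[1:-1]:
        if run[0] and run[2] - run[1] < min_same_len:
            run[0] = False

    # Merge adjacent runs that now share a label, then slice every row.
    chunks = []
    for is_same, group in groupby(runs, key=lambda r: r[0]):
        group = list(group)
        start, end = group[0][1], group[-1][2]
        label = "same" if is_same else "different"
        chunks.append((label, [row[start:end] for row in msa]))
    return chunks


msa = [
    "GGAGGAGCTTTT-CGCCGCAGTCAACGAA-TAGCGTCTGAAAACGTGTATCATATCTTGCCTCGAAAAGCCGCACT",
    "GGAGGAGCTTTTTCGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATCTCTTGCCTCGAAAATCCTCACT",
    "GGAGGAGCTTTTTCGCCGCAGTCAAC--ATTAGCGTCTGAAAACGTGTATCATGTCTTGCCTCGAAAATCCTCACT",
    "GGAGGAGCTTTTTCGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATCTCTTGCCTCGAAAAGCCGCACT",
    "GGAGGAGCTTTT-CGCCGCAGTCAAC--A-TAGCGTCTGAAAACGTGTATCATGTCTTGCCTCGAAAATCCGCACT",
]

# Print each row with its chunks separated by three spaces.
chunks = partition_msa(msa)
for i in range(len(msa)):
    print("   ".join(pieces[i] for _, pieces in chunks))
```

With the default min_same_len=3, this should print the same five chunked rows shown above. Setting min_same_len=1 disables the folding and gives 13 chunks instead of 9. In that case, the lone shared A and the shared CC become their own 'same' chunks.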