Multiple sequence alignment

Ryan Wick edited this page May 18, 2020 · 21 revisions

Requirements

Before running this step, you'll need to have completed the previous one (reconciling contigs). That is, you should have a Trycycler output directory (which I'll assume is called trycycler) with a subdirectory for each of your good clusters, each containing a 1_contigs subdirectory and a 2_all_seqs.fasta file.

Trycycler msa uses MUSCLE, so that will need to be installed before running (see Requirements).

Concept

This step takes the reconciled contig sequences (2_all_seqs.fasta) and runs a multiple sequence alignment.

For example, it would take sequences like this:

A: GGCAGAGCGACGTAAATTACGAGTAAAGGAGGGGAGAGCATTAAGCATGCCTAAACTG
B: GGCAGAGCGCGACGTAAATTACGAGTAAAAGGAGGGAGGAGCATTAAGCCATGCCTACTG
C: GGCAGAGCGCGACTAAATTTACGAGTAAAGGAGGGAGGAGCATAGCCATGCCTAAACTG

And produce an alignment like this:

A: GGCAGAG--CGACGTAAA-TTACGAGT-AAAGGAGGGGA-GAGCATTAAG-CATGCCTAAACTG
B: GGCAGAGCGCGACGTAAA-TTACGAGTAAAAGGA-GGGAGGAGCATTAAGCCATGCCT--ACTG
C: GGCAGAGCGCGAC-TAAATTTACGAGT-AAAGGA-GGGAGGAGCAT--AGCCATGCCTAAACTG

Since MUSCLE cannot handle very long input sequences, Trycycler msa first partitions the sequences into pieces, then runs a multiple sequence alignment on each set of corresponding pieces, and finally stitches the per-piece alignments back together, like this:

ORIGINAL SEQUENCES:
GGCAGAGCGACGTAAATTACGAGTAAAGGAGGGGAGAGCATTAAGCATGCCTAAACTG
GGCAGAGCGCGACGTAAATTACGAGTAAAAGGAGGGAGGAGCATTAAGCCATGCCTACTG
GGCAGAGCGCGACTAAATTTACGAGTAAAGGAGGGAGGAGCATAGCCATGCCTAAACTG

PARTITIONED SEQUENCES:
GGCAGAGCGAC     GTAAATTACGAG   TAAAGGAGGGGA   GAGCATTAAGCATG     CCTAAACTG
GGCAGAGCGCGAC   GTAAATTACGAG   TAAAAGGAGGGA   GGAGCATTAAGCCATG   CCTACTG
GGCAGAGCGCGAC   TAAATTTACGAG   TAAAGGAGGGA    GGAGCATAGCCATG     CCTAAACTG

ALIGNED PARTITIONS:
GGCAGAG--CGAC   GTAAA-TTACGAG   T-AAAGGAGGGGA   -GAGCATTAAG-CATG   CCTAAACTG
GGCAGAGCGCGAC   GTAAA-TTACGAG   TAAAAGGA-GGGA   GGAGCATTAAGCCATG   CCT--ACTG
GGCAGAGCGCGAC   -TAAATTTACGAG   T-AAAGGA-GGGA   GGAGCAT--AGCCATG   CCTAAACTG

MERGED ALIGNMENTS:
GGCAGAG--CGACGTAAA-TTACGAGT-AAAGGAGGGGA-GAGCATTAAG-CATGCCTAAACTG
GGCAGAGCGCGACGTAAA-TTACGAGTAAAAGGA-GGGAGGAGCATTAAGCCATGCCT--ACTG
GGCAGAGCGCGAC-TAAATTTACGAGT-AAAGGA-GGGAGGAGCAT--AGCCATGCCTAAACTG
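
The stitching step can be illustrated with a short Python sketch. This is not Trycycler's actual code: the function name merge_partitions and the dictionary layout are made up for illustration, and the aligned pieces are simply copied from the example above rather than produced by MUSCLE. It shows that concatenating the aligned pieces gives the merged alignment, and that removing the gaps recovers each original sequence.

```python
# Illustrative sketch only (not Trycycler's implementation).
# Aligned pieces are copied from the ALIGNED PARTITIONS example above.
aligned_partitions = {
    "A": ["GGCAGAG--CGAC", "GTAAA-TTACGAG", "T-AAAGGAGGGGA", "-GAGCATTAAG-CATG", "CCTAAACTG"],
    "B": ["GGCAGAGCGCGAC", "GTAAA-TTACGAG", "TAAAAGGA-GGGA", "GGAGCATTAAGCCATG", "CCT--ACTG"],
    "C": ["GGCAGAGCGCGAC", "-TAAATTTACGAG", "T-AAAGGA-GGGA", "GGAGCAT--AGCCATG", "CCTAAACTG"],
}

# Original (unaligned) sequences from the example above.
originals = {
    "A": "GGCAGAGCGACGTAAATTACGAGTAAAGGAGGGGAGAGCATTAAGCATGCCTAAACTG",
    "B": "GGCAGAGCGCGACGTAAATTACGAGTAAAAGGAGGGAGGAGCATTAAGCCATGCCTACTG",
    "C": "GGCAGAGCGCGACTAAATTTACGAGTAAAGGAGGGAGGAGCATAGCCATGCCTAAACTG",
}


def merge_partitions(aligned):
    """Concatenate each sequence's aligned pieces into one aligned sequence."""
    # Every sequence must have the same number of pieces.
    if len({len(pieces) for pieces in aligned.values()}) != 1:
        raise ValueError("sequences have different numbers of pieces")
    # Within each partition, all aligned pieces must have the same width.
    for column in zip(*aligned.values()):
        if len({len(piece) for piece in column}) != 1:
            raise ValueError("aligned pieces in one partition differ in length")
    return {name: "".join(pieces) for name, pieces in aligned.items()}


merged = merge_partitions(aligned_partitions)
for name, seq in merged.items():
    # Removing gap characters must give back the original sequence.
    assert seq.replace("-", "") == originals[name]
    print(f"{name}: {seq}")
```

Because each partition is aligned independently, the only requirement for stitching is that all pieces within a partition have the same aligned width, which is what the sketch checks before concatenating.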

Running Trycycler msa

The Trycycler msa command must be run separately for each of your good clusters. Assuming your good clusters are numbers 1, 7 and 8, these are the commands you would run:

trycycler msa --cluster_dir trycycler/cluster_001
trycycler msa --cluster_dir trycycler/cluster_007
trycycler msa --cluster_dir trycycler/cluster_008
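
If you have many good clusters, you may prefer to generate these commands rather than type them. The following Python sketch builds one command per cluster number. The helper name msa_commands is hypothetical, and the sketch assumes your cluster directories follow the zero-padded cluster_001 naming shown above. It only prints the commands; the commented-out subprocess call shows how you might run them if Trycycler and MUSCLE are installed.

```python
import shlex


def msa_commands(good_clusters, trycycler_dir="trycycler"):
    """Build one 'trycycler msa' argument list per good cluster number.

    Assumes cluster directories are named cluster_001, cluster_002, etc.
    """
    return [
        ["trycycler", "msa", "--cluster_dir", f"{trycycler_dir}/cluster_{n:03d}"]
        for n in good_clusters
    ]


commands = msa_commands([1, 7, 8])
for cmd in commands:
    print(shlex.join(cmd))
    # To actually run each command (requires Trycycler and MUSCLE):
    # import subprocess; subprocess.run(cmd, check=True)
```

Only the clusters you judged to be good should be listed, so the cluster numbers are passed in explicitly rather than discovered from the directory.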

Trycycler msa will typically take a few minutes to complete. Longer sequences and larger numbers of sequences will take more time.

Settings

Output
