Multiple sequence alignment
Before running this step, you'll need to have completed the previous one (reconciling contigs). I.e. you should have a Trycycler output directory (which I'll assume is called trycycler) with subdirectories for each of your good clusters, each of which contains a 1_contigs subdirectory and a 2_all_seqs.fasta file.
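If you'd like to confirm which clusters are ready before going further, a check along the following lines may help. It is only a sketch: the function names are hypothetical, and it assumes nothing beyond the layout described above (an output directory named trycycler whose cluster subdirectories each hold a 1_contigs directory and a 2_all_seqs.fasta file). It does not judge whether a cluster is good, only whether the expected files exist.

```python
from pathlib import Path


def ready_for_msa(cluster_dir):
    """Return True if cluster_dir contains the outputs of the reconcile step."""
    cluster_dir = Path(cluster_dir)
    has_contigs = (cluster_dir / "1_contigs").is_dir()
    has_all_seqs = (cluster_dir / "2_all_seqs.fasta").is_file()
    return has_contigs and has_all_seqs


def clusters_ready_for_msa(output_dir="trycycler"):
    """List cluster subdirectories (sorted by name) that look ready for the MSA step."""
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        return []
    return sorted(d for d in output_dir.iterdir() if d.is_dir() and ready_for_msa(d))


# Print the name of each cluster directory that has both expected items.
for cluster in clusters_ready_for_msa("trycycler"):
    print(cluster.name)
```

Any cluster directory missing from the printed list either was not reconciled yet or has been renamed or removed.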
Trycycler msa uses MUSCLE, so that will need to be installed before running (see Requirements).
This step takes the reconciled contig sequences (2_all_seqs.fasta) and runs a multiple sequence alignment.
For example, it would take sequences like this:
A: GGCAGAGCGACGTAAATTACGAGTAAAGGAGGGGAGAGCATTAAGCATGCCTAAACTG
B: GGCAGAGCGCGACGTAAATTACGAGTAAAAGGAGGGAGGAGCATTAAGCCATGCCTACTG
C: GGCAGAGCGCGACTAAATTTACGAGTAAAGGAGGGAGGAGCATAGCCATGCCTAAACTG
And produce an alignment like this:
A: GGCAGAG--CGACGTAAA-TTACGAGT-AAAGGAGGGGA-GAGCATTAAG-CATGCCTAAACTG
B: GGCAGAGCGCGACGTAAA-TTACGAGTAAAAGGA-GGGAGGAGCATTAAGCCATGCCT--ACTG
C: GGCAGAGCGCGAC-TAAATTTACGAGT-AAAGGA-GGGAGGAGCAT--AGCCATGCCTAAACTG
Since MUSCLE cannot handle very long input sequences, Trycycler msa first partitions the sequences into pieces, then runs a multiple sequence alignment on each piece, and finally stitches the per-piece alignments back together, like this:
ORIGINAL SEQUENCES:
GGCAGAGCGACGTAAATTACGAGTAAAGGAGGGGAGAGCATTAAGCATGCCTAAACTG
GGCAGAGCGCGACGTAAATTACGAGTAAAAGGAGGGAGGAGCATTAAGCCATGCCTACTG
GGCAGAGCGCGACTAAATTTACGAGTAAAGGAGGGAGGAGCATAGCCATGCCTAAACTG
PARTITIONED SEQUENCES:
GGCAGAGCGAC GTAAATTACGAG TAAAGGAGGGGA GAGCATTAAGCATG CCTAAACTG
GGCAGAGCGCGAC GTAAATTACGAG TAAAAGGAGGGA GGAGCATTAAGCCATG CCTACTG
GGCAGAGCGCGAC TAAATTTACGAG TAAAGGAGGGA GGAGCATAGCCATG CCTAAACTG
ALIGNED PARTITIONS:
GGCAGAG--CGAC GTAAA-TTACGAG T-AAAGGAGGGGA -GAGCATTAAG-CATG CCTAAACTG
GGCAGAGCGCGAC GTAAA-TTACGAG TAAAAGGA-GGGA GGAGCATTAAGCCATG CCT--ACTG
GGCAGAGCGCGAC -TAAATTTACGAG T-AAAGGA-GGGA GGAGCAT--AGCCATG CCTAAACTG
MERGED ALIGNMENTS:
GGCAGAG--CGACGTAAA-TTACGAGT-AAAGGAGGGGA-GAGCATTAAG-CATGCCTAAACTG
GGCAGAGCGCGACGTAAA-TTACGAGTAAAAGGA-GGGAGGAGCATTAAGCCATGCCT--ACTG
GGCAGAGCGCGAC-TAAATTTACGAGT-AAAGGA-GGGAGGAGCAT--AGCCATGCCTAAACTG
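The final merging step can be illustrated with a short Python sketch. This is not Trycycler's actual code: the function name is hypothetical, and the aligned partitions are simply copied from the example above rather than produced by MUSCLE. It shows why the merge works: within each partition every aligned row has the same width, so concatenating the partitions row by row yields a valid full-length alignment.

```python
GAP = "-"

# Aligned partitions copied from the example above: one list per sequence.
aligned_partitions = [
    ["GGCAGAG--CGAC", "GTAAA-TTACGAG", "T-AAAGGAGGGGA", "-GAGCATTAAG-CATG", "CCTAAACTG"],
    ["GGCAGAGCGCGAC", "GTAAA-TTACGAG", "TAAAAGGA-GGGA", "GGAGCATTAAGCCATG", "CCT--ACTG"],
    ["GGCAGAGCGCGAC", "-TAAATTTACGAG", "T-AAAGGA-GGGA", "GGAGCAT--AGCCATG", "CCTAAACTG"],
]


def merge_aligned_partitions(rows):
    """Concatenate per-partition alignments into one alignment per sequence.

    rows[i][j] is the aligned text of sequence i in partition j.
    """
    partition_count = len(rows[0])
    if any(len(row) != partition_count for row in rows):
        raise ValueError("every sequence needs the same number of partitions")
    for j in range(partition_count):
        # Within one partition, all aligned rows must have identical width.
        widths = {len(row[j]) for row in rows}
        if len(widths) != 1:
            raise ValueError(f"partition {j} has rows of unequal width")
    return ["".join(row) for row in rows]


merged = merge_aligned_partitions(aligned_partitions)
for seq in merged:
    print(seq)
```

Removing the gap characters from any merged row (for example with `merged[0].replace(GAP, "")`) recovers the corresponding original sequence, which is a handy sanity check on an alignment.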
The Trycycler msa command must be run separately for each of your good clusters. Assuming your good clusters are numbers 1, 7 and 8, these are the commands you would run:
trycycler msa --cluster_dir trycycler/cluster_001
trycycler msa --cluster_dir trycycler/cluster_007
trycycler msa --cluster_dir trycycler/cluster_008
Trycycler msa will typically take a few minutes to complete. Longer sequences and larger numbers of sequences will be slower.
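If you have many good clusters, you may prefer to script the per-cluster commands rather than type them out. The following is a minimal sketch, not part of Trycycler: the helper names are hypothetical, and it assumes cluster directories are named with a zero-padded three-digit number (as in cluster_001) inside a directory called trycycler. The only Trycycler option it uses is --cluster_dir, as shown in the commands above.

```python
import subprocess


def msa_commands(cluster_numbers, output_dir="trycycler"):
    """Build one 'trycycler msa' argument list per cluster number."""
    return [
        ["trycycler", "msa", "--cluster_dir", f"{output_dir}/cluster_{n:03d}"]
        for n in cluster_numbers
    ]


def run_msa(cluster_numbers, output_dir="trycycler"):
    """Run the MSA step for each cluster in turn, stopping at the first failure."""
    for command in msa_commands(cluster_numbers, output_dir):
        subprocess.run(command, check=True)


# Show the commands for good clusters 1, 7 and 8 without running them.
for command in msa_commands([1, 7, 8]):
    print(" ".join(command))
```

Calling `run_msa([1, 7, 8])` would then execute the same three commands one after another; it requires Trycycler and MUSCLE to be installed.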