Starting Trycycler consensus (2020-07-06 14:25:25)
    Trycycler consensus is the final stage of the Trycycler pipeline. It operates on one replicon (i.e. cluster) at a time. It takes the multiple sequence alignment of alternative contig sequences and combines them into a single consensus sequence. Where needed, it will use read alignments to help choose which variants to include/exclude from the consensus sequence. If all goes well, the final consensus will be free of any large-scale errors.

Input reads:
  trycycler/cluster_002/4_reads.fastq
    2,229 reads (10,440,509 bp)
    N50 = 6,729 bp

Input contigs:
  trycycler/cluster_002/2_all_seqs.fasta
    A_contig_2: 7,504 bp
    B_contig_2: 7,509 bp
    C_contig_2: 7,512 bp
    D_contig_2: 7,508 bp
    E_contig_2: 7,508 bp
    F_utg000002c: 7,523 bp
    G_utg000002c: 7,509 bp
    H_utg000002c: 7,509 bp
    I_utg000002c: 7,506 bp
    J_utg000002c: 7,511 bp

Checking required software:
  minimap2: v2.17-r954-dirty


Partitioning MSA (2020-07-06 14:25:25)
    The multiple sequence alignment is now partitioned into chunks. Chunks where the input contig sequences are all in agreement are called "same" chunks, and those where the input contig sequences disagree are called "different" chunks. The consensus sequence will be made by choosing a best option for each of the different chunks.

  chunks: 51 (26 same, 25 different)
  combining small chunks: 41 (21 same, 20 different)

Saving sequences to graph:
  trycycler/cluster_002/5_chunked_sequence.gfa


Initial consensus (2020-07-06 14:25:25)
    Trycycler now makes an initial consensus sequence by choosing a sequence for each of the different chunks. The chosen sequence is the one with the lowest total Hamming distance to the other sequences. For example, a chunk with options of TT, TT, CC, CC and TA will give a consensus of TT. If the total Hamming distances fail to break a tie or if all sequences differ, the chunk will be flagged for read-based assessment.
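
The Hamming-distance rule described above can be illustrated with a minimal Python sketch. This is not Trycycler's own code: the function names `hamming` and `choose_by_hamming` are hypothetical, and the sketch assumes every option in a chunk has the same length (as aligned chunk sequences would). It returns `None` where the log says a chunk would be flagged for read-based assessment.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def choose_by_hamming(options: List[str]) -> Optional[str]:
    """Pick the option with the lowest total Hamming distance to all options.

    Returns None when the totals tie or when every option is distinct,
    i.e. the cases that would need read-based assessment.
    (Hypothetical helper for illustration only.)
    """
    distinct = set(options)
    # All sequences differ from one another: no majority signal to use.
    if len(options) > 1 and len(distinct) == len(options):
        return None
    # Total distance from each distinct candidate to every input sequence.
    totals = {seq: sum(hamming(seq, other) for other in options)
              for seq in distinct}
    lowest = min(totals.values())
    winners = [seq for seq, total in totals.items() if total == lowest]
    return winners[0] if len(winners) == 1 else None


# The example from the log: TT has a total distance of 5, CC and TA have 6.
print(choose_by_hamming(["TT", "TT", "CC", "CC", "TA"]))
```

With the options from the log, `TT` wins with a total distance of 5 (0 + 2 + 2 + 1), against 6 for both `CC` and `TA`.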

Consensus length: 7,510 bp
Different chunks needing assessment: 0
Different chunks not needing assessment: 20

Saving sequence to file:
  trycycler/cluster_002/6_initial_consensus.fasta


Indexing reads (2020-07-06 14:25:25)
    Trycycler now aligns all reads to the initial consensus to form an index of which reads span each of the chunks. This makes the following step faster, as only relevant reads will be used when conducting read-based assessment of chunks.

No chunks need read-based assessment. Skipping this step.


Choosing best options with reads (2020-07-06 14:25:25)
    For each of the chunks to be assessed, Trycycler now aligns the relevant reads to each alternative sequence. Whichever option gives the strongest read alignments (defined as the total alignment score for each of the read's best alignment) is chosen as the best. This should result in a consensus sequence which is more accurate than the initial consensus.

No chunks need read-based assessment. Skipping this step.

Saving sequence to file:
  trycycler/cluster_002/7_final_consensus.fasta
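
The read-based selection rule described in the "Choosing best options with reads" step was skipped in this run, but it can be sketched in Python under stated assumptions. This is not Trycycler's implementation: the function name `best_option_by_reads` and the `(option, read, score)` tuple layout are hypothetical, and the alignment scores are assumed to have been computed already (for example by an aligner such as minimap2). For each option, only each read's best alignment score is kept, and the option with the highest total wins.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def best_option_by_reads(
    alignments: Iterable[Tuple[str, str, int]]
) -> Tuple[str, Dict[str, int]]:
    """Choose the option with the highest summed best-per-read score.

    `alignments` holds (option, read_name, alignment_score) tuples; a read
    may align to the same option more than once. (Hypothetical layout.)
    """
    # option -> read name -> best score seen for that read
    best: Dict[str, Dict[str, int]] = defaultdict(dict)
    for option, read, score in alignments:
        previous = best[option].get(read)
        if previous is None or score > previous:
            best[option][read] = score
    if not best:
        raise ValueError("no alignments supplied")
    # Sum each read's best score per option.
    totals = {option: sum(scores.values()) for option, scores in best.items()}
    # On a tie, max() keeps the first option encountered in the input.
    winner = max(totals, key=totals.get)
    return winner, totals


# Made-up scores for illustration only.
example = [
    ("A", "read_1", 100),
    ("A", "read_1", 80),   # weaker secondary alignment, ignored
    ("A", "read_2", 90),
    ("B", "read_1", 120),
    ("B", "read_2", 95),
]
winner, totals = best_option_by_reads(example)
print(winner, totals)
```

In this made-up example, option `B` totals 215 against 190 for option `A`, so `B` would be chosen.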