Starting Trycycler consensus (2020-07-07 10:03:23)
    Trycycler consensus is the final stage of the Trycycler pipeline. It operates on one replicon (i.e. cluster) at a time. It takes the multiple sequence alignment of alternative contig sequences and combines them into a single consensus sequence. Where needed, it will use read alignments to help choose which variants to include/exclude from the consensus sequence. If all goes well, the final consensus will be free of any large-scale errors.

Input reads:
  trycycler/cluster_001/4_reads.fastq
    25,220 reads (172,159,763 bp)
    N50 = 9,429 bp

Input contigs:
  trycycler/cluster_001/2_all_seqs.fasta
    A_contig_1:   1,038,291 bp
    B_contig_2:   1,038,269 bp
    C_contig_1:   1,038,379 bp
    D_contig_1:   1,038,384 bp
    E_contig_1:   1,038,433 bp
    G_utg000001c: 1,042,913 bp
    H_utg000001c: 1,042,821 bp
    I_utg000001c: 1,042,871 bp
    J_utg000001c: 1,042,500 bp

Checking required software:
  minimap2: v2.17-r954-dirty


Partitioning MSA (2020-07-07 10:03:23)
    The multiple sequence alignment is now partitioned into chunks. Chunks where the input contig sequences are all in agreement are called "same" chunks, and those where the input contig sequences disagree are called "different" chunks. The consensus sequence will be made by choosing a best option for each of the different chunks.

chunks: 26,945 (13,473 same, 13,472 different)
combining small chunks: 12,882 (6,441 same, 6,441 different)

Saving sequences to graph:
  trycycler/cluster_001/5_chunked_sequence.gfa


Initial consensus (2020-07-07 10:03:26)
    Trycycler now makes an initial consensus sequence by choosing a sequence for each of the different chunks. The chosen sequence is the one with the lowest total Hamming distance to the other sequences. For example, a chunk with options of TT, TT, CC, CC and TA will give a consensus of TT. If the total Hamming distances fail to break a tie or if all sequences differ, the chunk will be flagged for read-based assessment.
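
The selection rule described in the "Initial consensus" step can be sketched in Python. This is an illustrative sketch only, not Trycycler's actual code: the function names `hamming` and `choose_chunk_option` are hypothetical, and it assumes that the options for a chunk are equal-length strings taken from the alignment. A return value of `None` stands for "flag this chunk for read-based assessment".

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def choose_chunk_option(options):
    """Pick the option with the lowest total Hamming distance to the others.

    Returns None when the chunk would need read-based assessment:
    either the lowest total is shared by more than one distinct sequence,
    or every option is different from every other option.
    """
    # Total distance from each option to all options (distance to itself is 0).
    totals = [sum(hamming(s, t) for t in options) for s in options]
    lowest = min(totals)

    # Distinct sequences that achieve the lowest total.
    winners = {s for s, d in zip(options, totals) if d == lowest}

    # "All sequences differ" means no two options are identical.
    all_differ = len(options) > 1 and len(set(options)) == len(options)

    if len(winners) != 1 or all_differ:
        return None
    return winners.pop()


# The example from the log: TT has a total distance of 5, CC and TA have 6.
print(choose_chunk_option(["TT", "TT", "CC", "CC", "TA"]))
```

With the options from the log's example, the two TT copies each total 5 while CC and TA total 6, so TT is chosen. A chunk such as A versus C is tied and would be left for the reads to decide.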
Consensus length: 1,040,963 bp

Different chunks needing assessment:       142
Different chunks not needing assessment: 6,299

Saving sequence to file: trycycler/cluster_001/6_initial_consensus.fasta


Indexing reads (2020-07-07 10:03:27)
    Trycycler now aligns all reads to the initial consensus to form an index of which reads span each of the chunks. This makes the following step faster, as only relevant reads will be used when conducting read-based assessment of chunks.

Aligning reads to the initial consensus: 28,418 alignments
Filtering for best alignment per read: 22,486 alignments
Gathering reads for chunks: 142 / 142


Choosing best options with reads (2020-07-07 10:04:12)
    For each of the chunks to be assessed, Trycycler now aligns the relevant reads to each alternative sequence. Whichever option gives the strongest read alignments (defined as the total alignment score for each of the read's best alignment) is chosen as the best. This should result in a consensus sequence which is more accurate than the initial consensus.

Processing chunks: 142 / 142

Chunks where sequence is...
  the same as in the initial consensus: 62
  different to the initial consensus:   80

Saving sequence to file: trycycler/cluster_001/7_final_consensus.fasta
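
The read-based choice described in the "Choosing best options with reads" step can be sketched as follows. This is a minimal sketch under stated assumptions, not Trycycler's implementation: the function names, the `(read_name, score)` tuples and the example scores are all hypothetical, and in practice the scores would come from aligning the relevant reads to each alternative sequence. How ties are broken is not stated in the log; this sketch simply keeps the first option with the highest total.

```python
def total_best_score(alignments):
    """Sum, over reads, of each read's best alignment score.

    `alignments` is an iterable of (read_name, score) pairs; a read may
    appear more than once if it aligned in several places.
    """
    best = {}
    for read_name, score in alignments:
        if read_name not in best or score > best[read_name]:
            best[read_name] = score
    return sum(best.values())


def choose_by_reads(option_alignments):
    """Return (best_option, totals) for a dict of option -> alignments.

    Assumption: on a tie, the first option with the highest total wins.
    """
    totals = {
        option: total_best_score(alignments)
        for option, alignments in option_alignments.items()
    }
    best_option = max(totals, key=totals.get)
    return best_option, totals


# Hypothetical scores for one chunk with two alternative sequences.
example = {
    "TT": [("read_1", 50), ("read_1", 40), ("read_2", 60)],
    "CC": [("read_1", 45), ("read_2", 30)],
}
best_option, totals = choose_by_reads(example)
print(best_option, totals)
```

In this made-up example, only the best alignment of `read_1` counts towards the TT option, so TT totals 110 against 75 for CC and is chosen.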