Starting Trycycler reconcile (2020-07-08 15:17:45)
    Trycycler reconcile is a tool for reconciling multiple alternative contigs with each other.

Input reads:
  reads.fastq.gz
  size = 216,254,308 bytes

Input contigs:
  trycycler/cluster_002/1_contigs/A_contig_10.fasta (7,455 bp)
  trycycler/cluster_002/1_contigs/B_contig_8.fasta (7,455 bp)
  trycycler/cluster_002/1_contigs/C_contig_4.fasta (7,535 bp)
  trycycler/cluster_002/1_contigs/D_contig_2.fasta (7,475 bp)
  trycycler/cluster_002/1_contigs/E_contig_18.fasta (7,450 bp)

Checking required software:
  minimap2: v2.17-r954-dirty


Initial check of contigs (2020-07-08 15:17:45)
    Before proceeding, Trycycler ensures that the input contigs appear sufficiently close to each other to make a consensus. If not, the program will quit and the user must fix the input contigs (make them more similar to each other) or exclude some before trying again.

Relative sequence lengths:
  A_contig_10: 1.000 1.000 0.989 0.997 1.001
  B_contig_8:  1.000 1.000 0.989 0.997 1.001
  C_contig_4:  1.011 1.011 1.000 1.008 1.011
  D_contig_2:  1.003 1.003 0.992 1.000 1.003
  E_contig_18: 0.999 0.999 0.989 0.997 1.000

Mash distances:
  A_contig_10: 0.000 0.003 0.002 0.002 0.003
  B_contig_8:  0.003 0.000 0.003 0.003 0.003
  C_contig_4:  0.003 0.003 0.000 0.003 0.003
  D_contig_2:  0.002 0.003 0.002 0.000 0.003
  E_contig_18: 0.003 0.003 0.003 0.003 0.000

Contigs have passed the initial check - they seem sufficiently close to reconcile.


Normalising strands (2020-07-08 15:17:45)
    In this step, Trycycler ensures that all sequences are on the same strand. It does this by first finding a sequence that occurs once in each contig and then flipping any of the contigs (converting to their reverse complement sequence) which have this sequence on the negative strand.
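
The strand-normalisation idea described above can be illustrated with a minimal Python sketch. It is not Trycycler's implementation: the function names, the toy sequences and the use of exact substring matching are all assumptions made for illustration, and a real tool would likely need to tolerate mismatches.

```python
# Hypothetical helper names; exact matching is an illustrative simplification.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def normalise_strand(contig, common_seq):
    """Return (strand, sequence) so that common_seq lies on the + strand."""
    if common_seq in contig:
        return "+", contig
    if reverse_complement(common_seq) in contig:
        # The common sequence is on the negative strand, so flip the contig.
        return "-", reverse_complement(contig)
    raise ValueError("common sequence not found on either strand")


# Toy data (not taken from the log): one contig on each strand.
original = "TTGACCAGTACGGATCCA"
contigs = {"fwd": original, "rev": reverse_complement(original)}
common = "CAGTACGG"

for name, seq in contigs.items():
    strand, fixed = normalise_strand(seq, common)
    print(f"{name}: {strand} strand")
```

After normalisation both toy contigs carry the common sequence in the same orientation, which is the property the later steps rely on.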

Randomly-chosen common sequence:
  CAGAAGTTTATGCACTTTCTACAAGAGTACATCGGTCAACGAAGAGGTTT
  TGTCTTCGTAACTCGCTCCGGAAAAATGGTGGGGTTAAGGCAAATCGCCC
  GCACGTTCTCTCAAGCAGGACTACAAGCTGCAATCCCTTTTAAGATAACC
  CCGCACGTGCTTCGAGCAACCGCTGTGACGGAGTACAAACGCCTAGGGTG
  CTCAGACTCCGACATAATGAAGGTCACGGGACACGCAACCGCAAAGATGA

  A_contig_10: - strand (using reverse complement)
  B_contig_8:  + strand (using original sequence)
  C_contig_4:  + strand (using original sequence)
  D_contig_2:  - strand (using reverse complement)
  E_contig_18: - strand (using reverse complement)


Circularisation (2020-07-08 15:17:46)
    Trycycler now compares the contigs to each other to repair any circularisation issues. After this step, each sequence should be cleanly circularised - i.e. the first base in the contig immediately follows the last base. Each contig will be circularised by looking for the position of its start and end in the other contigs. If necessary, additional sequence will be added or duplicated sequence will be removed. If there are multiple possible ways to fix a contig's circularisation, then Trycycler will use read alignments to choose the best one.
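
To make the "duplicated sequence will be removed" case concrete, here is a deliberately simplified Python sketch. It does not reproduce the cross-contig approach described above (locating a contig's start and end in the other contigs, with read alignments as a tie-breaker). Instead it swaps in a much simpler self-overlap check that trims a suffix exactly repeating the contig's own prefix. The function name, the `min_overlap` parameter and the toy genome are assumptions.

```python
def trim_end_overlap(seq, min_overlap=5):
    """Trim a suffix that exactly duplicates the prefix of a circular contig.

    Returns (trimmed_sequence, number_of_bases_trimmed). Only exact
    overlaps of at least min_overlap bases are considered.
    """
    # Try the largest candidate overlap first so the full duplication is removed.
    for size in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[-size:] == seq[:size]:
            return seq[:-size], size
    return seq, 0


# Toy circular genome (hypothetical), assembled with 6 bp of start-end overlap.
genome = "ATGCCGTAAGCTTGACCATG"
overlapping_contig = genome + genome[:6]

trimmed, removed = trim_end_overlap(overlapping_contig)
print(f"trimmed {removed} bp, {len(trimmed)} bp remain")
```

The opposite case in the log, where missing bases are added from another contig, cannot be handled by a self-overlap check, which is one reason a comparison against the other contigs is needed.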

Circularising A_contig_10:
  using B_contig_8:
    circularising A_contig_10 by adding 4 bp of sequence from B_contig_8 (4448-4452)
  using C_contig_4:
    circularising A_contig_10 by adding 4 bp of sequence from C_contig_4 (4854-4858)
  using D_contig_2:
    circularising A_contig_10 by adding 4 bp of sequence from D_contig_2 (2550-2554)
  using E_contig_18:
    circularising A_contig_10 by adding 4 bp of sequence from E_contig_18 (6185-6189)
  circularisation complete (7,459 bp)

Circularising B_contig_8:
  using A_contig_10:
    no adjustment needed (B_contig_8 is already circular)
  using C_contig_4:
    unable to circularise: B_contig_8's end could not be found in C_contig_4
  using D_contig_2:
    no adjustment needed (B_contig_8 is already circular)
  using E_contig_18:
    no adjustment needed (B_contig_8 is already circular)
  circularisation complete (7,455 bp)

Circularising C_contig_4:
  using A_contig_10:
    circularising C_contig_4 by trimming 64 bp of sequence from the end
  using B_contig_8:
    unable to circularise: C_contig_4's start could not be found in B_contig_8
  using D_contig_2:
    circularising C_contig_4 by trimming 64 bp of sequence from the end
  using E_contig_18:
    circularising C_contig_4 by trimming 64 bp of sequence from the end
  circularisation complete (7,471 bp)

Circularising D_contig_2:
  using A_contig_10:
    circularising D_contig_2 by adding 7 bp of sequence from A_contig_10 (4910-4917)
  using B_contig_8:
    no adjustment needed (D_contig_2 is already circular)
  using C_contig_4:
    circularising D_contig_2 by adding 7 bp of sequence from C_contig_4 (2303-2310)
  using E_contig_18:
    circularising D_contig_2 by adding 7 bp of sequence from E_contig_18 (3638-3645)
  choosing most common circularisation
  circularisation complete (7,482 bp)

Circularising E_contig_18:
  using A_contig_10:
    circularising E_contig_18 by adding 7 bp of sequence from A_contig_10 (1260-1267)
  using B_contig_8:
    circularising E_contig_18 by adding 7 bp of sequence from B_contig_8 (5713-5720)
  using C_contig_4:
    circularising E_contig_18 by adding 7 bp of sequence from C_contig_4 (6119-6126)
  using D_contig_2:
    circularising E_contig_18 by adding 7 bp of sequence from D_contig_2 (3815-3822)
  circularisation complete (7,457 bp)


Finding starting sequence (2020-07-08 15:17:46)
    In this step, Trycycler finds a sequence to use as a starting point for each of the contigs. This can be a standard starting point (e.g. the dnaA gene) or if one is not found, then a randomly-chosen unique sequence will be used. If necessary, the sequences will be flipped (converted to their reverse complement sequence) to ensure that the starting sequence is on the positive strand.

Looking for known starting sequences in each contig...
  Unable to find a suitable known starting sequence

Randomly-chosen common sequence:
  CAGAAGTTTATGCACTTTCTACAAGAGTACATCGGTCAACGAAGAGGTTT
  TGTCTTCGTAACTCGCTCCGGAAAATGGTGGGGTTAAGGCAAATCGCCCG
  CACGTTCTCTCAAGCAGGACTACAAGCTGCAATCCCTTTTAAGATAACCC
  GCACGTGCTTCGAGCAACCGCTGTGACGGAGTACAAACGCCTAGGGTGCT
  CAGACTCCGACATAATGAAGGTCACGGGACACGCAACCGCAAAGATGATA

  A_contig_10: + strand (using original sequence)
  B_contig_8:  + strand (using original sequence)
  C_contig_4:  + strand (using original sequence)
  D_contig_2:  + strand (using original sequence)
  E_contig_18: + strand (using original sequence)


Rotating contigs to starting sequence (2020-07-08 15:17:46)
    For a circular contig, any point in the sequence is a valid starting position and it can thus be 'rotated' by moving sequence from the contig start to the contig end. In this step, Trycycler rotates each contig such that it begins with the starting sequence, ensuring that all contigs begin and end together so they can be aligned to each other.
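
The rotation described above amounts to moving a prefix of the sequence to its end. The following Python sketch shows one way to do that under stated assumptions: the function name is hypothetical, the starting sequence is located by exact matching, and the toy contig is not taken from the log.

```python
def rotate_to_start(seq, start_seq):
    """Rotate a circular sequence so that it begins with start_seq.

    Returns (rotation_in_bp, rotated_sequence). The search wraps around
    the end of the contig, so a starting sequence that spans the
    start-end junction is still found.
    """
    # Append the first len(start_seq) - 1 bases so junction-spanning hits are visible.
    doubled = seq + seq[:len(start_seq) - 1]
    pos = doubled.find(start_seq)
    if pos == -1:
        raise ValueError("starting sequence not found in contig")
    # Move the first pos bases from the contig start to the contig end.
    return pos, seq[pos:] + seq[:pos]


# Toy circular contig whose starting sequence spans the junction.
contig = "GGATCCTTACAG"
rotation, rotated = rotate_to_start(contig, "CAGGG")
print(f"rotating by {rotation} bp: {rotated}")
```

Because the contig is circular, the rotated sequence has the same length and base content as the input; only the position of the break point changes.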

  A_contig_10: rotating by 2,235 bp
    CAGAAGTTTATGCACTTTCT...GGATTATTATAACTTATCCT (7,459 bp)
  B_contig_8: rotating by 6,686 bp
    CAGAAGTTTATGCACTTTCT...GGATTATTATAACTTATCCT (7,455 bp)
  C_contig_4: rotating by 7,097 bp
    CAGAAGTTTATGCACTTTCT...GGATTATTATAACTTATCCT (7,471 bp)
  D_contig_2: rotating by 4,793 bp
    CAGAAGTTTATGCACTTTCT...GGATTATTATAACTTATCCT (7,482 bp)
  E_contig_18: rotating by 968 bp
    CAGAAGTTTATGCACTTTCT...GGATTATTATAACTTATCCT (7,457 bp)


Pairwise global alignments (2020-07-08 15:17:46)
    Trycycler uses the edlib aligner to get global alignments between all pairs of sequences. This can help you to spot any problematic sequences that should be excluded before continuing. If you see any sequences with notably worse identities or max indels, you can remove them (delete the contig's FASTA) and run this command again.

  A_contig_10 vs B_contig_8...  99.67% identity, max indel = 1
  A_contig_10 vs C_contig_4...  99.60% identity, max indel = 4
  A_contig_10 vs D_contig_2...  99.64% identity, max indel = 4
  A_contig_10 vs E_contig_18... 99.71% identity, max indel = 1
  B_contig_8 vs C_contig_4...   99.55% identity, max indel = 4
  B_contig_8 vs D_contig_2...   99.56% identity, max indel = 4
  B_contig_8 vs E_contig_18...  99.64% identity, max indel = 1
  C_contig_4 vs D_contig_2...   99.53% identity, max indel = 4
  C_contig_4 vs E_contig_18...  99.60% identity, max indel = 4
  D_contig_2 vs E_contig_18...  99.59% identity, max indel = 4

Pairwise identities:
  A_contig_10: 100.00%  99.67%  99.60%  99.64%  99.71%
  B_contig_8:   99.67% 100.00%  99.55%  99.56%  99.64%
  C_contig_4:   99.60%  99.55% 100.00%  99.53%  99.60%
  D_contig_2:   99.64%  99.56%  99.53% 100.00%  99.59%
  E_contig_18:  99.71%  99.64%  99.60%  99.59% 100.00%

Maximum insertion/deletion sizes:
  A_contig_10: 0 1 4 4 1
  B_contig_8:  1 0 4 4 1
  C_contig_4:  4 4 0 4 4
  D_contig_2:  4 4 4 0 4
  E_contig_18: 1 1 4 4 0


Finished! (2020-07-08 15:17:46)
    All contig sequences are now reconciled and ready for the next step in the pipeline: trycycler msa.

Saving sequences to file: trycycler/cluster_002/2_all_seqs.fasta
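
The "identity" and "max indel" figures in the pairwise alignment step can be illustrated with a small Python sketch. The log states that Trycycler uses edlib for these alignments; the code below instead uses a plain unit-cost dynamic-programming alignment so that it runs without third-party packages. The definitions used here (identity as matching columns divided by alignment columns, max indel as the longest run of consecutive insertions or deletions) and the function names are assumptions, not taken from Trycycler's source.

```python
from itertools import groupby


def global_align(a, b):
    """Unit-cost global alignment; returns ops '=', 'X', 'I' or 'D' per column."""
    n, m = len(a), len(b)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ops.append("=" if a[i - 1] == b[j - 1] else "X")
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append("D")  # base present in a, absent from b
            i -= 1
        else:
            ops.append("I")  # base present in b, absent from a
            j -= 1
    ops.reverse()
    return ops


def identity_and_max_indel(ops):
    """Return (percent identity, longest run of consecutive I or D columns)."""
    runs = [len(list(group)) for op, group in groupby(ops) if op in "ID"]
    return 100.0 * ops.count("=") / len(ops), max(runs, default=0)


# Toy pair (not from the log): the second sequence has a 2 bp insertion.
ops = global_align("ACGTACGTAC", "ACGTGGACGTAC")
identity, max_indel = identity_and_max_indel(ops)
print(f"{identity:.2f}% identity, max indel = {max_indel}")
```

This quadratic-memory version is only practical for short sequences; for contigs of several kilobases or more, a dedicated aligner is the more realistic choice.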