Starting Trycycler reconcile (2020-07-06 14:10:17)
Trycycler reconcile is a tool for reconciling multiple alternative contigs with each other.

Input reads:
  reads.fastq.gz
    size = 216,246,191 bytes

Input contigs:
  trycycler/cluster_001/1_contigs/A_contig_1.fasta (1,044,289 bp)
  trycycler/cluster_001/1_contigs/B_contig_1.fasta (1,044,294 bp)
  trycycler/cluster_001/1_contigs/C_contig_1.fasta (1,044,312 bp)
  trycycler/cluster_001/1_contigs/D_contig_1.fasta (1,044,321 bp)
  trycycler/cluster_001/1_contigs/E_contig_1.fasta (1,044,320 bp)
  trycycler/cluster_001/1_contigs/F_utg000001c.fasta (1,044,419 bp)
  trycycler/cluster_001/1_contigs/G_utg000001c.fasta (1,044,418 bp)
  trycycler/cluster_001/1_contigs/H_utg000001c.fasta (1,044,435 bp)
  trycycler/cluster_001/1_contigs/I_utg000001c.fasta (1,044,409 bp)
  trycycler/cluster_001/1_contigs/J_utg000001c.fasta (1,044,416 bp)

Checking required software:
  minimap2: v2.17-r954-dirty

Initial check of contigs (2020-07-06 14:10:17)
Before proceeding, Trycycler ensures that the input contigs appear sufficiently close to each other to make a consensus. If not, the program will quit and the user must fix the input contigs (make them more similar to each other) or exclude some before trying again.
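The initial check compares every pair of contigs by relative length and by Mash distance. As a rough illustration only, both metrics can be sketched in Python, assuming the relative length is the plain ratio of the two contig lengths and replacing Mash's MinHash sketches with an exact k-mer Jaccard index. The function names and the default `k` are hypothetical, not Trycycler's code.

```python
from __future__ import annotations

import math


def relative_length(seq_a: str, seq_b: str) -> float:
    """Length of seq_a relative to seq_b (assumed here to be a plain ratio)."""
    return len(seq_a) / len(seq_b)


def kmers(seq: str, k: int) -> set[str]:
    """Every distinct k-mer in seq (an exact set, not a MinHash sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_like_distance(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Mash's distance formula applied to an exact Jaccard index.

    Mash estimates the Jaccard index j from MinHash sketches and reports
    D = -(1/k) * ln(2j / (1 + j)); this sketch computes j exactly instead.
    """
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("both sequences are shorter than k")
    j = len(a & b) / len(union)
    if j == 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    return -math.log(2 * j / (1 + j)) / k
```

Applied to the input lengths in this log, the most different pair is A_contig_1 (1,044,289 bp) against H_utg000001c (1,044,435 bp): 1,044,289 / 1,044,435 ≈ 0.99986, which rounds to the 1.000 shown in every cell of the relative-length matrix.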
Relative sequence lengths:
  A_contig_1:    1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  B_contig_1:    1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  C_contig_1:    1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  D_contig_1:    1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  E_contig_1:    1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  F_utg000001c:  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  G_utg000001c:  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  H_utg000001c:  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  I_utg000001c:  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
  J_utg000001c:  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

Mash distances:
  A_contig_1:    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  B_contig_1:    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  C_contig_1:    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  D_contig_1:    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  E_contig_1:    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  F_utg000001c:  0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  G_utg000001c:  0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  H_utg000001c:  0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  I_utg000001c:  0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
  J_utg000001c:  0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Contigs have passed the initial check - they seem sufficiently close to reconcile.

Normalising strands (2020-07-06 14:10:20)
In this step, Trycycler ensures that all sequences are on the same strand. It does this by first finding a sequence that occurs once in each contig and then flipping any of the contigs (converting to their reverse complement sequence) which have this sequence on the negative strand.
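Strand normalisation comes down to a reverse complement plus a presence test for the common sequence. This sketch assumes the common sequence matches each contig exactly on one strand, which contigs containing errors will not always satisfy; the helper names are hypothetical.

```python
from __future__ import annotations

# Maps each base to its complement; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def normalise_strand(contig: str, common: str) -> tuple[str, str]:
    """Return (strand, sequence) with `common` on the + strand.

    Assumes `common` matches the contig exactly on one strand; real contigs
    with errors would need an error-tolerant search instead.
    """
    if common in contig:
        return "+", contig
    flipped = reverse_complement(contig)
    if common in flipped:
        return "-", flipped
    raise ValueError("common sequence not found on either strand")
```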
Randomly-chosen common sequence:
  CTCAATTTCTACCGCATAGGGATACTTATCGGAAGAAAGAGGATTGTGGA
  TACATAAATGGCGAATCTTCGTTTTGGAAAGACCGGGGGAAATATTGCCA
  ACGGTCACTTGAGCATGCAACTGTTGGGATAGCCACTGCTCAACAATACT
  TTCTTTTTTTATCCAGAAGTACCCTACAATACAACAGGCTATGAGAAATG
  CGCTCTTGATTAGTTTAAACATAGAAAAGGATGGTCGTAAGCACTAGATG

  A_contig_1: - strand (using reverse complement)
  B_contig_1: - strand (using reverse complement)
  C_contig_1: - strand (using reverse complement)
  D_contig_1: + strand (using original sequence)
  E_contig_1: + strand (using original sequence)
  F_utg000001c: + strand (using original sequence)
  G_utg000001c: - strand (using reverse complement)
  H_utg000001c: + strand (using original sequence)
  I_utg000001c: - strand (using reverse complement)
  J_utg000001c: - strand (using reverse complement)

Circularisation (2020-07-06 14:10:22)
Trycycler now compares the contigs to each other to repair any circularisation issues. After this step, each sequence should be cleanly circularised - i.e. the first base in the contig immediately follows the last base. Each contig will be circularised by looking for the position of its start and end in the other contigs. If necessary, additional sequence will be added or duplicated sequence will be removed. If there are multiple possible ways to fix a contig's circularisation, then Trycycler will use read alignments to choose the best one.
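The circularisation idea can be illustrated with exact string matching: locate the first and last `k` bases of one contig in another contig that is assumed to be cleanly circular, then fill the gap or trim the overlap between them. This is a hypothetical simplification (no alignment and no read-based choice between alternatives), not Trycycler's implementation.

```python
def circularise(a: str, b: str, k: int = 20) -> str:
    """Fix a's start/end junction using b, assumed to be cleanly circular.

    Hypothetical exact-match sketch: locate a's first and last k bases in b,
    then add the bases b has between them (a gap) or trim the bases that a
    repeats at its end (an overlap).
    """
    n = len(b)
    doubled = b + b  # lets matches run across b's own start/end junction
    end_hit = doubled.find(a[-k:])
    start_hit = doubled.find(a[:k])
    if end_hit == -1 or start_hit == -1:
        raise ValueError("start or end of a not found in b")
    after_end = (end_hit + k) % n  # position in b just past a's last base
    gap = (start_hit - after_end) % n  # forward distance from a's end to its start
    if gap == 0:
        return a  # already cleanly circular
    if gap <= n // 2:
        return a + doubled[after_end:after_end + gap]  # fill the gap from b
    return a[:-(n - gap)]  # a repeats (n - gap) bases at its end: trim them
```

Treating a forward distance of more than half the reference as an overlap is an assumption of this sketch; it only behaves sensibly when the junction error is small relative to the contig, as with the 1 to 15 bp adjustments in this log.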
Circularising A_contig_1:
  using B_contig_1: unable to circularise: A_contig_1's start/end is the same as B_contig_1's start/end
  using C_contig_1: unable to circularise: A_contig_1's start/end is the same as C_contig_1's start/end
  using D_contig_1: circularising A_contig_1 by adding 2 bp of sequence from D_contig_1 (1027747-1027749)
  using E_contig_1: circularising A_contig_1 by adding 2 bp of sequence from E_contig_1 (1027751-1027753)
  using F_utg000001c: circularising A_contig_1 by adding 2 bp of sequence from F_utg000001c (647318-647320)
  using G_utg000001c: circularising A_contig_1 by adding 2 bp of sequence from G_utg000001c (484080-484082)
  using H_utg000001c: circularising A_contig_1 by adding 2 bp of sequence from H_utg000001c (388777-388779)
  using I_utg000001c: circularising A_contig_1 by adding 2 bp of sequence from I_utg000001c (633106-633108)
  using J_utg000001c: circularising A_contig_1 by adding 2 bp of sequence from J_utg000001c (631481-631483)
  circularisation complete (1,044,291 bp)

Circularising B_contig_1:
  using A_contig_1: unable to circularise: B_contig_1's start/end is the same as A_contig_1's start/end
  using C_contig_1: unable to circularise: B_contig_1's start/end is the same as C_contig_1's start/end
  using D_contig_1: circularising B_contig_1 by adding 1 bp of sequence from D_contig_1 (1027748-1027749)
  using E_contig_1: circularising B_contig_1 by adding 1 bp of sequence from E_contig_1 (1027752-1027753)
  using F_utg000001c: circularising B_contig_1 by adding 1 bp of sequence from F_utg000001c (647319-647320)
  using G_utg000001c: circularising B_contig_1 by adding 1 bp of sequence from G_utg000001c (484081-484082)
  using H_utg000001c: circularising B_contig_1 by adding 1 bp of sequence from H_utg000001c (388778-388779)
  using I_utg000001c: circularising B_contig_1 by adding 1 bp of sequence from I_utg000001c (633107-633108)
  using J_utg000001c: circularising B_contig_1 by adding 1 bp of sequence from J_utg000001c (631482-631483)
  circularisation complete (1,044,295 bp)

Circularising C_contig_1:
  using A_contig_1: unable to circularise: C_contig_1's start/end is the same as A_contig_1's start/end
  using B_contig_1: unable to circularise: C_contig_1's start/end is the same as B_contig_1's start/end
  using D_contig_1: circularising C_contig_1 by adding 1 bp of sequence from D_contig_1 (1027748-1027749)
  using E_contig_1: circularising C_contig_1 by adding 1 bp of sequence from E_contig_1 (1027752-1027753)
  using F_utg000001c: circularising C_contig_1 by adding 1 bp of sequence from F_utg000001c (647319-647320)
  using G_utg000001c: circularising C_contig_1 by adding 1 bp of sequence from G_utg000001c (484081-484082)
  using H_utg000001c: circularising C_contig_1 by adding 1 bp of sequence from H_utg000001c (388778-388779)
  using I_utg000001c: circularising C_contig_1 by adding 1 bp of sequence from I_utg000001c (633107-633108)
  using J_utg000001c: circularising C_contig_1 by adding 1 bp of sequence from J_utg000001c (631482-631483)
  circularisation complete (1,044,313 bp)

Circularising D_contig_1:
  using A_contig_1: circularising D_contig_1 by adding 1 bp of sequence from A_contig_1 (16571-16572)
  using B_contig_1: circularising D_contig_1 by adding 15 bp of sequence from B_contig_1 (16572-16587)
  using C_contig_1: circularising D_contig_1 by adding 1 bp of sequence from C_contig_1 (16571-16572)
  using E_contig_1: unable to circularise: D_contig_1's start/end is the same as E_contig_1's start/end
  using F_utg000001c: circularising D_contig_1 by adding 15 bp of sequence from F_utg000001c (663890-663905)
  using G_utg000001c: circularising D_contig_1 by adding 15 bp of sequence from G_utg000001c (500654-500669)
  using H_utg000001c: circularising D_contig_1 by adding 15 bp of sequence from H_utg000001c (405351-405366)
  using I_utg000001c: circularising D_contig_1 by adding 15 bp of sequence from I_utg000001c (649678-649693)
  using J_utg000001c: circularising D_contig_1 by adding 15 bp of sequence from J_utg000001c (648055-648070)
  choosing best circularisation of 2 alternatives
    alternative 1 (1,044,322 bp): score = 304,859,104
    alternative 2 (1,044,336 bp): score = 304,860,762
    best alternative: 2
  circularisation complete (1,044,336 bp)

Circularising E_contig_1:
  using A_contig_1: circularising E_contig_1 by adding 5 bp of sequence from A_contig_1 (16571-16576)
  using B_contig_1: circularising E_contig_1 by trimming 2 bp of sequence from the end
  using C_contig_1: circularising E_contig_1 by trimming 2 bp of sequence from the end
  using D_contig_1: unable to circularise: E_contig_1's start/end is the same as D_contig_1's start/end
  using F_utg000001c: circularising E_contig_1 by trimming 2 bp of sequence from the end
  using G_utg000001c: circularising E_contig_1 by trimming 2 bp of sequence from the end
  using H_utg000001c: circularising E_contig_1 by trimming 2 bp of sequence from the end
  using I_utg000001c: circularising E_contig_1 by trimming 2 bp of sequence from the end
  using J_utg000001c: circularising E_contig_1 by trimming 2 bp of sequence from the end
  choosing most common circularisation
  circularisation complete (1,044,318 bp)

Circularising F_utg000001c:
  using A_contig_1: no adjustment needed (F_utg000001c is already circular)
  using B_contig_1: no adjustment needed (F_utg000001c is already circular)
  using C_contig_1: no adjustment needed (F_utg000001c is already circular)
  using D_contig_1: no adjustment needed (F_utg000001c is already circular)
  using E_contig_1: no adjustment needed (F_utg000001c is already circular)
  using G_utg000001c: no adjustment needed (F_utg000001c is already circular)
  using H_utg000001c: no adjustment needed (F_utg000001c is already circular)
  using I_utg000001c: no adjustment needed (F_utg000001c is already circular)
  using J_utg000001c: no adjustment needed (F_utg000001c is already circular)
  circularisation complete (1,044,419 bp)

Circularising G_utg000001c:
  using A_contig_1: no adjustment needed (G_utg000001c is already circular)
  using B_contig_1: no adjustment needed (G_utg000001c is already circular)
  using C_contig_1: no adjustment needed (G_utg000001c is already circular)
  using D_contig_1: no adjustment needed (G_utg000001c is already circular)
  using E_contig_1: no adjustment needed (G_utg000001c is already circular)
  using F_utg000001c: no adjustment needed (G_utg000001c is already circular)
  using H_utg000001c: no adjustment needed (G_utg000001c is already circular)
  using I_utg000001c: no adjustment needed (G_utg000001c is already circular)
  using J_utg000001c: no adjustment needed (G_utg000001c is already circular)
  circularisation complete (1,044,418 bp)

Circularising H_utg000001c:
  using A_contig_1: no adjustment needed (H_utg000001c is already circular)
  using B_contig_1: no adjustment needed (H_utg000001c is already circular)
  using C_contig_1: no adjustment needed (H_utg000001c is already circular)
  using D_contig_1: no adjustment needed (H_utg000001c is already circular)
  using E_contig_1: no adjustment needed (H_utg000001c is already circular)
  using F_utg000001c: no adjustment needed (H_utg000001c is already circular)
  using G_utg000001c: no adjustment needed (H_utg000001c is already circular)
  using I_utg000001c: no adjustment needed (H_utg000001c is already circular)
  using J_utg000001c: no adjustment needed (H_utg000001c is already circular)
  circularisation complete (1,044,435 bp)

Circularising I_utg000001c:
  using A_contig_1: no adjustment needed (I_utg000001c is already circular)
  using B_contig_1: no adjustment needed (I_utg000001c is already circular)
  using C_contig_1: no adjustment needed (I_utg000001c is already circular)
  using D_contig_1: no adjustment needed (I_utg000001c is already circular)
  using E_contig_1: no adjustment needed (I_utg000001c is already circular)
  using F_utg000001c: no adjustment needed (I_utg000001c is already circular)
  using G_utg000001c: no adjustment needed (I_utg000001c is already circular)
  using H_utg000001c: no adjustment needed (I_utg000001c is already circular)
  using J_utg000001c: no adjustment needed (I_utg000001c is already circular)
  circularisation complete (1,044,409 bp)

Circularising J_utg000001c:
  using A_contig_1: no adjustment needed (J_utg000001c is already circular)
  using B_contig_1: no adjustment needed (J_utg000001c is already circular)
  using C_contig_1: no adjustment needed (J_utg000001c is already circular)
  using D_contig_1: no adjustment needed (J_utg000001c is already circular)
  using E_contig_1: no adjustment needed (J_utg000001c is already circular)
  using F_utg000001c: no adjustment needed (J_utg000001c is already circular)
  using G_utg000001c: no adjustment needed (J_utg000001c is already circular)
  using H_utg000001c: no adjustment needed (J_utg000001c is already circular)
  using I_utg000001c: no adjustment needed (J_utg000001c is already circular)
  circularisation complete (1,044,416 bp)

Finding starting sequence (2020-07-06 14:10:58)
In this step, Trycycler finds a sequence to use as a starting point for each of the contigs. This can be a standard starting point (e.g. the dnaA gene) or if one is not found, then a randomly-chosen unique sequence will be used. If necessary, the sequences will be flipped (converted to their reverse complement sequence) to ensure that the starting sequence is on the positive strand.

Looking for known starting sequences in each contig...
  Found starting sequence 0145_A363_RS01345 (chromosomal replication initiator protein DnaA)
  ATGCGAGCTTGGGAAGAGTTCCTTTTGCTTCAAGAAAAAGAAATTGGAGT...
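Finding the starting sequence, flipping the contig if needed, and rotating it so it begins with that sequence can be sketched as follows. Rotating a circular contig moves the bases before the match from the contig start to its end; searching the doubled sequence lets a match span the start/end junction. Exact matching and the function name `rotate_to_start` are assumptions for illustration, not Trycycler's code.

```python
from __future__ import annotations

# Maps each base to its complement; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rotate_to_start(contig: str, start_seq: str) -> tuple[str, int, str]:
    """Orient and rotate a circular contig so it begins with start_seq.

    Returns (strand, bases moved from start to end, rotated sequence).
    The + strand is tried first, then the reverse complement.
    """
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        # Searching seq + seq finds matches that span the start/end junction.
        pos = (seq + seq).find(start_seq)
        if pos != -1:
            return strand, pos, seq[pos:] + seq[:pos]
    raise ValueError("starting sequence not found on either strand")
```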
  A_contig_1: - strand (using reverse complement)
  B_contig_1: - strand (using reverse complement)
  C_contig_1: - strand (using reverse complement)
  D_contig_1: - strand (using reverse complement)
  E_contig_1: - strand (using reverse complement)
  F_utg000001c: - strand (using reverse complement)
  G_utg000001c: - strand (using reverse complement)
  H_utg000001c: - strand (using reverse complement)
  I_utg000001c: - strand (using reverse complement)
  J_utg000001c: - strand (using reverse complement)

Rotating contigs to starting sequence (2020-07-06 14:11:01)
For a circular contig, any point in the sequence is a valid starting position and it can thus be 'rotated' by moving sequence from the contig start to the contig end. In this step, Trycycler rotates each contig such that it begins with the starting sequence, ensuring that all contigs begin and end together so they can be aligned to each other.

A_contig_1: rotating by 578,391 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,291 bp)
B_contig_1: rotating by 578,410 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,295 bp)
C_contig_1: rotating by 578,425 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,313 bp)
D_contig_1: rotating by 595,005 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,336 bp)
E_contig_1: rotating by 594,964 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,318 bp)
F_utg000001c: rotating by 975,567 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,419 bp)
G_utg000001c: rotating by 94,386 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,418 bp)
H_utg000001c: rotating by 189,703 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,435 bp)
I_utg000001c: rotating by 989,774 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,409 bp)
J_utg000001c: rotating by 991,401 bp
  ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,044,416 bp)

Pairwise global alignments (2020-07-06 14:11:02)
Trycycler uses the edlib aligner to get global alignments between all pairs of sequences.
This can help you to spot any problematic sequences that should be excluded before continuing. If you see any sequences with notably worse identities or max indels, you can remove them (delete the contig's FASTA) and run this command again.

A_contig_1 vs B_contig_1... 99.98% identity, max indel = 2
A_contig_1 vs C_contig_1... 99.98% identity, max indel = 2
A_contig_1 vs D_contig_1... 99.98% identity, max indel = 14
A_contig_1 vs E_contig_1... 99.98% identity, max indel = 2
A_contig_1 vs F_utg000001c... 99.98% identity, max indel = 3
A_contig_1 vs G_utg000001c... 99.98% identity, max indel = 3
A_contig_1 vs H_utg000001c... 99.98% identity, max indel = 3
A_contig_1 vs I_utg000001c... 99.98% identity, max indel = 3
A_contig_1 vs J_utg000001c... 99.98% identity, max indel = 3
B_contig_1 vs C_contig_1... 99.98% identity, max indel = 2
B_contig_1 vs D_contig_1... 99.98% identity, max indel = 7
B_contig_1 vs E_contig_1... 99.98% identity, max indel = 2
B_contig_1 vs F_utg000001c... 99.98% identity, max indel = 3
B_contig_1 vs G_utg000001c... 99.98% identity, max indel = 3
B_contig_1 vs H_utg000001c... 99.98% identity, max indel = 3
B_contig_1 vs I_utg000001c... 99.98% identity, max indel = 3
B_contig_1 vs J_utg000001c... 99.98% identity, max indel = 3
C_contig_1 vs D_contig_1... 99.98% identity, max indel = 7
C_contig_1 vs E_contig_1... 99.98% identity, max indel = 2
C_contig_1 vs F_utg000001c... 99.98% identity, max indel = 3
C_contig_1 vs G_utg000001c... 99.98% identity, max indel = 3
C_contig_1 vs H_utg000001c... 99.98% identity, max indel = 3
C_contig_1 vs I_utg000001c... 99.98% identity, max indel = 2
C_contig_1 vs J_utg000001c... 99.98% identity, max indel = 3
D_contig_1 vs E_contig_1... 99.98% identity, max indel = 11
D_contig_1 vs F_utg000001c... 99.98% identity, max indel = 7
D_contig_1 vs G_utg000001c... 99.98% identity, max indel = 7
D_contig_1 vs H_utg000001c... 99.98% identity, max indel = 7
D_contig_1 vs I_utg000001c... 99.98% identity, max indel = 7
D_contig_1 vs J_utg000001c... 99.98% identity, max indel = 7
E_contig_1 vs F_utg000001c... 99.98% identity, max indel = 3
E_contig_1 vs G_utg000001c... 99.98% identity, max indel = 3
E_contig_1 vs H_utg000001c... 99.98% identity, max indel = 3
E_contig_1 vs I_utg000001c... 99.98% identity, max indel = 3
E_contig_1 vs J_utg000001c... 99.98% identity, max indel = 3
F_utg000001c vs G_utg000001c... 99.99% identity, max indel = 3
F_utg000001c vs H_utg000001c... 99.99% identity, max indel = 3
F_utg000001c vs I_utg000001c... 99.99% identity, max indel = 3
F_utg000001c vs J_utg000001c... 99.99% identity, max indel = 3
G_utg000001c vs H_utg000001c... 99.99% identity, max indel = 3
G_utg000001c vs I_utg000001c... 99.99% identity, max indel = 3
G_utg000001c vs J_utg000001c... 99.99% identity, max indel = 3
H_utg000001c vs I_utg000001c... 99.99% identity, max indel = 3
H_utg000001c vs J_utg000001c... 99.99% identity, max indel = 3
I_utg000001c vs J_utg000001c... 99.99% identity, max indel = 3

Pairwise identities:
  A_contig_1:    100.00%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%
  B_contig_1:     99.98% 100.00%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%
  C_contig_1:     99.98%  99.98% 100.00%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%
  D_contig_1:     99.98%  99.98%  99.98% 100.00%  99.98%  99.98%  99.98%  99.98%  99.98%  99.98%
  E_contig_1:     99.98%  99.98%  99.98%  99.98% 100.00%  99.98%  99.98%  99.98%  99.98%  99.98%
  F_utg000001c:   99.98%  99.98%  99.98%  99.98%  99.98% 100.00%  99.99%  99.99%  99.99%  99.99%
  G_utg000001c:   99.98%  99.98%  99.98%  99.98%  99.98%  99.99% 100.00%  99.99%  99.99%  99.99%
  H_utg000001c:   99.98%  99.98%  99.98%  99.98%  99.98%  99.99%  99.99% 100.00%  99.99%  99.99%
  I_utg000001c:   99.98%  99.98%  99.98%  99.98%  99.98%  99.99%  99.99%  99.99% 100.00%  99.99%
  J_utg000001c:   99.98%  99.98%  99.98%  99.98%  99.98%  99.99%  99.99%  99.99%  99.99% 100.00%

Maximum insertion/deletion sizes:
  A_contig_1:     0  2  2 14  2  3  3  3  3  3
  B_contig_1:     2  0  2  7  2  3  3  3  3  3
  C_contig_1:     2  2  0  7  2  3  3  3  2  3
  D_contig_1:    14  7  7  0 11  7  7  7  7  7
  E_contig_1:     2  2  2 11  0  3  3  3  3  3
  F_utg000001c:   3  3  3  7  3  0  3  3  3  3
  G_utg000001c:   3  3  3  7  3  3  0  3  3  3
  H_utg000001c:   3  3  3  7  3  3  3  0  3  3
  I_utg000001c:   3  3  2  7  3  3  3  3  0  3
  J_utg000001c:   3  3  3  7  3  3  3  3  3  0

Finished! (2020-07-06 14:11:25)
All contig sequences are now reconciled and ready for the next step in the pipeline: trycycler msa.

Saving sequences to file: trycycler/cluster_001/2_all_seqs.fasta
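The per-pair identity and max indel figures come from global alignments. Given an alignment as an extended CIGAR string (`=` match, `X` mismatch, `I`/`D` insertion/deletion), both numbers can be derived as in this sketch, which assumes identity is matched columns divided by all alignment columns; Trycycler's exact definition may differ, and the function name is hypothetical.

```python
from __future__ import annotations

import re

# One CIGAR operation: a run length followed by an extended-CIGAR symbol.
CIGAR_OP = re.compile(r"(\d+)([=XID])")
VALID_CIGAR = re.compile(r"(?:\d+[=XID])+")


def identity_and_max_indel(cigar: str) -> tuple[float, int]:
    """Return (percent identity, longest insertion/deletion) for a CIGAR.

    Assumes an extended CIGAR ('=' match, 'X' mismatch, 'I'/'D' indel)
    and identity = matched columns / all alignment columns.
    """
    if not VALID_CIGAR.fullmatch(cigar):
        raise ValueError(f"not an extended CIGAR string: {cigar!r}")
    matches = columns = max_indel = 0
    for run, op in CIGAR_OP.findall(cigar):
        length = int(run)
        columns += length
        if op == "=":
            matches += length
        elif op in "ID":
            max_indel = max(max_indel, length)
    return 100.0 * matches / columns, max_indel
```

With the edlib Python package installed, `edlib.align(seq_a, seq_b, mode="NW", task="path")["cigar"]` returns a global-alignment CIGAR in this extended format.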