Starting Trycycler reconcile (2020-07-08 15:33:39)
    Trycycler reconcile is a tool for reconciling multiple alternative contigs with each other.

Input reads:
  reads.fastq.gz
    size = 216,254,308 bytes

Input contigs:
  trycycler/cluster_001/1_contigs/A_contig_1.fasta (1,037,018 bp)
  trycycler/cluster_001/1_contigs/B_contig_3.fasta (1,036,565 bp)
  trycycler/cluster_001/1_contigs/C_contig_3.fasta (998,838 bp)
  trycycler/cluster_001/1_contigs/D_contig_1.fasta (1,036,820 bp)
  trycycler/cluster_001/1_contigs/E_contig_3.fasta (1,036,795 bp)
  trycycler/cluster_001/1_contigs/G_utg000001c.fasta (1,042,247 bp)
  trycycler/cluster_001/1_contigs/H_utg000001c.fasta (1,042,318 bp)
  trycycler/cluster_001/1_contigs/J_utg000001c.fasta (1,042,635 bp)

Checking required software:
  minimap2: v2.17-r954-dirty

Initial check of contigs (2020-07-08 15:33:39)
    Before proceeding, Trycycler ensures that the input contigs appear sufficiently close to each other to make a consensus. If not, the program will quit and the user must fix the input contigs (make them more similar to each other) or exclude some before trying again.
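The relative sequence lengths reported by this check appear to be simple ratios of the row contig's length to the column contig's length. The sketch below recomputes them from the input contig lengths listed in this log and flags contigs that sit far from the median length. The function names and the 2% tolerance are assumptions made for illustration; they are not Trycycler functions or settings.

```python
import statistics

# Contig lengths copied from the "Input contigs" listing in this log.
LENGTHS = {
    "A_contig_1": 1_037_018,
    "B_contig_3": 1_036_565,
    "C_contig_3": 998_838,
    "D_contig_1": 1_036_820,
    "E_contig_3": 1_036_795,
    "G_utg000001c": 1_042_247,
    "H_utg000001c": 1_042_318,
    "J_utg000001c": 1_042_635,
}


def relative_lengths(lengths):
    """Return {row: {col: row_length / col_length}} for every pair of contigs."""
    return {a: {b: la / lb for b, lb in lengths.items()}
            for a, la in lengths.items()}


def length_outliers(lengths, tolerance=0.02):
    """Names of contigs whose length differs from the median by more than
    `tolerance` (a hypothetical illustration value, not a Trycycler option)."""
    median = statistics.median(lengths.values())
    return [name for name, length in lengths.items()
            if abs(length / median - 1.0) > tolerance]


if __name__ == "__main__":
    rel = relative_lengths(LENGTHS)
    print(f"{rel['C_contig_3']['A_contig_1']:.3f}")  # 0.963
    print(f"{rel['A_contig_1']['C_contig_3']:.3f}")  # 1.038
    print(length_outliers(LENGTHS))
```

With these lengths, C_contig_3 is the only contig more than 2% away from the median, which is consistent with it being the shortest input by roughly 38 kbp.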

Relative sequence lengths:
  A_contig_1:    1.000  1.000  1.038  1.000  1.000  0.995  0.995  0.995
  B_contig_3:    1.000  1.000  1.038  1.000  1.000  0.995  0.994  0.994
  C_contig_3:    0.963  0.964  1.000  0.963  0.963  0.958  0.958  0.958
  D_contig_1:    1.000  1.000  1.038  1.000  1.000  0.995  0.995  0.994
  E_contig_3:    1.000  1.000  1.038  1.000  1.000  0.995  0.995  0.994
  G_utg000001c:  1.005  1.005  1.043  1.005  1.005  1.000  1.000  1.000
  H_utg000001c:  1.005  1.006  1.044  1.005  1.005  1.000  1.000  1.000
  J_utg000001c:  1.005  1.006  1.044  1.006  1.006  1.000  1.000  1.000

Mash distances:
  A_contig_1:    0.000  0.004  0.005  0.004  0.004  0.006  0.006  0.006
  B_contig_3:    0.004  0.000  0.005  0.004  0.004  0.006  0.006  0.006
  C_contig_3:    0.005  0.005  0.000  0.005  0.005  0.006  0.007  0.007
  D_contig_1:    0.004  0.004  0.005  0.000  0.004  0.006  0.006  0.006
  E_contig_3:    0.003  0.004  0.005  0.003  0.000  0.005  0.005  0.006
  G_utg000001c:  0.006  0.006  0.006  0.006  0.005  0.000  0.004  0.004
  H_utg000001c:  0.006  0.007  0.007  0.006  0.006  0.004  0.000  0.004
  J_utg000001c:  0.006  0.006  0.007  0.006  0.006  0.004  0.004  0.000

Contigs have passed the initial check - they seem sufficiently close to reconcile.

Normalising strands (2020-07-08 15:33:41)
    In this step, Trycycler ensures that all sequences are on the same strand. It does this by first finding a sequence that occurs once in each contig and then flipping any of the contigs (converting to their reverse complement sequence) which have this sequence on the negative strand.
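The strand normalisation described in this step can be sketched as follows. This is a simplified illustration, not Trycycler's code: it assumes the common sequence can be located by exact substring matching and does not span the contig's start/end junction, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def normalise_strand(contig, common_seq):
    """Return (strand, sequence) so that `common_seq` is on the + strand.

    Assumption: exact matching is enough to find the common sequence.
    """
    if common_seq in contig:
        return "+", contig                      # using original sequence
    if reverse_complement(common_seq) in contig:
        return "-", reverse_complement(contig)  # using reverse complement
    raise ValueError("common sequence not found on either strand")


if __name__ == "__main__":
    common = "ACGGT"
    forward = "TTACGGTCC"
    flipped = reverse_complement(forward)
    print(normalise_strand(forward, common))
    print(normalise_strand(flipped, common))
```

Both calls return the same sequence, so every contig ends up in the same orientation regardless of how its assembler happened to output it.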

Randomly-chosen common sequence:
  CTTGAACAAGGCTTGTCCTCTAATGATGGGCGGCCCACTTCCTCATGTTA
  TAGCTGCTAAAGCTATTGCTCTGAAAGAAGCTATGACGATCAATTTCAGG
  AAGTATGCGCATAAAGTGGTAGAGAATGCACGGACTTTGGCTGAAGTGTT
  CCAGCGGAACGGGCTACGATTACTCACTGGCGGGACAGATAATCACATGT
  TGATTATTGATCTAACTTCTCTAGGAGTCCCTGGACGTATTGCAGAAGAT

  A_contig_1:   + strand (using original sequence)
  B_contig_3:   - strand (using reverse complement)
  C_contig_3:   + strand (using original sequence)
  D_contig_1:   + strand (using original sequence)
  E_contig_3:   - strand (using reverse complement)
  G_utg000001c: + strand (using original sequence)
  H_utg000001c: + strand (using original sequence)
  J_utg000001c: - strand (using reverse complement)

Circularisation (2020-07-08 15:33:44)
    Trycycler now compares the contigs to each other to repair any circularisation issues. After this step, each sequence should be cleanly circularised - i.e. the first base in the contig immediately follows the last base. Each contig will be circularised by looking for the position of its start and end in the other contigs. If necessary, additional sequence will be added or duplicated sequence will be removed. If there are multiple possible ways to fix a contig's circularisation, then Trycycler will use read alignments to choose the best one.
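The idea of repairing one contig's junction by locating its start and end in another contig can be sketched as below. This is a toy illustration under strong assumptions: a single reference contig, exact k-mer matching of the contig's first and last k bases, and k-mers that occur only once in the reference. Trycycler's own procedure is not shown in this log and is likely more tolerant of sequencing errors; the function name and the value of k are hypothetical.

```python
def circularise_with_reference(contig, reference, k=5):
    """Repair a contig's circular junction using one reference contig.

    Both sequences must be on the same strand, and the reference is treated
    as circular. Returns (repaired_contig, adjustment), where adjustment is
    +N for N bases added and -N for N duplicated bases trimmed.
    """
    n = len(reference)
    # Append the first k-1 bases so k-mers spanning the reference's own
    # junction can still be found.
    doubled = reference + reference[:k - 1]
    end_pos = doubled.find(contig[-k:])
    start_pos = doubled.find(contig[:k])
    if end_pos == -1 or start_pos == -1:
        raise ValueError("contig start or end could not be found in reference")
    # Reference bases lying between the contig's end and its start.
    gap = (start_pos - (end_pos + k)) % n
    if gap > n // 2:
        gap -= n  # start lies before the end: the contig has duplicated sequence
    if gap >= 0:
        extra = "".join(reference[(end_pos + k + i) % n] for i in range(gap))
        return contig + extra, gap
    return contig[:gap], gap


if __name__ == "__main__":
    genome = "ATGCGTACCTGAAGGCTTACGGAT"       # toy circular sequence
    reference = genome[10:] + genome[:10]     # same circle, different start
    print(circularise_with_reference(genome[:-2], reference))         # missing 2 bp
    print(circularise_with_reference(genome + genome[:3], reference)) # 3 bp duplicated
```

The `ValueError` branch corresponds to the "unable to circularise" situation, where a contig's start or end cannot be located in the other contig. Choosing between conflicting answers from different references (by majority or by read alignment score) is outside this sketch.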

Circularising A_contig_1:
  using B_contig_3:   circularising A_contig_1 by adding 2 bp of sequence from B_contig_3 (412936-412938)
  using C_contig_3:   circularising A_contig_1 by adding 2 bp of sequence from C_contig_3 (620999-621001)
  using D_contig_1:   circularising A_contig_1 by adding 2 bp of sequence from D_contig_1 (414449-414451)
  using E_contig_3:   circularising A_contig_1 by adding 2 bp of sequence from E_contig_3 (485442-485444)
  using G_utg000001c: circularising A_contig_1 by adding 2 bp of sequence from G_utg000001c (132089-132091)
  using H_utg000001c: circularising A_contig_1 by adding 2 bp of sequence from H_utg000001c (833839-833841)
  using J_utg000001c: circularising A_contig_1 by adding 2 bp of sequence from J_utg000001c (654628-654630)
  circularisation complete (1,037,020 bp)

Circularising B_contig_3:
  using A_contig_1:   circularising B_contig_3 by trimming 340 bp of sequence from the end
  using C_contig_3:   circularising B_contig_3 by trimming 340 bp of sequence from the end
  using D_contig_1:   circularising B_contig_3 by trimming 340 bp of sequence from the end
  using E_contig_3:   circularising B_contig_3 by trimming 340 bp of sequence from the end
  using G_utg000001c: circularising B_contig_3 by trimming 340 bp of sequence from the end
  using H_utg000001c: circularising B_contig_3 by trimming 340 bp of sequence from the end
  using J_utg000001c: circularising B_contig_3 by trimming 340 bp of sequence from the end
  circularisation complete (1,036,225 bp)

Circularising C_contig_3:
  using A_contig_1:   circularising C_contig_3 by adding 106 bp of sequence from A_contig_1 (377895-378001)
  using B_contig_3:   circularising C_contig_3 by adding 106 bp of sequence from B_contig_3 (790614-790720)
  using D_contig_1:   circularising C_contig_3 by adding 106 bp of sequence from D_contig_1 (792251-792357)
  using E_contig_3:   unable to circularise: C_contig_3's end could not be found in E_contig_3
  using G_utg000001c: circularising C_contig_3 by adding 106 bp of sequence from G_utg000001c (511731-511837)
  using H_utg000001c: circularising C_contig_3 by adding 107 bp of sequence from H_utg000001c (171244-171351)
  using J_utg000001c: circularising C_contig_3 by adding 107 bp of sequence from J_utg000001c (1034445-1034552)
  choosing best circularisation of 2 alternatives
    alternative 1 (998,944 bp): score = 168,245,959
    alternative 2 (998,945 bp): score = 168,245,951
    best alternative: 1
  circularisation complete (998,944 bp)

Circularising D_contig_1:
  using A_contig_1:   circularising D_contig_1 by adding 9 bp of sequence from A_contig_1 (622521-622530)
  using B_contig_3:   unable to circularise: D_contig_1's end could not be found in B_contig_3
  using C_contig_3:   circularising D_contig_1 by adding 9 bp of sequence from C_contig_3 (244440-244449)
  using E_contig_3:   circularising D_contig_1 by adding 9 bp of sequence from E_contig_3 (70927-70936)
  using G_utg000001c: circularising D_contig_1 by adding 9 bp of sequence from G_utg000001c (757648-757657)
  using H_utg000001c: circularising D_contig_1 by adding 9 bp of sequence from H_utg000001c (417198-417207)
  using J_utg000001c: circularising D_contig_1 by adding 9 bp of sequence from J_utg000001c (237744-237753)
  circularisation complete (1,036,829 bp)

Circularising E_contig_3:
  using A_contig_1:   circularising E_contig_3 by adding 138 bp of sequence from A_contig_1 (551479-551617)
  using B_contig_3:   circularising E_contig_3 by adding 139 bp of sequence from B_contig_3 (964168-964307)
  using C_contig_3:   circularising E_contig_3 by adding 137 bp of sequence from C_contig_3 (173395-173532)
  using D_contig_1:   circularising E_contig_3 by adding 139 bp of sequence from D_contig_1 (965782-965921)
  using G_utg000001c: circularising E_contig_3 by adding 140 bp of sequence from G_utg000001c (686233-686373)
  using H_utg000001c: circularising E_contig_3 by adding 140 bp of sequence from H_utg000001c (345793-345933)
  using J_utg000001c: circularising E_contig_3 by adding 138 bp of sequence from J_utg000001c (166358-166496)
  choosing most common circularisation
  circularisation complete (1,036,935 bp)

Circularising G_utg000001c:
  using A_contig_1:   no adjustment needed (G_utg000001c is already circular)
  using B_contig_3:   circularising G_utg000001c by adding 8 bp of sequence from B_contig_3 (281571-281579)
  using C_contig_3:   no adjustment needed (G_utg000001c is already circular)
  using D_contig_1:   no adjustment needed (G_utg000001c is already circular)
  using E_contig_3:   no adjustment needed (G_utg000001c is already circular)
  using H_utg000001c: no adjustment needed (G_utg000001c is already circular)
  using J_utg000001c: no adjustment needed (G_utg000001c is already circular)
  choosing most common circularisation
  circularisation complete (1,042,247 bp)

Circularising H_utg000001c:
  using A_contig_1:   no adjustment needed (H_utg000001c is already circular)
  using B_contig_3:   no adjustment needed (H_utg000001c is already circular)
  using C_contig_3:   no adjustment needed (H_utg000001c is already circular)
  using D_contig_1:   no adjustment needed (H_utg000001c is already circular)
  using E_contig_3:   no adjustment needed (H_utg000001c is already circular)
  using G_utg000001c: no adjustment needed (H_utg000001c is already circular)
  using J_utg000001c: no adjustment needed (H_utg000001c is already circular)
  circularisation complete (1,042,318 bp)

Circularising J_utg000001c:
  using A_contig_1:   no adjustment needed (J_utg000001c is already circular)
  using B_contig_3:   no adjustment needed (J_utg000001c is already circular)
  using C_contig_3:   no adjustment needed (J_utg000001c is already circular)
  using D_contig_1:   no adjustment needed (J_utg000001c is already circular)
  using E_contig_3:   no adjustment needed (J_utg000001c is already circular)
  using G_utg000001c: no adjustment needed (J_utg000001c is already circular)
  using H_utg000001c: no adjustment needed (J_utg000001c is already circular)
  circularisation complete (1,042,635 bp)

Finding starting sequence (2020-07-08 15:34:34)
    In this step, Trycycler finds a sequence to use as a starting point for each of the contigs. This can be a standard starting point (e.g. the dnaA gene) or if one is not found, then a randomly-chosen unique sequence will be used. If necessary, the sequences will be flipped (converted to their reverse complement sequence) to ensure that the starting sequence is on the positive strand.

Looking for known starting sequences in each contig...

Found starting sequence 0145_A363_RS01345 (chromosomal replication initiator protein DnaA)
  ATGCGAGCTTGGGAAGAGTTCCTTTTGCTTCAAGAAAAAGAAATTGGAGT...

  A_contig_1:   + strand (using original sequence)
  B_contig_3:   + strand (using original sequence)
  C_contig_3:   + strand (using original sequence)
  D_contig_1:   + strand (using original sequence)
  E_contig_3:   + strand (using original sequence)
  G_utg000001c: + strand (using original sequence)
  H_utg000001c: + strand (using original sequence)
  J_utg000001c: + strand (using original sequence)

Rotating contigs to starting sequence (2020-07-08 15:34:36)
    For a circular contig, any point in the sequence is a valid starting position and it can thus be 'rotated' by moving sequence from the contig start to the contig end. In this step, Trycycler rotates each contig such that it begins with the starting sequence, ensuring that all contigs begin and end together so they can be aligned to each other.
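Rotation of a circular contig can be sketched in a few lines. This is an illustrative example with a hypothetical function name; it assumes the starting sequence is found by exact matching and occurs once in the contig.

```python
def rotate_to_start(contig, start_seq):
    """Rotate a circular contig so that it begins with `start_seq`.

    Returns (rotated_contig, number_of_bases_moved_from_start_to_end).
    """
    k = len(start_seq)
    # Append the first k-1 bases so that a starting sequence spanning the
    # contig's start/end junction is still found.
    pos = (contig + contig[:k - 1]).find(start_seq)
    if pos == -1:
        raise ValueError("starting sequence not found in contig")
    return contig[pos:] + contig[:pos], pos


if __name__ == "__main__":
    print(rotate_to_start("GGTTATGCAA", "ATGC"))  # ('ATGCAAGGTT', 4)
    print(rotate_to_start("GCAAGGTTAT", "ATGC"))  # ('ATGCAAGGTT', 8)
```

The two inputs are the same circle written from different starting points, so both rotate to the same sequence. The contig length is unchanged by rotation.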

  A_contig_1: rotating by 825,516 bp
    ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,037,020 bp)
  B_contig_3: rotating by 201,416 bp
    ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,036,225 bp)
  C_contig_3: rotating by 447,493 bp
    ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (998,944 bp)
  D_contig_1: rotating by 202,960 bp
    ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,036,829 bp)
  E_contig_3: rotating by 273,921 bp
    ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,036,935 bp)
  G_utg000001c: rotating by 961,702 bp
    ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,042,247 bp)
  H_utg000001c: rotating by 621,278 bp
    ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,042,318 bp)
  J_utg000001c: rotating by 441,796 bp
    ATGCGAGCTTGGGAAGAGTT...TATTTAGCTGAGATAGTTTT (1,042,635 bp)

Pairwise global alignments (2020-07-08 15:34:36)
    Trycycler uses the edlib aligner to get global alignments between all pairs of sequences. This can help you to spot any problematic sequences that should be excluded before continuing. If you see any sequences with notably worse identities or max indels, you can remove them (delete the contig's FASTA) and run this command again.

  A_contig_1 vs B_contig_3...     99.45% identity, max indel = 18
  A_contig_1 vs C_contig_3...     95.89% identity, max indel = 39
  A_contig_1 vs D_contig_1...     99.54% identity, max indel = 6
  A_contig_1 vs E_contig_3...     99.52% identity, max indel = 4
  A_contig_1 vs G_utg000001c...   99.21% identity, max indel = 6
  A_contig_1 vs H_utg000001c...   99.22% identity, max indel = 6
  A_contig_1 vs J_utg000001c...   99.17% identity, max indel = 22
  B_contig_3 vs C_contig_3...     95.81% identity, max indel = 45
  B_contig_3 vs D_contig_1...     99.46% identity, max indel = 18
  B_contig_3 vs E_contig_3...     99.43% identity, max indel = 18
  B_contig_3 vs G_utg000001c...   99.12% identity, max indel = 18
  B_contig_3 vs H_utg000001c...   99.12% identity, max indel = 17
  B_contig_3 vs J_utg000001c...   99.08% identity, max indel = 19
  C_contig_3 vs D_contig_1...     95.89% identity, max indel = 559
  C_contig_3 vs E_contig_3...     95.86% identity, max indel = 311
  C_contig_3 vs G_utg000001c...   95.57% identity, max indel = 121
  C_contig_3 vs H_utg000001c...   95.58% identity, max indel = 175
  C_contig_3 vs J_utg000001c...   95.53% identity, max indel = 489
  D_contig_1 vs E_contig_3...     99.52% identity, max indel = 6
  D_contig_1 vs G_utg000001c...   99.20% identity, max indel = 6
  D_contig_1 vs H_utg000001c...   99.21% identity, max indel = 6
  D_contig_1 vs J_utg000001c...   99.15% identity, max indel = 19
  E_contig_3 vs G_utg000001c...   99.19% identity, max indel = 6
  E_contig_3 vs H_utg000001c...   99.20% identity, max indel = 6
  E_contig_3 vs J_utg000001c...   99.14% identity, max indel = 22
  G_utg000001c vs H_utg000001c... 99.41% identity, max indel = 6
  G_utg000001c vs J_utg000001c... 99.35% identity, max indel = 22
  H_utg000001c vs J_utg000001c... 99.36% identity, max indel = 22

Pairwise identities:
  A_contig_1:    100.00%   99.45%   95.89%   99.54%   99.52%   99.21%   99.22%   99.17%
  B_contig_3:     99.45%  100.00%   95.81%   99.46%   99.43%   99.12%   99.12%   99.08%
  C_contig_3:     95.89%   95.81%  100.00%   95.89%   95.86%   95.57%   95.58%   95.53%
  D_contig_1:     99.54%   99.46%   95.89%  100.00%   99.52%   99.20%   99.21%   99.15%
  E_contig_3:     99.52%   99.43%   95.86%   99.52%  100.00%   99.19%   99.20%   99.14%
  G_utg000001c:   99.21%   99.12%   95.57%   99.20%   99.19%  100.00%   99.41%   99.35%
  H_utg000001c:   99.22%   99.12%   95.58%   99.21%   99.20%   99.41%  100.00%   99.36%
  J_utg000001c:   99.17%   99.08%   95.53%   99.15%   99.14%   99.35%   99.36%  100.00%

Error: some pairwise identities are below the minimum allowed value of 98.0%. Please remove offending sequences or lower the --min_identity threshold and try again.
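One way to work out which sequences are "offending" is to take the pairwise identity matrix from this log and repeatedly drop the contig involved in the most below-threshold pairs until none remain. The sketch below does that. The greedy strategy and the function name are assumptions made for illustration, not part of Trycycler; only the 98.0% threshold and the identity values come from the log.

```python
NAMES = ["A_contig_1", "B_contig_3", "C_contig_3", "D_contig_1",
         "E_contig_3", "G_utg000001c", "H_utg000001c", "J_utg000001c"]

# Pairwise identities (%) copied from this log, rows and columns in NAMES order.
IDENTITIES = [
    [100.00, 99.45, 95.89, 99.54, 99.52, 99.21, 99.22, 99.17],
    [99.45, 100.00, 95.81, 99.46, 99.43, 99.12, 99.12, 99.08],
    [95.89, 95.81, 100.00, 95.89, 95.86, 95.57, 95.58, 95.53],
    [99.54, 99.46, 95.89, 100.00, 99.52, 99.20, 99.21, 99.15],
    [99.52, 99.43, 95.86, 99.52, 100.00, 99.19, 99.20, 99.14],
    [99.21, 99.12, 95.57, 99.20, 99.19, 100.00, 99.41, 99.35],
    [99.22, 99.12, 95.58, 99.21, 99.20, 99.41, 100.00, 99.36],
    [99.17, 99.08, 95.53, 99.15, 99.14, 99.35, 99.36, 100.00],
]


def contigs_to_exclude(names, matrix, min_identity=98.0):
    """Greedily pick contigs to drop until every remaining pair passes."""
    keep = list(range(len(names)))
    removed = []
    while keep:
        # For each kept contig, count its partners below the threshold.
        failing = {i: sum(1 for j in keep
                          if j != i and matrix[i][j] < min_identity)
                   for i in keep}
        worst = max(keep, key=lambda i: failing[i])
        if failing[worst] == 0:
            break
        keep.remove(worst)
        removed.append(names[worst])
    return removed


if __name__ == "__main__":
    print(contigs_to_exclude(NAMES, IDENTITIES))  # ['C_contig_3']
```

Here C_contig_3 is below 98.0% against all seven other contigs, while every other pair is above 99%, so removing that one contig (deleting its FASTA, as the log suggests) and re-running is the likely fix. Lowering `--min_identity` is the alternative the error message offers.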