Starting Trycycler reconcile (2020-07-08 15:29:22)
    Trycycler reconcile is a tool for reconciling multiple alternative contigs with each other.

Input reads:
  reads.fastq.gz
  size = 216,254,308 bytes

Input contigs:
  trycycler/cluster_001/1_contigs/A_contig_1.fasta (1,037,018 bp)
  trycycler/cluster_001/1_contigs/B_contig_3.fasta (1,036,565 bp)
  trycycler/cluster_001/1_contigs/C_contig_3.fasta (998,838 bp)
  trycycler/cluster_001/1_contigs/D_contig_1.fasta (1,036,820 bp)
  trycycler/cluster_001/1_contigs/E_contig_3.fasta (1,036,795 bp)
  trycycler/cluster_001/1_contigs/F_utg000001c.fasta (1,042,360 bp)
  trycycler/cluster_001/1_contigs/G_utg000001c.fasta (1,042,247 bp)
  trycycler/cluster_001/1_contigs/H_utg000001c.fasta (1,042,318 bp)
  trycycler/cluster_001/1_contigs/I_utg000001l.fasta (958,895 bp)
  trycycler/cluster_001/1_contigs/J_utg000001c.fasta (1,042,635 bp)

Checking required software:
  minimap2: v2.17-r954-dirty

Initial check of contigs (2020-07-08 15:29:22)
    Before proceeding, Trycycler ensures that the input contigs appear sufficiently close to each other to make a consensus. If not, the program will quit and the user must fix the input contigs (make them more similar to each other) or exclude some before trying again.
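The relative sequence lengths reported in this log are consistent with a simple row-length divided by column-length ratio. The following is a minimal sketch of that calculation using the contig lengths listed under "Input contigs"; the function name `relative_lengths` is illustrative and is not taken from Trycycler's source.

```python
# Contig lengths (bp) copied from the "Input contigs" listing in this log.
lengths = {
    "A_contig_1": 1_037_018,
    "B_contig_3": 1_036_565,
    "C_contig_3": 998_838,
    "D_contig_1": 1_036_820,
    "E_contig_3": 1_036_795,
    "F_utg000001c": 1_042_360,
    "G_utg000001c": 1_042_247,
    "H_utg000001c": 1_042_318,
    "I_utg000001l": 958_895,
    "J_utg000001c": 1_042_635,
}


def relative_lengths(lengths):
    """Map each contig name to its length divided by every contig's length.

    Hypothetical helper: row order and column order both follow the
    insertion order of the `lengths` dictionary.
    """
    names = list(lengths)
    return {a: [lengths[a] / lengths[b] for b in names] for a in names}


matrix = relative_lengths(lengths)

# Print the matrix in a layout similar to the log, three decimals per cell.
for name, row in matrix.items():
    label = name + ":"
    print(f"{label:<14}" + " ".join(f"{value:.3f}" for value in row))

# The largest ratio shows which pair of contigs differs most in length.
largest = max(max(row) for row in matrix.values())
print(f"largest pairwise ratio: {largest:.3f}")
```

In this dataset the most divergent contigs are C_contig_3 and I_utg000001l, which are both noticeably shorter than the rest.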
Relative sequence lengths:
  A_contig_1:   1.000 1.000 1.038 1.000 1.000 0.995 0.995 0.995 1.081 0.995
  B_contig_3:   1.000 1.000 1.038 1.000 1.000 0.994 0.995 0.994 1.081 0.994
  C_contig_3:   0.963 0.964 1.000 0.963 0.963 0.958 0.958 0.958 1.042 0.958
  D_contig_1:   1.000 1.000 1.038 1.000 1.000 0.995 0.995 0.995 1.081 0.994
  E_contig_3:   1.000 1.000 1.038 1.000 1.000 0.995 0.995 0.995 1.081 0.994
  F_utg000001c: 1.005 1.006 1.044 1.005 1.005 1.000 1.000 1.000 1.087 1.000
  G_utg000001c: 1.005 1.005 1.043 1.005 1.005 1.000 1.000 1.000 1.087 1.000
  H_utg000001c: 1.005 1.006 1.044 1.005 1.005 1.000 1.000 1.000 1.087 1.000
  I_utg000001l: 0.925 0.925 0.960 0.925 0.925 0.920 0.920 0.920 1.000 0.920
  J_utg000001c: 1.005 1.006 1.044 1.006 1.006 1.000 1.000 1.000 1.087 1.000

Mash distances:
  A_contig_1:   0.000 0.004 0.005 0.004 0.004 0.006 0.006 0.006 0.008 0.006
  B_contig_3:   0.004 0.000 0.005 0.004 0.004 0.006 0.006 0.006 0.008 0.006
  C_contig_3:   0.005 0.005 0.000 0.005 0.005 0.007 0.006 0.007 0.009 0.007
  D_contig_1:   0.004 0.004 0.005 0.000 0.004 0.006 0.006 0.006 0.008 0.006
  E_contig_3:   0.003 0.004 0.005 0.003 0.000 0.006 0.005 0.005 0.008 0.006
  F_utg000001c: 0.006 0.006 0.007 0.006 0.006 0.000 0.004 0.004 0.006 0.004
  G_utg000001c: 0.006 0.006 0.006 0.006 0.005 0.004 0.000 0.004 0.006 0.004
  H_utg000001c: 0.006 0.007 0.007 0.006 0.006 0.004 0.004 0.000 0.006 0.004
  I_utg000001l: 0.008 0.008 0.009 0.008 0.008 0.006 0.006 0.006 0.000 0.006
  J_utg000001c: 0.006 0.006 0.007 0.006 0.006 0.004 0.004 0.004 0.006 0.000

Contigs have passed the initial check - they seem sufficiently close to reconcile.

Normalising strands (2020-07-08 15:29:25)
    In this step, Trycycler ensures that all sequences are on the same strand. It does this by first finding a sequence that occurs once in each contig and then flipping any of the contigs (converting to their reverse complement sequence) which have this sequence on the negative strand.
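The strand-normalisation idea described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses exact string matching for the common sequence, whereas the real tool may tolerate sequencing errors, and the names `reverse_complement` and `normalise_strand` are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def normalise_strand(contig, common_seq):
    """Return (sequence, strand) so that common_seq is on the + strand.

    Assumption: common_seq occurs exactly (no mismatches) in the contig
    on one strand or the other.
    """
    if common_seq in contig:
        return contig, "+"
    if reverse_complement(common_seq) in contig:
        return reverse_complement(contig), "-"
    raise ValueError("common sequence not found on either strand")


# Toy example: the second contig is the reverse complement of the first.
forward = "TTACCGTGG"
flipped = reverse_complement(forward)
for name, seq in [("forward", forward), ("flipped", flipped)]:
    fixed, strand = normalise_strand(seq, "ACCGT")
    print(f"{name}: {strand} strand -> {fixed}")
```

After this step every contig carries the common sequence in the same orientation, which is what allows the later start/end comparisons to work on a single strand.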
Randomly-chosen common sequence:
  CGATATGCTGACAGTCGTAATTGTAAGAACAATCGCAGTTGGAATCCACG
  GTGTCAGATTCCGCTCGATCTACTTCAATTTCGCACTCGTAAACGTTATC
  GTACGAACCACGAATTTGTGCGCTAATACACACCGTCTCGCCATTCATAG
  ATAAGATTTTGGCACTAACCACAGCTCCTTGAGCAAATAACTCTTTTCCA
  TCTTGCAGTATGTTAGCTGTAAAATCTCTTCTTAATTTACGAAAATTAAG

  A_contig_1:   + strand (using original sequence)
  B_contig_3:   - strand (using reverse complement)
  C_contig_3:   + strand (using original sequence)
  D_contig_1:   + strand (using original sequence)
  E_contig_3:   - strand (using reverse complement)
  F_utg000001c: + strand (using original sequence)
  G_utg000001c: + strand (using original sequence)
  H_utg000001c: + strand (using original sequence)
  I_utg000001l: + strand (using original sequence)
  J_utg000001c: - strand (using reverse complement)

Circularisation (2020-07-08 15:29:28)
    Trycycler now compares the contigs to each other to repair any circularisation issues. After this step, each sequence should be cleanly circularised - i.e. the first base in the contig immediately follows the last base. Each contig will be circularised by looking for the position of its start and end in the other contigs. If necessary, additional sequence will be added or duplicated sequence will be removed. If there are multiple possible ways to fix a contig's circularisation, then Trycycler will use read alignments to choose the best one.
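The add-or-trim logic described above can be illustrated with a toy sketch. It is not Trycycler's implementation: it assumes exact, unique matches of short anchor sequences, assumes the reference contig is itself cleanly circular and on the same strand, and uses an arbitrary rule (gap no larger than half the reference) to decide between adding and trimming. The function name and the `anchor` parameter are hypothetical.

```python
def circularise_with_reference(contig, ref, anchor=20):
    """Repair a contig's circularisation using one reference contig.

    Assumptions (illustrative only): ref is cleanly circular, both
    sequences are on the same strand, and the contig's first and last
    `anchor` bases each occur exactly once in ref.
    """
    n = len(ref)
    doubled = ref + ref  # lets matches wrap around the reference's ends

    end_pos = doubled.find(contig[-anchor:])
    start_pos = doubled.find(contig[:anchor])
    if end_pos == -1 or start_pos == -1:
        raise ValueError("contig start or end could not be found in ref")

    after_end = (end_pos + anchor) % n      # reference base following the contig's end
    gap = (start_pos % n - after_end) % n   # bases between contig end and contig start

    if gap == 0:
        return contig                       # already cleanly circular
    if gap <= n // 2:
        # Missing sequence: add the reference bases that bridge the gap.
        return contig + doubled[after_end:after_end + gap]
    # Otherwise the contig's end overlaps its start: trim the duplicate.
    overlap = n - gap
    return contig[:-overlap]


ref = "ACGTTGCAAG"  # toy circular sequence with unique 4-mers
print(circularise_with_reference(ref[:8], ref, anchor=4))        # gap: 2 bp added
print(circularise_with_reference(ref + ref[:3], ref, anchor=4))  # overlap: 3 bp trimmed
```

When different reference contigs suggest different fixes, as with C_contig_3 in this log, the tool scores the alternatives with read alignments; that step is outside the scope of this sketch.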
Circularising A_contig_1:
  using B_contig_3:
    circularising A_contig_1 by adding 2 bp of sequence from B_contig_3 (412936-412938)
  using C_contig_3:
    circularising A_contig_1 by adding 2 bp of sequence from C_contig_3 (620999-621001)
  using D_contig_1:
    circularising A_contig_1 by adding 2 bp of sequence from D_contig_1 (414449-414451)
  using E_contig_3:
    circularising A_contig_1 by adding 2 bp of sequence from E_contig_3 (485442-485444)
  using F_utg000001c:
    circularising A_contig_1 by adding 2 bp of sequence from F_utg000001c (785926-785928)
  using G_utg000001c:
    circularising A_contig_1 by adding 2 bp of sequence from G_utg000001c (132089-132091)
  using H_utg000001c:
    circularising A_contig_1 by adding 2 bp of sequence from H_utg000001c (833839-833841)
  using I_utg000001l:
    circularising A_contig_1 by adding 2 bp of sequence from I_utg000001l (549328-549330)
  using J_utg000001c:
    circularising A_contig_1 by adding 2 bp of sequence from J_utg000001c (654628-654630)
  circularisation complete (1,037,020 bp)

Circularising B_contig_3:
  using A_contig_1:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  using C_contig_3:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  using D_contig_1:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  using E_contig_3:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  using F_utg000001c:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  using G_utg000001c:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  using H_utg000001c:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  using I_utg000001l:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  using J_utg000001c:
    circularising B_contig_3 by trimming 340 bp of sequence from the end
  circularisation complete (1,036,225 bp)

Circularising C_contig_3:
  using A_contig_1:
    circularising C_contig_3 by adding 106 bp of sequence from A_contig_1 (377895-378001)
  using B_contig_3:
    circularising C_contig_3 by adding 106 bp of sequence from B_contig_3 (790614-790720)
  using D_contig_1:
    circularising C_contig_3 by adding 106 bp of sequence from D_contig_1 (792251-792357)
  using E_contig_3:
    unable to circularise: C_contig_3's end could not be found in E_contig_3
  using F_utg000001c:
    circularising C_contig_3 by adding 107 bp of sequence from F_utg000001c (123298-123405)
  using G_utg000001c:
    circularising C_contig_3 by adding 106 bp of sequence from G_utg000001c (511731-511837)
  using H_utg000001c:
    circularising C_contig_3 by adding 107 bp of sequence from H_utg000001c (171244-171351)
  using I_utg000001l:
    circularising C_contig_3 by adding 107 bp of sequence from I_utg000001l (929036-929143)
  using J_utg000001c:
    circularising C_contig_3 by adding 107 bp of sequence from J_utg000001c (1034445-1034552)
  choosing best circularisation of 2 alternatives
    alternative 1 (998,944 bp): score = 168,245,959
    alternative 2 (998,945 bp): score = 168,245,951
    best alternative: 1
  circularisation complete (998,944 bp)

Circularising D_contig_1:
  using A_contig_1:
    circularising D_contig_1 by adding 9 bp of sequence from A_contig_1 (622521-622530)
  using B_contig_3:
    unable to circularise: D_contig_1's end could not be found in B_contig_3
  using C_contig_3:
    circularising D_contig_1 by adding 9 bp of sequence from C_contig_3 (244440-244449)
  using E_contig_3:
    circularising D_contig_1 by adding 9 bp of sequence from E_contig_3 (70927-70936)
  using F_utg000001c:
    circularising D_contig_1 by adding 9 bp of sequence from F_utg000001c (369169-369178)
  using G_utg000001c:
    circularising D_contig_1 by adding 9 bp of sequence from G_utg000001c (757648-757657)
  using H_utg000001c:
    circularising D_contig_1 by adding 9 bp of sequence from H_utg000001c (417198-417207)
  using I_utg000001l:
    circularising D_contig_1 by adding 9 bp of sequence from I_utg000001l (132550-132559)
  using J_utg000001c:
    circularising D_contig_1 by adding 9 bp of sequence from J_utg000001c (237744-237753)
  circularisation complete (1,036,829 bp)

Circularising E_contig_3:
  using A_contig_1:
    circularising E_contig_3 by adding 138 bp of sequence from A_contig_1 (551479-551617)
  using B_contig_3:
    circularising E_contig_3 by adding 139 bp of sequence from B_contig_3 (964168-964307)
  using C_contig_3:
    circularising E_contig_3 by adding 137 bp of sequence from C_contig_3 (173395-173532)
  using D_contig_1:
    circularising E_contig_3 by adding 139 bp of sequence from D_contig_1 (965782-965921)
  using F_utg000001c:
    circularising E_contig_3 by adding 140 bp of sequence from F_utg000001c (297759-297899)
  using G_utg000001c:
    circularising E_contig_3 by adding 140 bp of sequence from G_utg000001c (686233-686373)
  using H_utg000001c:
    circularising E_contig_3 by adding 140 bp of sequence from H_utg000001c (345793-345933)
  using I_utg000001l:
    circularising E_contig_3 by adding 140 bp of sequence from I_utg000001l (61140-61280)
  using J_utg000001c:
    circularising E_contig_3 by adding 138 bp of sequence from J_utg000001c (166358-166496)
  choosing most common circularisation
  circularisation complete (1,036,935 bp)

Circularising F_utg000001c:
  using A_contig_1:
    unable to circularise: F_utg000001c's start and end were found in multiple places in A_contig_1
  using B_contig_3:
    unable to circularise: F_utg000001c's start and end were found in multiple places in B_contig_3
  using C_contig_3:
    unable to circularise: F_utg000001c's start and end were found in multiple places in C_contig_3
  using D_contig_1:
    unable to circularise: F_utg000001c's start and end were found in multiple places in D_contig_1
  using E_contig_3:
    unable to circularise: F_utg000001c's start and end were found in multiple places in E_contig_3
  using G_utg000001c:
    unable to circularise: F_utg000001c's start and end were found in multiple places in G_utg000001c
  using H_utg000001c:
    unable to circularise: F_utg000001c's start and end were found in multiple places in H_utg000001c
  using I_utg000001l:
    unable to circularise: F_utg000001c's start and end were found in multiple places in I_utg000001l
  using J_utg000001c:
    unable to circularise: F_utg000001c's start and end were found in multiple places in J_utg000001c

Error: failed to circularise sequence F_utg000001c because its start/end sequences were found in multiple ambiguous places in other sequences. This is likely because F_utg000001c starts/ends in a repetitive region. You can either manually repair its circularisation (and ensure it does not start/end in a repetitive region) or exclude the sequence altogether and try again.
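
One way to see the ambiguity this error describes is to count how often a contig's start and end sequences occur in another contig. The sketch below is a rough diagnostic, not part of Trycycler: it assumes exact matching on the forward strand only (reasonable after strand normalisation), treats the other contig as circular, and the names `count_occurrences`, `end_ambiguity` and the `end_size` parameter are hypothetical.

```python
def count_occurrences(needle, haystack):
    """Count exact occurrences of needle in haystack, overlaps included."""
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        pos = haystack.find(needle, pos + 1)
    return count


def end_ambiguity(contig, other, end_size=1000):
    """Return (start_hits, end_hits) for the contig's ends within `other`.

    `other` is treated as circular by appending its first end_size - 1
    bases, so matches that span its start/end junction are counted once.
    A value above 1 suggests the contig starts or ends in a repeat.
    """
    circular = other + other[:end_size - 1]
    start_hits = count_occurrences(contig[:end_size], circular)
    end_hits = count_occurrences(contig[-end_size:], circular)
    return start_hits, end_hits


# Toy example: the contig starts with "ACGT", which appears twice in `other`.
other = "ACGTACGTTTGA"
contig = "ACGT" + "GGGG" + "TTGA"
print(end_ambiguity(contig, other, end_size=4))
```

If both counts come back as 1 for a manually repaired contig, its start and end no longer sit in a sequence that is repeated exactly in the other contig, although near-identical repeats would need an alignment-based check to detect.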