Memory allocation crash #2

Closed
rrwick opened this issue Jan 5, 2025 · 3 comments

rrwick commented Jan 5, 2025

I encountered this curious crash when running Autocycler on an RHEL server:

memory allocation of 141 bytes failed
thread '<unnamed>' panicked at std/src/sys/pal/unix/stack_overflow.rs:196:13
[1]    747904 abort (core dumped)   compress -i assemblies -a autocycler_out

It's strange for the following reasons:

  • The server has plenty of memory, so a memory allocation shouldn't fail.
  • It only occurs with the musl build of Autocycler (a statically linked binary that includes all dependencies for maximum portability) but not with the glibc build (a dynamically linked binary more suited for typical Linux distributions).
  • I traced the issue to the indicatif progress spinner, which is a very lightweight part of the code.

I don't fully understand the problem, but the crash seems related to indicatif's use of a background thread (enable_steady_tick), which interacts poorly with musl's stricter threading or memory allocation behaviour.

In case anyone else runs into this problem, I've decided to also include a pre-built glibc-based binary of Autocycler in the releases. If you encounter this problem, using the glibc binary is a straightforward solution. If you're building Autocycler from source on Linux, it will default to using glibc, which avoids this problem.

I'm not sure if this is a rare issue unique to my server or a common problem affecting many users. If you also encounter this crash, please let me know. If it turns out to be widespread, I’ll consider a more robust solution, such as replacing indicatif with another progress spinner library.

@rrwick rrwick closed this as completed Jan 5, 2025

traaymakers commented Jan 9, 2025

Hi Ryan,

Thank you for developing this tool. It works great and is very accessible, even for a non-bioinformatician like me :).
I had a similar crash (I think), but only with one sample (out of the 20 or so I tried). I used the musl build as well, and I will try your other build to resolve it for this sample.

(autocycler) Tom@Server:/data/tom/autocycler/curtobacterium/PD7123/105976-072-022$ autocycler compress -i assemblies -a autocycler_out

Starting autocycler compress (2025-01-08 18:23:55)
    This command finds all assemblies in the given input directory and compresses them into a compacted De Bruijn graph. This graph can then be used to recover the assemblies (with autocycler decompress) or
generate a consensus assembly (with autocycler resolve).

Settings:
  --assemblies_dir assemblies
  --autocycler_dir autocycler_out
  --kmer 51
  --threads 8


Loading input assemblies (2025-01-08 18:23:55)
    Input assemblies are now loaded and each contig is given a unique ID.

   1: assemblies/canu_01.fasta tig00000001 (3801556 bp)
   2: assemblies/canu_01.fasta tig00000002 (2061097 bp)
   3: assemblies/canu_02.fasta tig00000001 (3801552 bp)
   4: assemblies/canu_02.fasta tig00000002 (2061092 bp)
   5: assemblies/canu_03.fasta tig00000001 (3801555 bp)
   6: assemblies/canu_03.fasta tig00000002 (2061094 bp)
   7: assemblies/canu_04.fasta tig00000001 (3801561 bp)
   8: assemblies/canu_04.fasta tig00000003 (2061097 bp)
   9: assemblies/miniasm_01.fasta utg000001c (3801591 bp)
  10: assemblies/miniasm_01.fasta utg000002c (2061110 bp)
  11: assemblies/miniasm_02.fasta utg000001c (2061103 bp)
  12: assemblies/miniasm_02.fasta utg000002c (3801588 bp)
  13: assemblies/miniasm_03.fasta utg000001c (2061107 bp)
  14: assemblies/miniasm_03.fasta utg000002c (3801599 bp)
  15: assemblies/miniasm_04.fasta utg000001c (3801581 bp)
  16: assemblies/miniasm_04.fasta utg000002c (2061112 bp)
  17: assemblies/necat_01.fasta bctg00000001 (2061191 bp)
  18: assemblies/necat_01.fasta bctg00000000 (3801624 bp)
  19: assemblies/necat_02.fasta bctg00000001 (2061209 bp)
  20: assemblies/necat_02.fasta bctg00000000 (3801719 bp)
  21: assemblies/necat_03.fasta bctg00000001 (2061133 bp)
  22: assemblies/necat_03.fasta bctg00000000 (3801673 bp)
  23: assemblies/necat_04.fasta bctg00000001 (2061164 bp)
  24: assemblies/necat_04.fasta bctg00000000 (3801854 bp)
  25: assemblies/nextdenovo_01.fasta ctg000000 (2088503 bp)
  26: assemblies/nextdenovo_01.fasta ctg000010 (3828547 bp)
  27: assemblies/nextdenovo_02.fasta ctg000000 (2090009 bp)
  28: assemblies/nextdenovo_02.fasta ctg000010 (3828514 bp)
  29: assemblies/nextdenovo_03.fasta ctg000000 (2082796 bp)
  30: assemblies/nextdenovo_03.fasta ctg000010 (3828992 bp)
  31: assemblies/nextdenovo_04.fasta ctg000000 (2086081 bp)
  32: assemblies/nextdenovo_04.fasta ctg000010 (3828200 bp)
  33: assemblies/raven_01.fasta Utg2326 (2061114 bp)
  34: assemblies/raven_01.fasta Utg2328 (3799098 bp)
  35: assemblies/raven_02.fasta Utg2272 (3801590 bp)
  36: assemblies/raven_02.fasta Utg2274 (2061110 bp)
  37: assemblies/raven_03.fasta Utg2266 (3797933 bp)
  38: assemblies/raven_03.fasta Utg2268 (2061119 bp)
  39: assemblies/raven_04.fasta Utg2254 (2061112 bp)
  40: assemblies/raven_04.fasta Utg2256 (3801583 bp)

40 sequences loaded from 20 assemblies


Building k-mer De Bruijn graph (2025-01-08 18:23:58)
    K-mers in the input sequences are now hashed to make a De Bruijn graph.

Graph contains 11399212 k-mers


Building compacted unitig graph (2025-01-08 18:27:53)
    All non-branching paths are now collapsed to form a compacted De Bruijn graph, a.k.a. a unitig graph.

3910 unitigs, 5246 links
total length: 5699606 bp


Simplifying unitig graph (2025-01-08 18:29:04)
    The graph structure is now simplified by moving sequence into repeat unitigs when possible.

thread 'main' panicked at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/thread/mod.rs:707:29:
failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

or the alternative error message I received:

⠋ simplifying graph...
thread '<unnamed>' panicked at std/src/sys/pal/unix/stack_overflow.rs:196:13:
failed to allocate an alternative stack: Out of memory (os error 12)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at core/src/panicking.rs:221:5:
panic in a function that cannot unwind
stack backtrace:
   0:     0x7de576fe229a - <unknown>
   1:     0x7de57701e093 - <unknown>
   2:     0x7de576fdefc3 - <unknown>
   3:     0x7de576fe20e2 - <unknown>
   4:     0x7de576fe32ac - <unknown>
   5:     0x7de576fe30f2 - <unknown>
   6:     0x7de576fe3887 - <unknown>
   7:     0x7de576fe36e6 - <unknown>
   8:     0x7de576fe2779 - <unknown>
   9:     0x7de576fe33ac - <unknown>
  10:     0x7de576aaacad - <unknown>
  11:     0x7de576aaad42 - <unknown>
  12:     0x7de576aaae66 - <unknown>
  13:     0x7de576fe63a4 - <unknown>
thread caused non-unwinding panic. aborting.
./final.sh: line 2: 1222139 Aborted                 (core dumped) autocycler compress -i assemblies -a autocycler_out

Error: file does not exist: autocycler_out/input_assemblies.gfa

Error: directory does not exist: autocycler_out/clustering/qc_pass/cluster_*

Error: directory does not exist: autocycler_out/clustering/qc_pass/cluster_*

Error: file does not exist: autocycler_out/clustering/qc_pass/cluster_*/5_final.gfa

Note that Flye was excluded because it failed with: "ERROR: The input contain reads with duplicated IDs. Make sure all reads have unique IDs and restart. The first problematic ID was: 7bd66c9f-eabf-4c38-aab7-cda70857e95c". I will ask our bioinformaticians if they know an easy fix for this. Without Flye, most assemblies are fully resolved, so excluding this assembler did not matter much.

rrwick commented Jan 9, 2025

Excluding one assembler (e.g. Flye) is probably fine, as long as you're still getting to a fully resolved Autocycler assembly. But there may be something strange going on in your read processing if you have duplicate read IDs, and that might cause problems elsewhere (e.g. read alignment), so you may want to root out the problem!

Regarding the memory crash, it occurred for you in the same part of Autocycler compress as it did for me, so it very much looks like the same problem. Thanks for reporting it, and let me know if you have any issues with the glibc (gnu) build.
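
Both error messages in this thread show the operating system refusing a resource: error 11 (EAGAIN) when spawning a thread and error 12 (ENOMEM) when allocating the alternative stack. For anyone who hits this crash, it may be worth recording the per-process limits on the affected machine. The following is a minimal Python sketch, not part of Autocycler, that prints a few limits using the standard `resource` module. The choice of limits and the names `LIMIT_NAMES`, `fmt` and `collect_limits` are assumptions for illustration, and nothing here confirms that a limit is the actual cause.

```python
import resource

# Limits that could plausibly affect thread creation or small allocations.
# This selection is an assumption, not a confirmed diagnosis.
LIMIT_NAMES = ["RLIMIT_NPROC", "RLIMIT_AS", "RLIMIT_STACK", "RLIMIT_DATA"]


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def collect_limits():
    """Return {limit_name: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is None:
            # Not every platform defines every limit (e.g. RLIMIT_NPROC).
            continue
        soft, hard = resource.getrlimit(const)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in collect_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

Running this in the same shell or job script that launches autocycler compress, and comparing the output between a machine that crashes and one that does not, could help narrow things down.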

traaymakers commented Jan 10, 2025

Our service provider gave us BAM files, which I converted to FASTQ with bedtools and compressed with gzip. I am checking whether the conversion caused these issues or whether the BAM files already had duplicate read IDs.
It worried us too: how do these assemblers distinguish between reads if they have the same ID? The assemblies seem to be resolved for most samples; only the isolates with linear plasmids, as I mentioned on Bluesky, still have some issues. We are looking into it. Thank you for your help!

Edit: there were no duplicate IDs in the BAM files. I used samtools instead of bedtools for the conversion, and that seems to have solved the issue. bedtools had duplicated every read when converting to FASTQ.
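
For anyone wanting to run the same check on their own reads, here is a minimal Python sketch that counts repeated read IDs in a FASTQ file. It is not what was used in this thread. It assumes strict four-line FASTQ records, and the function names `open_text` and `duplicate_read_ids` are hypothetical.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def duplicate_read_ids(fastq_path):
    """Return {read_id: count} for read IDs that appear more than once.

    Assumes four-line records: header, sequence, '+', qualities.
    """
    counts = Counter()
    with open_text(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only header lines carry the read ID
            # A header looks like '@<id> <optional description>'.
            fields = line[1:].split()
            if fields:
                counts[fields[0]] += 1
    return {read_id: n for read_id, n in counts.items() if n > 1}
```

An empty result means every read ID is unique. If a conversion step duplicated every read, each ID would be expected to show up with a count of 2.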

Regarding the memory crash, using the glibc build solved the problem 👍 :

(autocycler_glibc) Tom@OnderzoeksServer:/data/tom/autocycler/curtobacterium/PD7123/105976-072-022$ ./final.sh 

Starting autocycler compress (2025-01-10 08:15:31)
    This command finds all assemblies in the given input directory and compresses them into a compacted De Bruijn graph. This graph can then be used to recover the assemblies (with autocycler decompress) or
generate a consensus assembly (with autocycler resolve).

Settings:
  --assemblies_dir assemblies
  --autocycler_dir autocycler_out
  --kmer 51
  --threads 8


Loading input assemblies (2025-01-10 08:15:31)
    Input assemblies are now loaded and each contig is given a unique ID.

   1: assemblies/canu_01.fasta tig00000001 (3801556 bp)
   2: assemblies/canu_01.fasta tig00000002 (2061097 bp)
   3: assemblies/canu_02.fasta tig00000001 (3801552 bp)
   4: assemblies/canu_02.fasta tig00000002 (2061092 bp)
   5: assemblies/canu_03.fasta tig00000001 (3801555 bp)
   6: assemblies/canu_03.fasta tig00000002 (2061094 bp)
   7: assemblies/canu_04.fasta tig00000001 (3801561 bp)
   8: assemblies/canu_04.fasta tig00000003 (2061097 bp)
   9: assemblies/miniasm_01.fasta utg000001c (3801591 bp)
  10: assemblies/miniasm_01.fasta utg000002c (2061110 bp)
  11: assemblies/miniasm_02.fasta utg000001c (2061103 bp)
  12: assemblies/miniasm_02.fasta utg000002c (3801588 bp)
  13: assemblies/miniasm_03.fasta utg000001c (2061107 bp)
  14: assemblies/miniasm_03.fasta utg000002c (3801599 bp)
  15: assemblies/miniasm_04.fasta utg000001c (3801581 bp)
  16: assemblies/miniasm_04.fasta utg000002c (2061112 bp)
  17: assemblies/necat_01.fasta bctg00000001 (2061191 bp)
  18: assemblies/necat_01.fasta bctg00000000 (3801624 bp)
  19: assemblies/necat_02.fasta bctg00000001 (2061209 bp)
  20: assemblies/necat_02.fasta bctg00000000 (3801719 bp)
  21: assemblies/necat_03.fasta bctg00000001 (2061133 bp)
  22: assemblies/necat_03.fasta bctg00000000 (3801673 bp)
  23: assemblies/necat_04.fasta bctg00000001 (2061164 bp)
  24: assemblies/necat_04.fasta bctg00000000 (3801854 bp)
  25: assemblies/nextdenovo_01.fasta ctg000000 (2088503 bp)
  26: assemblies/nextdenovo_01.fasta ctg000010 (3828547 bp)
  27: assemblies/nextdenovo_02.fasta ctg000000 (2090009 bp)
  28: assemblies/nextdenovo_02.fasta ctg000010 (3828514 bp)
  29: assemblies/nextdenovo_03.fasta ctg000000 (2082796 bp)
  30: assemblies/nextdenovo_03.fasta ctg000010 (3828992 bp)
  31: assemblies/nextdenovo_04.fasta ctg000000 (2086081 bp)
  32: assemblies/nextdenovo_04.fasta ctg000010 (3828200 bp)
  33: assemblies/raven_01.fasta Utg2326 (2061114 bp)
  34: assemblies/raven_01.fasta Utg2328 (3799098 bp)
  35: assemblies/raven_02.fasta Utg2272 (3801590 bp)
  36: assemblies/raven_02.fasta Utg2274 (2061110 bp)
  37: assemblies/raven_03.fasta Utg2266 (3797933 bp)
  38: assemblies/raven_03.fasta Utg2268 (2061119 bp)
  39: assemblies/raven_04.fasta Utg2254 (2061112 bp)
  40: assemblies/raven_04.fasta Utg2256 (3801583 bp)

40 sequences loaded from 20 assemblies


Building k-mer De Bruijn graph (2025-01-10 08:15:39)
    K-mers in the input sequences are now hashed to make a De Bruijn graph.

Graph contains 11399212 k-mers


Building compacted unitig graph (2025-01-10 08:17:58)
    All non-branching paths are now collapsed to form a compacted De Bruijn graph, a.k.a. a unitig graph.

3910 unitigs, 5246 links
total length: 5699606 bp


Simplifying unitig graph (2025-01-10 08:18:56)
    The graph structure is now simplified by moving sequence into repeat unitigs when possible.

3910 unitigs, 5246 links
total length: 5642359 bp


Finished! (2025-01-10 08:18:58)
    You can now run autocycler cluster to group contigs based on their similarity.

Compressed unitig graph: autocycler_out/input_assemblies.gfa
Input assembly stats:    autocycler_out/input_assemblies.yaml
Time to run: 0:03:26.670401


Starting autocycler cluster (2025-01-10 08:18:58)
    This command takes a unitig graph (made by autocycler compress) and clusters the sequences based on their similarity. Ideally, each cluster will then contain sequences which can be combined into a consensus.

Settings:
  --autocycler_dir autocycler_out
  --cutoff 0.2
  --min_assemblies 5 (automatically set)
  --max_contigs 25


Pairwise distances (2025-01-10 08:18:59)
    Every pairwise distance between contigs is calculated based on the similarity of their paths through the graph.

40 sequences, 1600 total pairwise distances

Saving distance matrix:
  autocycler_out/clustering/pairwise_distances.phylip


Clustering sequences (2025-01-10 08:18:59)
    Contigs are organise into a tree using UPGMA. Then clusters are defined from the tree using the distance cutoff.

Saving clustering tree:
  autocycler_out/clustering/clustering.newick

Cluster 001:
  canu_01.fasta tig00000001 (3801556 bp)
  canu_02.fasta tig00000001 (3801552 bp)
  canu_03.fasta tig00000001 (3801555 bp)
  canu_04.fasta tig00000001 (3801561 bp)
  miniasm_01.fasta utg000001c (3801591 bp)
  miniasm_02.fasta utg000002c (3801588 bp)
  miniasm_03.fasta utg000002c (3801599 bp)
  miniasm_04.fasta utg000001c (3801581 bp)
  necat_01.fasta bctg00000000 (3801624 bp)
  necat_02.fasta bctg00000000 (3801719 bp)
  necat_03.fasta bctg00000000 (3801673 bp)
  necat_04.fasta bctg00000000 (3801854 bp)
  nextdenovo_01.fasta ctg000010 (3828547 bp)
  nextdenovo_02.fasta ctg000010 (3828514 bp)
  nextdenovo_03.fasta ctg000010 (3828992 bp)
  nextdenovo_04.fasta ctg000010 (3828200 bp)
  raven_01.fasta Utg2328 (3799098 bp)
  raven_02.fasta Utg2272 (3801590 bp)
  raven_03.fasta Utg2266 (3797933 bp)
  raven_04.fasta Utg2256 (3801583 bp)
  cluster distance: 0.001072
  passed QC

Cluster 002:
  canu_01.fasta tig00000002 (2061097 bp)
  canu_02.fasta tig00000002 (2061092 bp)
  canu_03.fasta tig00000002 (2061094 bp)
  canu_04.fasta tig00000003 (2061097 bp)
  miniasm_01.fasta utg000002c (2061110 bp)
  miniasm_02.fasta utg000001c (2061103 bp)
  miniasm_03.fasta utg000001c (2061107 bp)
  miniasm_04.fasta utg000002c (2061112 bp)
  necat_01.fasta bctg00000001 (2061191 bp)
  necat_02.fasta bctg00000001 (2061209 bp)
  necat_03.fasta bctg00000001 (2061133 bp)
  necat_04.fasta bctg00000001 (2061164 bp)
  nextdenovo_01.fasta ctg000000 (2088503 bp)
  nextdenovo_02.fasta ctg000000 (2090009 bp)
  nextdenovo_03.fasta ctg000000 (2082796 bp)
  nextdenovo_04.fasta ctg000000 (2086081 bp)
  raven_01.fasta Utg2326 (2061114 bp)
  raven_02.fasta Utg2274 (2061110 bp)
  raven_03.fasta Utg2268 (2061119 bp)
  raven_04.fasta Utg2254 (2061112 bp)
  cluster distance: 0.000324
  passed QC


Finished! (2025-01-10 08:19:00)
    You can now run autocycler trim on each cluster. If you want to manually inspect the clustering, you can view the following files.

Pairwise distances:         autocycler_out/clustering/pairwise_distances.phylip
Clustering tree (Newick):   autocycler_out/clustering/clustering.newick
Clustering tree (metadata): autocycler_out/clustering/clustering.tsv


Starting autocycler trim (2025-01-10 08:19:00)
    This command takes a single-cluster unitig graph (made by autocycler cluster) and trims any overlaps. It looks for both start-end overlaps (can occur with circular sequences) and hairpin overlaps (can occur
with linear sequences).

Settings:
  --cluster_dir autocycler_out/clustering/qc_pass/cluster_001
  --min_identity 0.75
  --max_unitigs 5000
  --mad 5
  --threads 8


Loading graph (2025-01-10 08:19:00)
    The unitig graph is now loaded into memory.

1811 unitigs, 2418 links
total length: 3707309 bp


Trim start-end overlaps (2025-01-10 08:19:00)
    Paths for circular replicons may contain start-end overlaps. These overlaps are searched for and trimmed if found.

canu_01.fasta tig00000001 (3801556 bp): not trimmed
canu_02.fasta tig00000001 (3801552 bp): not trimmed
canu_03.fasta tig00000001 (3801555 bp): not trimmed
canu_04.fasta tig00000001 (3801561 bp): not trimmed
miniasm_01.fasta utg000001c (3801591 bp): not trimmed
miniasm_02.fasta utg000002c (3801588 bp): not trimmed
miniasm_03.fasta utg000002c (3801599 bp): not trimmed
miniasm_04.fasta utg000001c (3801581 bp): not trimmed
necat_01.fasta bctg00000000 (3801624 bp): not trimmed
necat_02.fasta bctg00000000 (3801719 bp): not trimmed
necat_03.fasta bctg00000000 (3801673 bp): not trimmed
necat_04.fasta bctg00000000 (3801854 bp): not trimmed
nextdenovo_01.fasta ctg000010 (3828547 bp): trimmed to 3801578 bp
nextdenovo_02.fasta ctg000010 (3828514 bp): trimmed to 3801585 bp
nextdenovo_03.fasta ctg000010 (3828992 bp): trimmed to 3801585 bp
nextdenovo_04.fasta ctg000010 (3828200 bp): trimmed to 3801580 bp
raven_01.fasta Utg2328 (3799098 bp): not trimmed
raven_02.fasta Utg2272 (3801590 bp): not trimmed
raven_03.fasta Utg2266 (3797933 bp): not trimmed
raven_04.fasta Utg2256 (3801583 bp): not trimmed


Trim hairpin overlaps (2025-01-10 08:19:02)
    Paths for linear replicons may contain hairpin overlaps at the start and/or end of the contig. These overlaps are searched for and trimmed if found.

canu_01.fasta tig00000001 (3801556 bp): not trimmed
canu_02.fasta tig00000001 (3801552 bp): not trimmed
canu_03.fasta tig00000001 (3801555 bp): not trimmed
canu_04.fasta tig00000001 (3801561 bp): not trimmed
miniasm_01.fasta utg000001c (3801591 bp): not trimmed
miniasm_02.fasta utg000002c (3801588 bp): not trimmed
miniasm_03.fasta utg000002c (3801599 bp): not trimmed
miniasm_04.fasta utg000001c (3801581 bp): not trimmed
necat_01.fasta bctg00000000 (3801624 bp): not trimmed
necat_02.fasta bctg00000000 (3801719 bp): not trimmed
necat_03.fasta bctg00000000 (3801673 bp): not trimmed
necat_04.fasta bctg00000000 (3801854 bp): not trimmed
nextdenovo_01.fasta ctg000010 (3828547 bp): not trimmed
nextdenovo_02.fasta ctg000010 (3828514 bp): not trimmed
nextdenovo_03.fasta ctg000010 (3828992 bp): not trimmed
nextdenovo_04.fasta ctg000010 (3828200 bp): not trimmed
raven_01.fasta Utg2328 (3799098 bp): not trimmed
raven_02.fasta Utg2272 (3801590 bp): not trimmed
raven_03.fasta Utg2266 (3797933 bp): not trimmed
raven_04.fasta Utg2256 (3801583 bp): not trimmed


Exclude outliers (2025-01-10 08:19:05)
    Sequences which vary too much in their length are now excluded from the cluster.

Median sequence length:    3801584 bp
Median absolute deviation: 19 bp
Allowed length range:      3801489-3801679 bp

canu_01.fasta tig00000001 (3801556 bp): kept
canu_02.fasta tig00000001 (3801552 bp): kept
canu_03.fasta tig00000001 (3801555 bp): kept
canu_04.fasta tig00000001 (3801561 bp): kept
miniasm_01.fasta utg000001c (3801591 bp): kept
miniasm_02.fasta utg000002c (3801588 bp): kept
miniasm_03.fasta utg000002c (3801599 bp): kept
miniasm_04.fasta utg000001c (3801581 bp): kept
necat_01.fasta bctg00000000 (3801624 bp): kept
necat_02.fasta bctg00000000 (3801719 bp): excluded
necat_03.fasta bctg00000000 (3801673 bp): kept
necat_04.fasta bctg00000000 (3801854 bp): excluded
nextdenovo_01.fasta ctg000010 (3801578 bp): kept
nextdenovo_02.fasta ctg000010 (3801585 bp): kept
nextdenovo_03.fasta ctg000010 (3801585 bp): kept
nextdenovo_04.fasta ctg000010 (3801580 bp): kept
raven_01.fasta Utg2328 (3799098 bp): excluded
raven_02.fasta Utg2272 (3801590 bp): kept
raven_03.fasta Utg2266 (3797933 bp): excluded
raven_04.fasta Utg2256 (3801583 bp): kept


Clean graph (2025-01-10 08:19:05)
    The unitig graph is now cleaned up based on any trimming and/or exclusion that has occurred above.

1269 unitigs, 1696 links
total length: 3703607 bp


Finished! (2025-01-10 08:19:05)
    You can now run autocycler resolve on this cluster. If you want to manually inspect the trimming, you can run autocycler dotplot on the sequences both before and after trimming.

Unitig graph of trimmed sequences: autocycler_out/clustering/qc_pass/cluster_001/2_trimmed.gfa


Starting autocycler resolve (2025-01-10 08:19:05)
    This command resolves repeats in the unitig graph.

Settings:
  --cluster_dir autocycler_out/clustering/qc_pass/cluster_001


Loading graph (2025-01-10 08:19:05)
    The unitig graph is now loaded into memory.

1269 unitigs, 1696 links
total length: 3703607 bp


Finding anchor unitigs (2025-01-10 08:19:05)
    Anchor unitigs are those that occur once and only once in each sequence. They will definitely be present in the final sequence and will serve as the connection points for bridges.

480 anchor unitigs found


Building bridges (2025-01-10 08:19:05)
    Bridges connect one anchor unitig to the next.

     Unique bridges: 480
Conflicting bridges: 0


Applying unique bridges (2025-01-10 08:19:05)
    All unique bridges (those that do not conflict with other bridges) are now applied to the graph, with linear paths merged to create consentigs.

1 unitig, 1 link
total length: 3801591 bp

All bridges were unique, no culling necessary.


Finished! (2025-01-10 08:19:05)
Final consensus graph: autocycler_out/clustering/qc_pass/cluster_001/5_final.gfa


Starting autocycler trim (2025-01-10 08:19:05)
    This command takes a single-cluster unitig graph (made by autocycler cluster) and trims any overlaps. It looks for both start-end overlaps (can occur with circular sequences) and hairpin overlaps (can occur
with linear sequences).

Settings:
  --cluster_dir autocycler_out/clustering/qc_pass/cluster_002
  --min_identity 0.75
  --max_unitigs 5000
  --mad 5
  --threads 8


Loading graph (2025-01-10 08:19:05)
    The unitig graph is now loaded into memory.

1594 unitigs, 2138 links
total length: 1985935 bp


Trim start-end overlaps (2025-01-10 08:19:05)
    Paths for circular replicons may contain start-end overlaps. These overlaps are searched for and trimmed if found.

canu_01.fasta tig00000002 (2061097 bp): not trimmed
canu_02.fasta tig00000002 (2061092 bp): not trimmed
canu_03.fasta tig00000002 (2061094 bp): not trimmed
canu_04.fasta tig00000003 (2061097 bp): not trimmed
miniasm_01.fasta utg000002c (2061110 bp): not trimmed
miniasm_02.fasta utg000001c (2061103 bp): not trimmed
miniasm_03.fasta utg000001c (2061107 bp): not trimmed
miniasm_04.fasta utg000002c (2061112 bp): not trimmed
necat_01.fasta bctg00000001 (2061191 bp): not trimmed
necat_02.fasta bctg00000001 (2061209 bp): not trimmed
necat_03.fasta bctg00000001 (2061133 bp): not trimmed
necat_04.fasta bctg00000001 (2061164 bp): not trimmed
nextdenovo_01.fasta ctg000000 (2088503 bp): trimmed to 2061107 bp
nextdenovo_02.fasta ctg000000 (2090009 bp): trimmed to 2061108 bp
nextdenovo_03.fasta ctg000000 (2082796 bp): trimmed to 2061100 bp
nextdenovo_04.fasta ctg000000 (2086081 bp): trimmed to 2061114 bp
raven_01.fasta Utg2326 (2061114 bp): not trimmed
raven_02.fasta Utg2274 (2061110 bp): not trimmed
raven_03.fasta Utg2268 (2061119 bp): not trimmed
raven_04.fasta Utg2254 (2061112 bp): not trimmed


Trim hairpin overlaps (2025-01-10 08:19:06)
    Paths for linear replicons may contain hairpin overlaps at the start and/or end of the contig. These overlaps are searched for and trimmed if found.

canu_01.fasta tig00000002 (2061097 bp): not trimmed
canu_02.fasta tig00000002 (2061092 bp): not trimmed
canu_03.fasta tig00000002 (2061094 bp): not trimmed
canu_04.fasta tig00000003 (2061097 bp): not trimmed
miniasm_01.fasta utg000002c (2061110 bp): not trimmed
miniasm_02.fasta utg000001c (2061103 bp): not trimmed
miniasm_03.fasta utg000001c (2061107 bp): not trimmed
miniasm_04.fasta utg000002c (2061112 bp): not trimmed
necat_01.fasta bctg00000001 (2061191 bp): not trimmed
necat_02.fasta bctg00000001 (2061209 bp): not trimmed
necat_03.fasta bctg00000001 (2061133 bp): not trimmed
necat_04.fasta bctg00000001 (2061164 bp): not trimmed
nextdenovo_01.fasta ctg000000 (2088503 bp): not trimmed
nextdenovo_02.fasta ctg000000 (2090009 bp): not trimmed
nextdenovo_03.fasta ctg000000 (2082796 bp): not trimmed
nextdenovo_04.fasta ctg000000 (2086081 bp): not trimmed
raven_01.fasta Utg2326 (2061114 bp): not trimmed
raven_02.fasta Utg2274 (2061110 bp): not trimmed
raven_03.fasta Utg2268 (2061119 bp): not trimmed
raven_04.fasta Utg2254 (2061112 bp): not trimmed


Exclude outliers (2025-01-10 08:19:09)
    Sequences which vary too much in their length are now excluded from the cluster.

Median sequence length:    2061110 bp
Median absolute deviation: 8 bp
Allowed length range:      2061070-2061150 bp

canu_01.fasta tig00000002 (2061097 bp): kept
canu_02.fasta tig00000002 (2061092 bp): kept
canu_03.fasta tig00000002 (2061094 bp): kept
canu_04.fasta tig00000003 (2061097 bp): kept
miniasm_01.fasta utg000002c (2061110 bp): kept
miniasm_02.fasta utg000001c (2061103 bp): kept
miniasm_03.fasta utg000001c (2061107 bp): kept
miniasm_04.fasta utg000002c (2061112 bp): kept
necat_01.fasta bctg00000001 (2061191 bp): excluded
necat_02.fasta bctg00000001 (2061209 bp): excluded
necat_03.fasta bctg00000001 (2061133 bp): kept
necat_04.fasta bctg00000001 (2061164 bp): excluded
nextdenovo_01.fasta ctg000000 (2061107 bp): kept
nextdenovo_02.fasta ctg000000 (2061108 bp): kept
nextdenovo_03.fasta ctg000000 (2061100 bp): kept
nextdenovo_04.fasta ctg000000 (2061114 bp): kept
raven_01.fasta Utg2326 (2061114 bp): kept
raven_02.fasta Utg2274 (2061110 bp): kept
raven_03.fasta Utg2268 (2061119 bp): kept
raven_04.fasta Utg2254 (2061112 bp): kept


Clean graph (2025-01-10 08:19:09)
    The unitig graph is now cleaned up based on any trimming and/or exclusion that has occurred above.

1173 unitigs, 1582 links
total length: 1983986 bp


Finished! (2025-01-10 08:19:09)
    You can now run autocycler resolve on this cluster. If you want to manually inspect the trimming, you can run autocycler dotplot on the sequences both before and after trimming.

Unitig graph of trimmed sequences: autocycler_out/clustering/qc_pass/cluster_002/2_trimmed.gfa


Starting autocycler resolve (2025-01-10 08:19:09)
    This command resolves repeats in the unitig graph.

Settings:
  --cluster_dir autocycler_out/clustering/qc_pass/cluster_002


Loading graph (2025-01-10 08:19:09)
    The unitig graph is now loaded into memory.

1173 unitigs, 1582 links
total length: 1983986 bp


Finding anchor unitigs (2025-01-10 08:19:09)
    Anchor unitigs are those that occur once and only once in each sequence. They will definitely be present in the final sequence and will serve as the connection points for bridges.

575 anchor unitigs found


Building bridges (2025-01-10 08:19:09)
    Bridges connect one anchor unitig to the next.

     Unique bridges: 575
Conflicting bridges: 0


Applying unique bridges (2025-01-10 08:19:09)
    All unique bridges (those that do not conflict with other bridges) are now applied to the graph, with linear paths merged to create consentigs.

1 unitig, 1 link
total length: 2061112 bp

All bridges were unique, no culling necessary.


Finished! (2025-01-10 08:19:09)
Final consensus graph: autocycler_out/clustering/qc_pass/cluster_002/5_final.gfa


Starting autocycler combine (2025-01-10 08:19:09)
    This command combines different clusters into a single assembly file.

Settings:
  --autocycler_dir autocycler_out
  --in_gfas autocycler_out/clustering/qc_pass/cluster_001/5_final.gfa
            autocycler_out/clustering/qc_pass/cluster_002/5_final.gfa


Combining clusters (2025-01-10 08:19:09)
    This command combines different clusters into a single assembly file.

autocycler_out/clustering/qc_pass/cluster_001/5_final.gfa
1 unitig, 1 link
total length: 3801591 bp

autocycler_out/clustering/qc_pass/cluster_002/5_final.gfa
1 unitig, 1 link
total length: 2061112 bp


Finished! (2025-01-10 08:19:09)
Combined graph: autocycler_out/consensus_assembly.gfa
Combined fasta: autocycler_out/consensus_assembly.fasta

Consensus assembly is fully resolved 
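
As a side note on the "Exclude outliers" steps in the log above: the reported numbers are consistent with an allowed range of median ± (--mad × median absolute deviation). The following Python sketch reproduces that arithmetic for cluster_001 using the post-trimming lengths from the log. The formula is inferred from the logged values rather than taken from Autocycler's source, and the function name `allowed_length_range` is hypothetical.

```python
from statistics import median

# Post-trimming sequence lengths for cluster_001, copied from the log.
CLUSTER_001_LENGTHS = [
    3801556, 3801552, 3801555, 3801561,  # canu
    3801591, 3801588, 3801599, 3801581,  # miniasm
    3801624, 3801719, 3801673, 3801854,  # necat
    3801578, 3801585, 3801585, 3801580,  # nextdenovo (after trimming)
    3799098, 3801590, 3797933, 3801583,  # raven
]


def allowed_length_range(lengths, mad_multiplier=5):
    """Return (low, high) as median -/+ mad_multiplier * MAD.

    mad_multiplier mirrors the --mad 5 setting shown in the log; treating
    it as a plain multiplier is an assumption based on the logged numbers.
    """
    med = median(lengths)
    mad = median([abs(length - med) for length in lengths])
    return med - mad_multiplier * mad, med + mad_multiplier * mad


if __name__ == "__main__":
    low, high = allowed_length_range(CLUSTER_001_LENGTHS)
    excluded = [x for x in CLUSTER_001_LENGTHS if not low <= x <= high]
    print(f"Allowed length range: {int(low)}-{int(high)} bp")
    print(f"Excluded lengths: {sorted(excluded)}")
```

With these inputs the range comes out as 3801489 to 3801679 bp, matching the log, and the four lengths outside it belong to the sequences reported as excluded (necat_02, necat_04, raven_01 and raven_03).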
