Why didn't my assembly go well?

Ryan Wick edited this page Dec 18, 2024 · 8 revisions

There are two common reasons for Autocycler to fail to produce a completely resolved assembly:

  1. The input assemblies were low quality.
  2. The genome contains one or more linear sequences.

Low quality input assemblies

To generate a complete and clean consensus assembly, Autocycler requires that most input assemblies are complete, i.e. each sequence in the genome is assembled into a single contig. If this is not the case, Autocycler is unlikely to produce a fully resolved result. For example, if your bacterial genome has a 5 Mbp chromosome but each input assembly has the chromosome fragmented into two pieces, 2 Mbp and 3 Mbp, then Autocycler will not be able to create a complete 5 Mbp consensus sequence for the chromosome.
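One way to spot fragmented input assemblies before running Autocycler is to list the contig lengths in each one. The following is a minimal sketch, not part of Autocycler: the function names `contig_lengths` and `looks_fragmented`, the 10% tolerance, and the idea of supplying an expected chromosome size are all assumptions made for illustration.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return contig lengths (largest first) from the lines of a FASTA file."""
    lengths: List[int] = []
    current = None  # length of the contig being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous contig, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return sorted(lengths, reverse=True)


def looks_fragmented(lengths: List[int], expected_bp: int,
                     tolerance: float = 0.1) -> bool:
    """Return True if no contig is within `tolerance` of the expected size.

    `expected_bp` is a hypothetical user-supplied estimate of the
    chromosome length; the 10% default tolerance is arbitrary.
    """
    return not any(abs(n - expected_bp) <= tolerance * expected_bp
                   for n in lengths)


# The example from the text: a 5 Mbp chromosome split into 2 Mbp and 3 Mbp.
print(looks_fragmented([3_000_000, 2_000_000], 5_000_000))
```

To apply this to a real file, pass an open file handle to `contig_lengths`, e.g. `contig_lengths(open("assembly.fasta"))`, where the filename is a placeholder. If most of your input assemblies are flagged, the problem most likely lies with the inputs rather than with Autocycler.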

The most common reason for low-quality input assemblies is an insufficient long-read set: either the depth is too low or the reads are too short. Ideally, the depth will be 100× or more, though good assemblies can sometimes be made with less than 50× depth. For read length, it is important that there are plenty of reads longer than the longest repeat in the genome. For many bacterial genomes, the longest repeat is ~5–6 kbp (the rRNA operon), so a long-read set with an N50 of 8 kbp or more will be sufficient. However, some bacterial genomes have much longer repeats (e.g. multiple copies of a prophage), necessitating much longer reads to get a complete assembly.
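The depth and N50 figures above can be estimated directly from read lengths. The sketch below is illustrative only: the function names are hypothetical, the FASTQ reader assumes plain four-line records (no wrapped sequences, not gzipped), and the genome size is an estimate you supply yourself.

```python
from typing import Iterable, List


def fastq_read_lengths(fastq_lines: Iterable[str]) -> List[int]:
    """Read lengths from FASTQ lines, assuming strict four-line records."""
    # The sequence is the second line of each four-line record.
    return [len(line.strip()) for i, line in enumerate(fastq_lines) if i % 4 == 1]


def read_n50(lengths: List[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if running * 2 >= total:
            return n
    return 0  # no reads


def mean_depth(lengths: List[int], genome_size_bp: int) -> float:
    """Total sequenced bases divided by an estimated genome size."""
    return sum(lengths) / genome_size_bp


# Toy example: 500 reads of 10 kbp against a 50 kbp genome.
lengths = [10_000] * 500
print(read_n50(lengths), mean_depth(lengths, 50_000))
```

Comparing the results with the rough guides in the text (100× depth, N50 of 8 kbp) gives a quick indication of whether the read set is the likely cause of poor input assemblies.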

Linear sequences

Autocycler can struggle to fully resolve linear sequences for a few reasons:

  • Input contigs can extend past hairpin ends, leading to erratic contig lengths. The Autocycler trim step can often, but not always, repair this.
  • Input contigs can be inconsistent regarding where blunt ends terminate, leading to unresolved sequences in the Autocycler resolve step.

See the Linear sequences page for a more thorough description of these problems. Future versions of Autocycler will aim to improve behaviour on linear sequences.

Now what do I do?

If your Autocycler assembly went poorly due to low-quality input assemblies, you can try the following:

  • Try using different assemblers to generate your input assemblies. While Autocycler comes with helper scripts for some common ones, any long-read assembler can potentially work.
  • Try different parameters when making your input assemblies. Some assemblers (e.g. Canu) have a large number of parameters that can influence the result.
  • Manually curate your input assemblies before using them with Autocycler.

If none of the above works well, then your read set is likely insufficient, in which case you may need to sequence again, aiming for greater depth and longer reads.
