
Why didn't my assembly go well?

Ryan Wick edited this page Dec 18, 2024 · 8 revisions

There are two common reasons for Autocycler to fail to produce a completely resolved assembly:

  1. The input assemblies were low quality.
  2. The genome contains one or more linear sequences.

Low quality input assemblies

To generate a complete and clean consensus assembly, Autocycler requires that most input assemblies are complete, i.e. each sequence in the genome is assembled into a single contig. If this is not the case, Autocycler is unlikely to produce a fully resolved result. For example, if your bacterial genome has a 5 Mbp chromosome but each input assembly has the chromosome fragmented into two pieces, 2 Mbp and 3 Mbp, then Autocycler will not be able to create a complete 5 Mbp consensus sequence for the chromosome.
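One rough way to spot fragmented input assemblies before running Autocycler is to compare each assembly's largest contig against the chromosome size you expect. The sketch below is only an illustration and is not part of Autocycler: the function names, the 10% tolerance and the expected size are all assumptions, and it reads plain (uncompressed) FASTA files.

```python
def contig_lengths(fasta_path):
    """Return the length of each contig in an uncompressed FASTA file."""
    lengths = []
    current = None  # length of the contig being read, None before the first header
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def chromosome_is_complete(lengths, expected_size, tolerance=0.1):
    """Hypothetical heuristic: the largest contig is within `tolerance`
    (as a fraction) of the expected chromosome size."""
    if not lengths:
        return False
    return abs(max(lengths) - expected_size) <= tolerance * expected_size


# The fragmented example from the text: a 5 Mbp chromosome split into 2 Mbp and 3 Mbp.
print(chromosome_is_complete([2_000_000, 3_000_000], 5_000_000))  # False
```

If most of your input assemblies fail a check like this, the problem lies with the inputs rather than with Autocycler itself.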

The most common reason for low-quality input assemblies is an inadequate long-read set: either the depth is too low or the reads are too short. Ideally, the depth will be 100× or more, but good assemblies can sometimes be made with less than 50× depth. For read length, it is important that there are plenty of reads longer than the longest repeat in the genome. For many bacterial genomes, the longest repeat is ~5–6 kbp (the rRNA operon), so a long-read set with an N50 of 8 kbp or more will be sufficient. However, some bacterial genomes have much longer repeats (e.g. multiple copies of a prophage), necessitating much longer reads to get a complete assembly.
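To check a read set against these guidelines, you can compute its mean depth and N50 from the read lengths. The following is a minimal sketch, not an Autocycler command: the function names are hypothetical, the genome size is something you supply yourself, and the FASTQ reader assumes an uncompressed file with exactly four lines per record.

```python
def fastq_read_lengths(path):
    """Read lengths from an uncompressed, four-line-per-record FASTQ file."""
    lengths = []
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                lengths.append(len(line.strip()))
    return lengths


def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no reads


def mean_depth(lengths, genome_size):
    """Total read bases divided by the (assumed) genome size."""
    return sum(lengths) / genome_size


def count_reads_longer_than(lengths, repeat_length):
    """Number of reads longer than a given repeat, e.g. a ~6 kbp rRNA operon."""
    return sum(1 for length in lengths if length > repeat_length)
```

For example, with a 5 Mbp genome you would call `mean_depth(lengths, 5_000_000)` and compare the result with the 100× target, and compare `read_n50(lengths)` with the 8 kbp guideline above.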

Linear sequences

Autocycler can struggle to fully resolve linear sequences for a few reasons:

  • Input contigs can extend past hairpin ends, leading to erratic contig lengths. The Autocycler trim step can often but not always repair this.
  • Input contigs can be inconsistent regarding where blunt ends terminate, leading to unresolved sequences in the Autocycler resolve step.

See the Linear sequences page for a more thorough description of these problems. Future versions of Autocycler will aim to improve behaviour on linear sequences.

Now what do I do?

If your Autocycler assembly went poorly due to low-quality input assemblies, you can try the following:
