Why didn't my assembly go well?
There are two common reasons for Autocycler to fail to produce a completely resolved assembly:
- The input assemblies were low quality.
- The genome contains one or more linear sequences.
To generate a complete and clean consensus assembly, Autocycler requires that most input assemblies are complete, i.e. each sequence in the genome is assembled into a single contig. If this is not the case, Autocycler will struggle to produce a fully resolved result. For example, if your bacterial genome has a 5 Mbp chromosome but each input assembly has the chromosome fragmented into two pieces, 2 Mbp and 3 Mbp, then Autocycler will not be able to create a complete 5 Mbp consensus sequence for the chromosome.
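A quick way to spot fragmented input assemblies is to list the contig lengths in each FASTA file before running Autocycler. The following is a minimal sketch, not part of Autocycler itself: the helper name `contig_lengths` is hypothetical, and the toy sequences are scaled down (50 bases standing in for a 5 Mbp chromosome).

```python
def contig_lengths(fasta_text):
    """Return the length of each contig in a FASTA-formatted string, in file order."""
    lengths = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new contig.
            lengths.append(0)
        elif lengths:
            # Sequence lines are added to the most recent contig.
            lengths[-1] += len(line)
    return lengths


# Toy stand-ins: a complete chromosome versus one split into two pieces.
complete = ">chromosome\n" + "A" * 50 + "\n"
fragmented = ">piece_1\n" + "A" * 20 + "\n>piece_2\n" + "A" * 30 + "\n"

print(contig_lengths(complete))
print(contig_lengths(fragmented))
```

If most of your input assemblies look like the second case (the expected chromosome size only appears as the sum of several contigs), the inputs are probably too fragmented for Autocycler to resolve.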
The most common reason for low-quality input assemblies is an insufficient long read set: either the depth is too low or the reads are too short. Ideally, the depth will be 100× or more, but sometimes good assemblies can be made with <50× depth. For read length, it is important that there are plenty of reads longer than the longest repeat in the genome. For many bacterial genomes, the longest repeat is ~5–6 kbp (the rRNA operon), so a long read set with an N50 of 8 kbp or more will be sufficient. However, some bacterial genomes have much longer repeats (e.g. multiple copies of a prophage) necessitating much longer reads to get a complete assembly.
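To check whether a read set meets these guidelines, you can compute its depth and N50 from the read lengths. The sketch below is illustrative only: the function names are hypothetical, the read lengths are made up, and the 6 kbp repeat length is an assumed value based on the rRNA operon figure above.

```python
def read_n50(lengths):
    """Return the N50: the largest length L such that reads of length >= L
    contain at least half of all bases. Returns 0 for an empty read set."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def mean_depth(lengths, genome_size):
    """Return the mean read depth: total read bases divided by genome size."""
    return sum(lengths) / genome_size


def reads_longer_than(lengths, min_length):
    """Count reads strictly longer than min_length (e.g. the longest repeat)."""
    return sum(1 for length in lengths if length > min_length)


# Made-up read lengths in bp and a toy genome size, for illustration only.
lengths = [2000, 4000, 8000, 10000, 12000]
genome_size = 6000

print(read_n50(lengths))
print(mean_depth(lengths, genome_size))
print(reads_longer_than(lengths, 6000))
```

In practice the read lengths would come from your FASTQ file, and the genome size would be an estimate for your organism.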
Autocycler can struggle to fully resolve linear sequences for a few reasons:
- Input contigs can extend past hairpin ends, leading to erratic contig lengths. The Autocycler trim step can often but not always repair this.
- Input contigs can be inconsistent regarding where blunt ends terminate, leading to unresolved sequences in the Autocycler resolve step.
See the Linear sequences page for a more thorough description of these problems. Future versions of Autocycler will aim to improve behaviour on linear sequences.
If your Autocycler assembly went poorly due to low-quality input assemblies, you can try the following:
- Try different methods for generating input assemblies. Different assemblers and parameters may work better for your genome.