FAQ and miscellaneous tips
- Does Autocycler include an assembly pipeline?
- Polishing after Autocycler
- How does Autocycler compare to Trycycler?
- Can Autocycler be used on eukaryote genomes?
- Can Autocycler be used on mitochondrial/chloroplast genomes?
- Is Autocycler deterministic?
- Does Autocycler rotate circular sequences to a consistent position?
- How does Autocycler colour sequences in its graphs?
- Suppressing terminal colours
Does Autocycler include an assembly pipeline?
No, Autocycler does not include an assembly pipeline. The documentation provides example commands to run in Bash, but further optimisation is left to the user. The ideal approach depends on your computational environment and requirements. For example, some users may need to use job schedulers like SLURM, while others might use workflow managers such as Nextflow or Snakemake.
If you are optimising Autocycler assemblies or creating a pipeline, here are some things to keep in mind:
- The most time-consuming steps in an Autocycler workflow are typically the creation of input assemblies, which can be carried out in parallel.
- The choice of input assemblers is up to you. Keep in mind the speed-vs-quality tradeoff: some tools (e.g. Canu) are slow but produce high-quality assemblies, while others (e.g. Raven) are faster but tend to introduce more errors.
- I recommend using a variety of input assemblers. For example, even if Flye is your favourite assembler, a Flye-only pipeline is more prone to errors than one that also includes other assemblers.
- If your pipeline handles multiple isolates, I suggest running Autocycler table at the end to generate a summary table indicating how well each assembly performed.
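As a minimal sketch of the first point above, the snippet below launches several input-assembly jobs in the background and waits for all of them to finish. `run_assembler` is a hypothetical placeholder (here it only writes a marker file), and the `assemblies` directory and subset labels are illustrative assumptions; replace the function body with your real assembler commands.

```bash
# Hypothetical placeholder: replace the body with a real assembler command.
run_assembler() {
    assembler="$1"
    subset="$2"
    # Stand-in for the real work: write one marker file per job.
    echo "placeholder output for $assembler on subset $subset" \
        > "assemblies/${assembler}_${subset}.txt"
}

mkdir -p assemblies

# Launch one background job per assembler/read-subset pair.
for assembler in canu flye raven; do
    for subset in 01 02; do
        run_assembler "$assembler" "$subset" &
    done
done

# Block until every background job has finished.
wait
ls assemblies
```

On a cluster, the same idea would usually be expressed as one scheduler job (or one workflow-manager task) per assembler/subset pair rather than as background shell jobs.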
Polishing after Autocycler
Since Autocycler assemblies are long-read-only, they may still contain small-scale errors produced by systematic errors in the reads. A common example would be long homopolymers: if a genome contains A×15 at a locus but most of the long reads erroneously have A×14 at that locus, the assembly is likely to contain the A×14 error.
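To make the homopolymer example concrete, here is a small sketch that reports the longest run of a given base in a sequence using standard Unix tools. The two sequences are made-up toy loci (A×15 as the true sequence, A×14 as the assembly), and `repeat_base` and `longest_run` are hypothetical helper names, not part of Autocycler.

```bash
# Print base $1 repeated $2 times (used to build the toy sequences).
repeat_base() {
    awk -v b="$1" -v n="$2" 'BEGIN { for (i = 0; i < n; i++) printf "%s", b; print "" }'
}

# Print the length of the longest run of base $1 in sequence $2.
longest_run() {
    printf '%s\n' "$2" | grep -o "${1}${1}*" |
        awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

truth="CGT$(repeat_base A 15)GCT"      # toy locus with A×15
assembly="CGT$(repeat_base A 14)GCT"   # same locus with one A missing

longest_run A "$truth"      # prints 15
longest_run A "$assembly"   # prints 14
```

A one-base difference like this is invisible to the long-read assembler if most reads agree on the shorter run, which is why polishing is a separate step.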
If you are assembling Oxford Nanopore reads, performing long-read polishing with Medaka can help. I recommend using the --bacteria option for the latest methylation-aware model. If you also have short reads, then I recommend using Polypolish and Pypolca, both of which are conservative (unlikely to introduce new errors).
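A Medaka polishing run might look like the sketch below. It assumes Medaka is installed and recent enough to support `--bacteria`; the file names, output directory and thread count are placeholders. The snippet skips the run if the tool or the input files are missing.

```bash
reads="reads.fastq"                    # placeholder: your Oxford Nanopore reads
assembly="autocycler_assembly.fasta"   # placeholder: the assembly to polish
out_dir="medaka_out"                   # placeholder: Medaka output directory
threads=8

if command -v medaka_consensus >/dev/null 2>&1 && [ -f "$reads" ] && [ -f "$assembly" ]; then
    # -i reads, -d draft assembly, -o output directory, -t threads
    medaka_consensus -i "$reads" -d "$assembly" -o "$out_dir" -t "$threads" --bacteria
else
    echo "Skipping: medaka_consensus or input files not found" >&2
fi
```

Medaka should write the polished sequences to a `consensus.fasta` file inside the output directory, but check the documentation for your installed version.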
Further information about polishing is available in these papers:
- Wick RR, Judd LM, Holt KE. Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing. PLOS Computational Biology. 2023. doi:10.1371/journal.pcbi.1010905. See also its online tutorial.
- Bouras G, Judd LM, Edwards RA, Vreugde S, Stinear TP, Wick RR. How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies. Microbial Genomics. 2024. doi:10.1099/mgen.0.001254.
How does Autocycler compare to Trycycler?
Autocycler was designed to be a faster and automated successor to Trycycler. Both tools perform the same task: combining multiple alternative assemblies of the same genome into a clean consensus assembly.
- Trycycler was designed to be human-guided.
- Autocycler was designed to be automated (but still allow for human intervention).
- They should give very similar results, but Autocycler is probably easier to use.
- Performance:
  - Trycycler is slower: it is written in Python and has a few steps that align all the reads, which is slow.
  - Autocycler is faster: it is written in Rust and does not align reads.
Most users will probably be better off with Autocycler!
Can Autocycler be used on eukaryote genomes?
For Autocycler to work, the input assemblies need to mostly be complete: one sequence per piece of DNA in the genome. So if telomere-to-telomere (T2T) assemblies are possible, then Autocycler should work!
However, phased diploid assemblies might create a problem in the clustering step. If there is a lot of heterozygosity, the two haplotypes for each chromosome might separate in the UPGMA tree, in which case clustering might work (potentially requiring manual specification of the clusters). But more likely it will require the user to separate the haplotypes in the input assemblies. I.e. split each phased input assembly into a maternal assembly and paternal assembly, then run Autocycler twice (once for maternal, once for paternal).
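As a rough sketch of that splitting step, the snippet below routes each FASTA record to a maternal or paternal file based on its header. The `_mat`/`_pat` header suffixes and the file names are assumptions made for illustration; phased assemblers label haplotypes in different ways, so adapt the patterns to your own headers.

```bash
tmp=$(mktemp -d)

# Toy phased assembly; the "_mat"/"_pat" header suffixes are an assumed naming scheme.
printf '>chr1_mat\nACGT\n>chr1_pat\nACGA\n>chr2_mat\nGGCC\n>chr2_pat\nGGCT\n' \
    > "$tmp/phased.fasta"

# Route each record (header plus sequence lines) to a per-haplotype file.
awk -v dir="$tmp" '
    /^>/ {
        out = ""
        if ($1 ~ /_mat$/)      out = dir "/maternal.fasta"
        else if ($1 ~ /_pat$/) out = dir "/paternal.fasta"
    }
    out != "" { print > out }
' "$tmp/phased.fasta"

grep -c '>' "$tmp/maternal.fasta"   # number of maternal sequences
grep -c '>' "$tmp/paternal.fasta"   # number of paternal sequences
```

Each per-haplotype file would then serve as one input assembly for the corresponding Autocycler run.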
Can Autocycler be used on mitochondrial/chloroplast genomes?
Yes! Since these genomes are circular and descended from bacterial genomes, they are well suited to an Autocycler assembly. I recommend first extracting just the mitochondrial/chloroplast reads from your long read set so you can produce input assemblies without the nuclear genome.
Is Autocycler deterministic?
Yes, Autocycler itself is deterministic: for a given set of input assemblies and parameters, it will produce the same consensus assembly. However, not all assemblers are deterministic, so a full Autocycler assembly (including the generation of input assemblies) may differ from run to run.
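One simple way to check this on your own data is to run Autocycler twice on the same input assemblies and compare the two consensus files byte for byte. The sketch below uses toy files in place of real consensus assemblies; `same_file` and the file names are hypothetical.

```bash
# Succeed (exit status 0) if two files are byte-for-byte identical.
same_file() {
    cmp -s "$1" "$2"
}

# Toy files standing in for the consensus assemblies from two runs.
tmp=$(mktemp -d)
printf '>contig_1\nACGT\n' > "$tmp/run1.fasta"
printf '>contig_1\nACGT\n' > "$tmp/run2.fasta"

if same_file "$tmp/run1.fasta" "$tmp/run2.fasta"; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

If the input assemblies themselves were regenerated between runs, a difference here most likely reflects a non-deterministic assembler, not Autocycler.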
Does Autocycler rotate circular sequences to a consistent position?
No, unlike Trycycler, Autocycler does not rotate circular sequences to start at a particular gene (e.g. dnaA). I may add this feature in the future, but for now I recommend using Dnaapler.
How does Autocycler colour sequences in its graphs?
For some of the GFA files it creates, Autocycler adds colours to the segments using the CL:z: tag that Bandage can read:
- For the 3_bridged.gfa graph, anchors are coloured green and bridges are coloured pink (the same scheme used by Unicycler).
- For the 4_merged.gfa and 5_final.gfa graphs, consentigs (sequences created by merging unitigs together) are coloured blue.
- For the final consensus graph made by Autocycler combine, consentigs are again coloured blue, and anything else is a bright orange-red to indicate that the assembly is not complete.
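To see which segments carry a colour, you can pull the CL:z: tags out of a GFA file with awk. The sketch below builds a toy GFA first; the segment names, sequences and colour value are placeholders and do not reflect the exact colours Autocycler writes.

```bash
tmp=$(mktemp -d)

# Toy GFA: one coloured segment, one uncoloured segment and a link between them.
printf 'H\tVN:Z:1.0\n'            >  "$tmp/toy.gfa"
printf 'S\t1\tACGT\tCL:z:blue\n'  >> "$tmp/toy.gfa"
printf 'S\t2\tGGCA\n'             >> "$tmp/toy.gfa"
printf 'L\t1\t+\t2\t+\t0M\n'      >> "$tmp/toy.gfa"

# For each segment (S) line, print "name <tab> colour" if a CL:z: tag is present.
colours=$(awk -F'\t' '
    $1 == "S" {
        for (i = 4; i <= NF; i++)
            if ($i ~ /^CL:z:/) print $2 "\t" substr($i, 6)
    }
' "$tmp/toy.gfa")
printf '%s\n' "$colours"
```

Segments without a CL:z: tag (segment 2 in the toy file) are simply not listed.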
Suppressing terminal colours
Autocycler uses some ANSI colours in its terminal output to stderr for aesthetic purposes. If you would rather have no colours (e.g. when redirecting stderr to a log file), you can set the NO_COLOR environment variable before running Autocycler:
export NO_COLOR=1
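If you would rather not change your whole shell session, the variable can also be set for a single command. The sketch below shows that pattern, with `sh -c` standing in for an Autocycler command, and adds a hypothetical `strip_ansi` helper for cleaning a log that was already captured with colour codes.

```bash
# Set NO_COLOR for one command only; `sh -c` stands in for an Autocycler command.
seen=$(NO_COLOR=1 sh -c 'printf "%s" "${NO_COLOR:-unset}"')
echo "NO_COLOR inside the command: $seen"

# Remove ANSI colour sequences (ESC [ ... m) from text read on stdin.
esc=$(printf '\033')
strip_ansi() {
    sed "s/${esc}\[[0-9;]*m//g"
}

# Toy coloured line standing in for a captured log.
clean=$(printf '\033[1;32mdone\033[0m\n' | strip_ansi)
echo "$clean"
```

The per-command form is convenient in pipelines where only the logged steps should lose their colours.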