
FAQ and miscellaneous tips

Ryan Wick edited this page Nov 26, 2024 · 16 revisions

Does Autocycler include an assembly pipeline?

No, Autocycler does not include an assembly pipeline. The documentation provides example commands to run in Bash, but further optimisation is left to the user. The ideal approach depends on your computational environment and requirements. For example, some users may need to use job schedulers like SLURM, while others might use workflow managers such as Nextflow or Snakemake.

If you are optimising Autocycler assemblies or creating a pipeline, here are some things to keep in mind:

  • The most time-consuming steps in an Autocycler workflow are typically the creation of input assemblies, which can be carried out in parallel.
  • The choice of input assemblers is up to you. Keep in mind the speed-vs-quality tradeoff: some tools (e.g. Canu) are slow but produce high-quality assemblies, while others (e.g. Raven) are faster but tend to introduce more errors.
  • I recommend using a variety of input assemblers. For example, even if Flye is your favourite assembler, a Flye-only pipeline is more prone to errors than one that also includes other assemblers.
  • If your pipeline handles multiple isolates, I suggest running Autocycler table at the end to generate a summary table indicating how well each assembly performed.
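Because the input assemblies are independent of one another, one simple way to run them in parallel on a single machine is with Bash background jobs. The following is a minimal sketch, not something Autocycler provides: `run_assembler` is a hypothetical placeholder that you would replace with the real command line for each assembler, and the temporary directory is only there to keep the example self-contained.

```bash
#!/usr/bin/env bash

# Hypothetical placeholder: replace the body with the real command line
# for each assembler (this stub only writes a dummy file).
run_assembler() {
    local assembler="$1" out_dir="$2"
    echo "placeholder for $assembler" > "$out_dir/$assembler.fasta"
}

assemblies_dir=$(mktemp -d)   # assumption: use your own assemblies directory here

pids=()
for assembler in canu flye raven; do
    run_assembler "$assembler" "$assemblies_dir" &   # start the job in the background
    pids+=("$!")                                     # remember its process ID
done

# Wait for each job in turn so that a failure in any of them is noticed.
failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
done

if [ "$failed" -ne 0 ]; then
    echo "at least one input assembly failed" >&2
fi
```

On a cluster, the same loop body would typically become one scheduler job or one workflow-manager rule per assembler rather than a background process.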

Polishing after Autocycler

Since Autocycler assemblies are long-read-only, they may still contain small-scale errors produced by systematic errors in the reads. A common example would be long homopolymers: if a genome contains A×15 at a locus but most of the long reads erroneously have A×14 at that locus, the assembly is likely to contain the A×14 error.
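To get a feel for where such errors are most likely, it can help to list the long homopolymer runs in an assembly. The function below is a rough sketch rather than an Autocycler feature: the name `homopolymer_runs` and the default threshold of 10 bases are my own choices, and it assumes each sequence sits on a single line (a run that spans a line break would be counted as two shorter runs).

```bash
# Print "sequence_name<TAB>start<TAB>run" for every homopolymer run of at
# least min_len bases, where start is 1-based and run looks like "Ax15".
# Assumes unwrapped FASTA (one line per sequence).
homopolymer_runs() {
    local fasta="$1" min_len="${2:-10}"
    awk -v min_len="$min_len" '
        /^>/ { name = substr($1, 2); next }   # header line: remember the name
        {
            seq = toupper($0)
            n = length(seq)
            run_start = 1
            # Walk one position past the end so the final run is also closed.
            for (i = 2; i <= n + 1; i++) {
                if (i > n || substr(seq, i, 1) != substr(seq, run_start, 1)) {
                    run_len = i - run_start
                    if (run_len >= min_len)
                        print name "\t" run_start "\t" substr(seq, run_start, 1) "x" run_len
                    run_start = i
                }
            }
        }
    ' "$fasta"
}
```

For example, `homopolymer_runs assembly.fasta 12` (with `assembly.fasta` standing in for your own file) would list only runs of 12 bases or longer.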

If you are assembling Oxford Nanopore reads, performing long-read polishing with Medaka can help. I recommend using the --bacteria option for the latest methylation-aware model. If you also have short reads, then I recommend using Polypolish and Pypolca, both of which are conservative (unlikely to introduce new errors).

Further information about polishing is available in these papers:

How does Autocycler compare to Trycycler?

Autocycler was designed to be a faster and automated successor to Trycycler. Both tools perform the same task: combining multiple alternative assemblies of the same genome into a clean consensus assembly.

  • Trycycler was designed to be human-guided.
  • Autocycler was designed to be automated (but still allow for human intervention).
  • They should give very similar results, but Autocycler is probably easier to use.
  • Performance:
    • Trycycler is slower: it is written in Python and has a few steps that align all the reads.
    • Autocycler is faster: it is written in Rust and doesn't align reads.

Most users will probably be better off with Autocycler!

Can Autocycler be used on eukaryote genomes?

For Autocycler to work, the input assemblies need to be mostly complete: one sequence per piece of DNA in the genome. So if telomere-to-telomere (T2T) assemblies are possible, then Autocycler should work!

However, phased diploid assemblies might cause problems in the clustering step. If there is a lot of heterozygosity, the two haplotypes of each chromosome might separate in the UPGMA tree, in which case clustering might work (potentially requiring manual specification of the clusters). More likely, though, the user will need to separate the haplotypes in the input assemblies: split each phased input assembly into a maternal assembly and a paternal assembly, then run Autocycler twice (once for maternal, once for paternal).

Can Autocycler be used on mitochondrial/chloroplast genomes?

Yes! Since these genomes are circular and descended from bacterial genomes, they are well suited to an Autocycler assembly. I recommend first extracting just the mitochondrial/chloroplast reads from your long read set so you can produce input assemblies without the nuclear genome.

Is Autocycler deterministic?

Yes, Autocycler itself is deterministic: for a given set of input assemblies and parameters, it will produce the same consensus assembly. However, not all assemblers are deterministic, so a full Autocycler assembly (including the generation of input assemblies) may differ from run to run.
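If you want to check this on your own data, a byte-for-byte comparison of two output files is enough when the input assemblies and parameters were the same. The helper below is a small sketch; `same_assembly` is a name I made up for illustration.

```bash
# Report whether two assembly files are byte-for-byte identical.
same_assembly() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

It would be called as `same_assembly run_1/consensus.fasta run_2/consensus.fasta`, where both paths are placeholders for your own files. Note that this is a strict test: when comparing full runs that regenerated the input assemblies, sequences that differ only in rotation or strand would also be reported as different.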

Does Autocycler rotate circular sequences to a consistent position?

No, unlike Trycycler, Autocycler does not rotate circular sequences to start at a particular gene (e.g. dnaA). I may add this feature in the future, but for now I recommend using Dnaapler.

How does Autocycler colour sequences in its graphs?

For some of the GFA files it creates, Autocycler adds colours to the segments using the CL:z: tag that Bandage can read:

  • For the 3_bridged.gfa graph, anchors are coloured green and bridges are coloured pink (the same scheme used by Unicycler).
  • For the 4_merged.gfa and 5_final.gfa graphs, consentigs (sequences created by merging unitigs together) are coloured blue.
  • For the final consensus graph made by Autocycler combine, consentigs are again coloured blue, and anything else is a bright orange-red to indicate that the assembly is not complete.
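The colours can also be inspected without opening Bandage, since they are ordinary optional tags on the GFA segment lines. As a rough sketch (the function name `gfa_colours` is hypothetical, and it assumes the tag is written exactly as `CL:z:` in a tab-separated optional field), this lists each coloured segment with its colour value:

```bash
# Print "segment_name<TAB>colour" for every segment line that has a CL:z: tag.
gfa_colours() {
    awk -F'\t' '$1 == "S" {
        # Fields 1-3 are "S", the segment name and the sequence; tags follow.
        for (i = 4; i <= NF; i++) {
            if (index($i, "CL:z:") == 1) {
                print $2 "\t" substr($i, 6)   # drop the 5-character "CL:z:" prefix
            }
        }
    }' "$1"
}
```

For example, `gfa_colours 5_final.gfa | cut -f2 | sort | uniq -c` would count how many segments have each colour.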

Suppressing terminal colours

Autocycler uses some ANSI colours in its terminal output to stderr for aesthetic purposes. If you would rather have no colours (e.g. when redirecting stderr to a log file), you can set the NO_COLOR environment variable before running Autocycler:

export NO_COLOR=1
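If you already have a log file that was captured with colours, the escape sequences can be removed afterwards. This is a minimal sketch that assumes the log contains only standard ANSI colour sequences (ESC, `[`, digits and semicolons, then `m`); `strip_ansi` is a hypothetical helper name.

```bash
# Remove ANSI colour sequences (ESC [ ... m) from text read on stdin.
strip_ansi() {
    # $'...' lets Bash turn \x1b into a literal ESC character for sed.
    sed -E $'s/\x1b\\[[0-9;]*m//g'
}
```

It would be used as `strip_ansi < coloured.log > plain.log`, with both filenames being placeholders.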