
Demo dataset

Ryan Wick edited this page Jan 6, 2025 · 17 revisions

This demo dataset is a small 'genome' consisting of some E. coli plasmids. By excluding the chromosome, the file sizes are kept smaller, making this demo faster to download and assemble. This dataset provides a practical way to test Autocycler's workflow and become familiar with its commands.

Download the Demo Dataset

You can download the demo dataset from here: autocycler-demo-dataset.tar

The autocycler-demo-dataset.tar file contains the following:

  • reads.fastq.gz: 75 Mbp of ONT reads
  • truth.fasta: an error-free reference
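Before extracting, it can be worth checking that the download is complete. The sketch below is one way to do this, assuming the archive was saved as autocycler-demo-dataset.tar in the current directory. The function name has_demo_files is hypothetical, and the check deliberately makes no assumption about whether the files sit at the top level of the archive or inside a subdirectory.

```bash
#!/usr/bin/env bash

# Assumption: the tarball has already been downloaded into the current directory.
tar_file="autocycler-demo-dataset.tar"

# Hypothetical helper: return 0 if the tar archive given as $1 lists both
# reads.fastq.gz and truth.fasta (at the top level or in any subdirectory).
has_demo_files() {
    local listing
    listing="$(tar -tf "$1")" || return 1
    grep -Eq '(^|/)reads\.fastq\.gz$' <<< "$listing" &&
        grep -Eq '(^|/)truth\.fasta$' <<< "$listing"
}

# Only extract when the archive is present and contains both expected files.
if [ -f "$tar_file" ]; then
    if has_demo_files "$tar_file"; then
        tar -xf "$tar_file"
    else
        echo "Expected files not found in $tar_file" >&2
    fi
fi
```

If the archive extracts into a subdirectory, change into that directory before running the assembly commands so that reads.fastq.gz is found at the path they expect.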

Running Autocycler on the Demo Dataset

The following commands will guide you through running a fully automated assembly on the demo dataset. These commands use only three different assemblers to minimise processing time.

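# Settings shared by all steps
threads="16"
genome_size="242000"

# Subsample the reads into multiple read sets
autocycler subsample --reads reads.fastq.gz --out_dir subsampled_reads --genome_size "$genome_size"

# Assemble each read subset with each assembler
# (flye.sh, miniasm.sh and raven.sh must be available in your PATH)
mkdir assemblies
for assembler in flye miniasm raven; do
    for i in 01 02 03 04; do
        "$assembler".sh subsampled_reads/sample_"$i".fastq assemblies/"$assembler"_"$i" "$threads" "$genome_size"
    done
done
# The subsampled reads are no longer needed once the assemblies are done
rm subsampled_reads/*.fastq

# Load the input assemblies into the Autocycler output directory
autocycler compress -i assemblies -a autocycler_out

# Group the input contigs into clusters
autocycler cluster -a autocycler_out

# Trim and resolve each cluster that passed QC
for c in autocycler_out/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    autocycler resolve -c "$c"
done

# Combine the final graph of each cluster into one consensus assembly
autocycler combine -a autocycler_out -i autocycler_out/clustering/qc_pass/cluster_*/5_final.gfa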

The final consensus assembly will be saved as autocycler_out/consensus_assembly.fasta. This assembly should closely (ideally exactly) match truth.fasta, but because the plasmids are circular, the sequences will probably differ in strand and starting position.
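Because of the strand and starting-position differences, a plain diff between the two FASTA files is not a meaningful test. The following is a minimal sketch of how an exact match between two circular sequences could be checked regardless of rotation and strand. It is not part of Autocycler: the function names revcomp and same_circular are hypothetical, and the sketch assumes each sequence has already been read into a shell variable as a single upper-case string of A, C, G and T.

```bash
#!/usr/bin/env bash

# Hypothetical helper: reverse-complement a DNA string (upper-case A/C/G/T only).
revcomp() {
    printf '%s' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Hypothetical helper: return 0 if circular sequences $1 and $2 are identical
# up to rotation and strand, otherwise return 1.
same_circular() {
    local a="$1" b="$2"
    # Sequences of different lengths can never match exactly
    [ "${#a}" -eq "${#b}" ] || return 1
    # Every rotation of a circular sequence is a substring of the sequence doubled
    local doubled="$a$a"
    local rc
    rc="$(revcomp "$b")"
    # Check the given strand first, then the opposite strand
    [[ "$doubled" == *"$b"* || "$doubled" == *"$rc"* ]]
}

# Toy example: the second sequence is a rotation of the first
if same_circular "ACGTTG" "GTTGAC"; then
    echo "match"
fi
```

This only detects exact matches. For an assembly that differs from truth.fasta by a few bases, an alignment-based comparison would be needed instead.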

Other demo datasets

You can also try running Autocycler on the Trycycler demo datasets, which contain pre-made assemblies. These are somewhat dated (the assemblies have a higher error rate, with many homopolymer-length errors) but will still work with Autocycler. The 'great', 'good' and 'mediocre' datasets should yield a structurally correct assembly.
