Manually curated assembly

Ryan Wick edited this page Oct 30, 2024 · 29 revisions

This page follows the same steps as the Fully automated assembly, but adds manual steps where you can examine and curate the intermediate results.

Steps 1 and 2: subsample reads and generate input assemblies

# Set these variables as appropriate for your system and genome:
threads=16
genome_size="5500000"

autocycler subsample --reads ont.fastq.gz --out_dir subsampled_reads --genome_size "$genome_size"

mkdir -p assemblies  # -p avoids an error if the directory already exists
for i in 01 07 13 19; do
    canu.sh subsampled_reads/sample_"$i".fastq assemblies/canu_"$i" "$threads" "$genome_size"
done
for i in 02 08 14 20; do
    flye.sh subsampled_reads/sample_"$i".fastq assemblies/flye_"$i" "$threads"
done
for i in 03 09 15 21; do
    miniasm.sh subsampled_reads/sample_"$i".fastq assemblies/miniasm_"$i" "$threads"
done
for i in 04 10 16 22; do
    necat.sh subsampled_reads/sample_"$i".fastq assemblies/necat_"$i" "$threads" "$genome_size"
done
for i in 05 11 17 23; do
    nextdenovo.sh subsampled_reads/sample_"$i".fastq assemblies/nextdenovo_"$i" "$threads" "$genome_size"
done
for i in 06 12 18 24; do
    raven.sh subsampled_reads/sample_"$i".fastq assemblies/raven_"$i" "$threads"
done

# Optional step: remove the subsampled reads to save space
rm -r subsampled_reads
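
The loops above spread the 24 subsampled read sets across the six assemblers in round-robin order, so each assembler receives four different subsamples. The sketch below spells out that mapping. `assembler_for_sample` is a hypothetical helper written for illustration and is not part of Autocycler.

```shell
# Hypothetical helper: print which assembler the loops above use for a sample number.
assembler_for_sample() {
    n=${1#0}  # strip one leading zero so that 08 is not read as an octal number
    case $(( (n - 1) % 6 )) in
        0) echo canu ;;
        1) echo flye ;;
        2) echo miniasm ;;
        3) echo necat ;;
        4) echo nextdenovo ;;
        5) echo raven ;;
    esac
}

for i in 01 08 15 22; do
    echo "sample_$i -> $(assembler_for_sample "$i")"
done
```

Because no two assemblies share a read subsample, an error caused by one particular subset of reads is confined to a single input assembly.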

Manual step: curate input assemblies

At this stage, you can inspect each input assembly and decide whether you want to delete or modify it before continuing with Autocycler. See this page for more details.
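
One quick way to start that inspection is to tabulate the contig count and total length of every input assembly, which makes obvious outliers (for example a fragmented or unusually short assembly) easy to spot. The following is a minimal sketch, assuming each input assembly is an uncompressed FASTA file named `assemblies/*.fasta`. `fasta_summary` is a hypothetical helper, not an Autocycler command.

```shell
# Hypothetical helper: print file name, contig count and total bases for one FASTA file.
fasta_summary() {
    awk -v name="$1" '
        /^>/ { contigs++; next }      # header line: one more contig
        { bases += length($0) }       # sequence line: add its length
        END { printf "%s\t%d\t%d\n", name, contigs, bases }
    ' "$1"
}

# Assumption: input assemblies are uncompressed FASTA files in the assemblies directory.
for f in assemblies/*.fasta; do
    if [ -e "$f" ]; then
        fasta_summary "$f"
    fi
done
```

The output is tab-separated (file, contigs, bases), so it can be piped to `sort` or `column -t` for easier reading.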

Steps 3 and 4: compress and cluster input assemblies

autocycler compress -i assemblies -a autocycler
autocycler cluster -a autocycler

Manual step: curate clusters

At this stage, you can inspect the clustering and, if desired, modify it before continuing with Autocycler. See this page for more details.
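
A simple first check is to count how many clusters passed QC. The sketch below counts the `cluster_*` directories under a given directory. `count_clusters` is a hypothetical helper, and the `qc_fail` directory name is an assumption that mirrors the `qc_pass` directory used on this page.

```shell
# Hypothetical helper: count the cluster_* directories directly under the given directory.
count_clusters() {
    count=0
    for d in "$1"/cluster_*; do
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

echo "QC-pass clusters: $(count_clusters autocycler/clustering/qc_pass)"
# Assumption: failed clusters are written to a sibling qc_fail directory.
echo "QC-fail clusters: $(count_clusters autocycler/clustering/qc_fail)"
```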

Steps 5 and 6: trim and resolve each QC-pass cluster

for c in autocycler/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    if [[ $(wc -c <"$c"/1_untrimmed.gfa) -lt 1000000 ]]; then
        autocycler dotplot -i "$c"/1_untrimmed.gfa -o "$c"/1_untrimmed.png
        autocycler dotplot -i "$c"/2_trimmed.gfa -o "$c"/2_trimmed.png
    fi
    autocycler resolve -c "$c"
done

The loop above also runs Autocycler dotplot on both the untrimmed and trimmed sequences of each small cluster, where small means that the cluster's 1_untrimmed.gfa file is under 1,000,000 bytes (roughly 1 Mbp of sequence). The limit exists because Autocycler dotplot is fast on short sequences (e.g. plasmids) but can take a while on long ones (e.g. chromosomes).
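
To see in advance which clusters fall under the size limit, or to experiment with a different limit, the same file-size test can be run on its own. This is a sketch only: `would_dotplot` is a hypothetical helper that applies the `wc -c` comparison from the loop, with the byte limit as an optional second argument.

```shell
# Hypothetical helper: succeed if the file is smaller than the byte limit.
# $1 = GFA file, $2 = limit in bytes (defaults to 1000000, the value used in the loop).
would_dotplot() {
    limit=${2:-1000000}
    size=$(( $(wc -c <"$1") ))  # arithmetic expansion drops any padding from wc
    [ "$size" -lt "$limit" ]
}

for g in autocycler/clustering/qc_pass/cluster_*/1_untrimmed.gfa; do
    [ -f "$g" ] || continue
    if would_dotplot "$g"; then
        echo "$g: would be plotted"
    else
        echo "$g: too large, skipped"
    fi
done
```

Note that the test measures the size of the GFA file in bytes, so it is only an approximation of sequence length.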

Manual step: examine dotplots

Manual step: examine Autocycler bridging

Step 7: combine resolved clusters into a final assembly

autocycler combine -a autocycler -i autocycler/clustering/qc_pass/cluster_*/5_final.gfa
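
Before running the combine command, it can be worth confirming that every QC-pass cluster actually produced a 5_final.gfa file, since the glob silently skips any cluster that lacks one. The sketch below lists such clusters. `missing_final` is a hypothetical helper, not an Autocycler command.

```shell
# Hypothetical helper: print each cluster directory under $1 that has no 5_final.gfa.
missing_final() {
    for c in "$1"/cluster_*; do
        if [ -d "$c" ] && [ ! -f "$c"/5_final.gfa ]; then
            echo "$c"
        fi
    done
}

# No output means every QC-pass cluster has a final graph to combine.
missing_final autocycler/clustering/qc_pass
```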