
Manually curated assembly

Ryan Wick edited this page Sep 4, 2024 · 29 revisions

This page follows the same basic process as the Fully automated assembly page, but it adds manual steps that give the user the opportunity to examine and curate the results.

Automated step: subsample reads and generate input assemblies

# Set these variables as appropriate for your system and genome:
threads=16
genome_size="5500000"

# Subsample the long-read set into multiple files:
autocycler subsample --reads ont.fastq --out_dir subsampled_reads --genome_size "$genome_size"

# Assemble each subsampled file:
mkdir -p assemblies  # -p avoids an error if the directory already exists
for i in 01 05 09 13 17 21; do
    canu -p canu -d canu_temp_"$i" -fast genomeSize="$genome_size" useGrid=false maxThreads="$threads" -nanopore subsampled_reads/sample_"$i".fastq
    canu_trim.py canu_temp_"$i"/canu.contigs.fasta > assemblies/canu_"$i".fasta
    rm -rf canu_temp_"$i"
done
for i in 02 06 10 14 18 22; do
    flye --nano-hq subsampled_reads/sample_"$i".fastq --threads "$threads" --out-dir flye_temp_"$i"
    cp flye_temp_"$i"/assembly.fasta assemblies/flye_"$i".fasta
    cp flye_temp_"$i"/assembly_graph.gfa assemblies/flye_"$i".gfa
    rm -r flye_temp_"$i"
done
for i in 03 07 11 15 19 23; do
    miniasm_and_minipolish.sh subsampled_reads/sample_"$i".fastq "$threads" > assemblies/miniasm_"$i".gfa
    any2fasta assemblies/miniasm_"$i".gfa > assemblies/miniasm_"$i".fasta
done
for i in 04 08 12 16 20 24; do
    raven --threads "$threads" --disable-checkpoints --graphical-fragment-assembly assemblies/raven_"$i".gfa subsampled_reads/sample_"$i".fastq > assemblies/raven_"$i".fasta
done
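
Before moving on to curation, it may help to confirm that every assembler run above actually produced a FASTA file. The following is a minimal sketch, not part of Autocycler: `check_assemblies` is a hypothetical helper name, and the expected file names are simply those written by the four loops above.

```shell
# Hypothetical helper (not an Autocycler command): report which of the 24
# FASTA files written by the loops above are missing or empty.
check_assemblies() {
    dir="$1"
    missing=0
    for group in "canu:01 05 09 13 17 21" "flye:02 06 10 14 18 22" \
                 "miniasm:03 07 11 15 19 23" "raven:04 08 12 16 20 24"; do
        tool=${group%%:*}   # text before the colon
        ids=${group#*:}     # text after the colon
        for i in $ids; do
            f="$dir/${tool}_$i.fasta"
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f"
                missing=$((missing + 1))
            fi
        done
    done
    echo "$missing of 24 expected assemblies missing or empty"
}

# Only run the check if the assemblies directory exists:
if [ -d assemblies ]; then check_assemblies assemblies; fi
```

Any file flagged here is a candidate for a closer look (or a rerun of that assembler) during the manual curation step.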

Manual step: curate input assemblies
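
One possible starting point for curating the input assemblies is a per-file summary of contig count and total length, which can then be compared against the `genome_size` set earlier. This is a rough sketch under that assumption: `fasta_summary` is a hypothetical helper, and which assemblies to keep or discard remains a manual judgement.

```shell
# Hypothetical helper: print "<file> <contig count> <total bases>" (tab-separated)
# for one FASTA file.
fasta_summary() {
    awk -v name="$1" '
        /^>/ { contigs++; next }     # header line: one more contig
        { bases += length($0) }      # sequence line: add its length
        END { printf "%s\t%d\t%d\n", name, contigs, bases }
    ' "$1"
}

for f in assemblies/*.fasta; do
    [ -e "$f" ] || continue          # skip the unexpanded glob if no files exist
    fasta_summary "$f"
done
```

Assemblies whose total length is far from the expected genome size, or that contain many more contigs than the others, may deserve inspection before the compress step.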

Automated step: compress and cluster input assemblies

autocycler compress -i assemblies -a autocycler
autocycler cluster -a autocycler

Manual step: curate clusters
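
Because the trim and resolve loop in the next step processes whatever sits in `autocycler/clustering/qc_pass`, one way to exclude a cluster by hand is to move its directory out of `qc_pass`. The sketch below assumes this approach: `exclude_cluster` is a hypothetical helper, the destination directory name `qc_fail` is an assumption for illustration, and `cluster_003` in the usage comment is a made-up example.

```shell
# Hypothetical helper: move one cluster out of qc_pass so later loops skip it.
# The destination name "qc_fail" is an assumption, not taken from this page.
exclude_cluster() {
    clustering_dir="$1"   # e.g. autocycler/clustering
    cluster="$2"          # e.g. cluster_003 (made-up example)
    mkdir -p "$clustering_dir/qc_fail"
    mv "$clustering_dir/qc_pass/$cluster" "$clustering_dir/qc_fail/$cluster"
}

# List the clusters that would currently be trimmed and resolved:
for c in autocycler/clustering/qc_pass/cluster_*; do
    if [ -d "$c" ]; then echo "$c"; fi
done

# Example usage (uncomment and adjust the cluster name):
# exclude_cluster autocycler/clustering cluster_003
```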

Automated step: trim and resolve each QC-pass cluster

for c in autocycler/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    autocycler resolve -c "$c"
done
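
The final combine step expects a `5_final.gfa` file in each QC-pass cluster, so it can be useful to check for that file before examining the results. A minimal sketch, with `report_unresolved` being a hypothetical helper rather than an Autocycler command:

```shell
# Hypothetical helper: report cluster directories that lack a non-empty
# 5_final.gfa file after the trim/resolve loop.
report_unresolved() {
    for c in "$1"/cluster_*; do
        [ -d "$c" ] || continue
        [ -s "$c/5_final.gfa" ] || echo "no 5_final.gfa in $c"
    done
}

report_unresolved autocycler/clustering/qc_pass
```

No output means every cluster directory under `qc_pass` contains a `5_final.gfa` file.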

Manual step: examine dot plots

Manual step: examine 4_merged.gfa files
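
As a quick first pass before opening each graph in a viewer, the segment and link counts of every `4_merged.gfa` file can be tallied from the command line. This sketch relies only on the GFA convention that `S` lines are segments and `L` lines are links; `gfa_summary` is a hypothetical helper, and it assumes sequences are stored inline in the third column of each `S` line.

```shell
# Hypothetical helper: print "<file> <segments> <links> <total bases>"
# (tab-separated) for one GFA file.
# Assumes S-line sequences are inline; a "*" placeholder would count as 1 base.
gfa_summary() {
    awk -v name="$1" -F '\t' '
        $1 == "S" { segments++; bases += length($3) }   # segment line
        $1 == "L" { links++ }                           # link line
        END { printf "%s\t%d\t%d\t%d\n", name, segments, links, bases }
    ' "$1"
}

for g in autocycler/clustering/qc_pass/cluster_*/4_merged.gfa; do
    if [ -e "$g" ]; then gfa_summary "$g"; fi
done
```

Graphs with many segments or links are likely the ones most worth inspecting visually.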

Automated step: combine resolved clusters into a final assembly

autocycler combine -a autocycler -i autocycler/clustering/qc_pass/cluster_*/5_final.gfa