Manually curated assembly
This page follows the same steps as the Fully Automated Assembly page but adds manual steps that let you examine and curate the results. These optional steps let you inspect intermediate outputs and make adjustments, so that the final consensus assembly is as accurate as possible.
threads=16 # set as appropriate for your system
genome_size=$(genome_size_raven.sh ont.fastq.gz "$threads")
autocycler subsample --reads ont.fastq.gz --out_dir subsampled_reads --genome_size "$genome_size"
mkdir assemblies
for i in 01 07 13 19; do
    canu.sh subsampled_reads/sample_"$i".fastq assemblies/canu_"$i" "$threads" "$genome_size"
done
for i in 02 08 14 20; do
    flye.sh subsampled_reads/sample_"$i".fastq assemblies/flye_"$i" "$threads"
done
for i in 03 09 15 21; do
    miniasm.sh subsampled_reads/sample_"$i".fastq assemblies/miniasm_"$i" "$threads"
done
for i in 04 10 16 22; do
    necat.sh subsampled_reads/sample_"$i".fastq assemblies/necat_"$i" "$threads" "$genome_size"
done
for i in 05 11 17 23; do
    nextdenovo.sh subsampled_reads/sample_"$i".fastq assemblies/nextdenovo_"$i" "$threads" "$genome_size"
done
for i in 06 12 18 24; do
    raven.sh subsampled_reads/sample_"$i".fastq assemblies/raven_"$i" "$threads"
done
# Optional step: remove the subsampled reads to save space
rm -r subsampled_reads
At this stage, you can inspect each input assembly and decide whether you want to delete or modify it before continuing with Autocycler. See the Generating input assemblies page for more details.
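One quick way to inspect the input assemblies is to tabulate the contig count and total length of each one, since an assembly that is far larger or smaller than the others is a candidate for closer review. The following is a minimal sketch, not part of Autocycler itself. The function name `assembly_stats` is hypothetical, and the sketch assumes the helper scripts wrote FASTA files named `assemblies/*.fasta`.

```shell
# Summarise one FASTA file as: <file> <contig count> <total bases> (tab-separated).
# assembly_stats is a hypothetical helper, not an Autocycler command.
assembly_stats() {
    awk -v name="$1" '
        /^>/ { contigs++; next }                      # header line: one more contig
        { gsub(/[ \t\r]/, ""); bases += length($0) }  # sequence line: add its length
        END { printf "%s\t%d\t%d\n", name, contigs, bases }
    ' "$1"
}

# Assumption: input assemblies are stored as assemblies/*.fasta.
for f in assemblies/*.fasta; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    assembly_stats "$f"
done
```

Comparing the total-bases column against the estimated genome size gives a rough first check before looking at any assembly in detail.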
autocycler compress -i assemblies -a autocycler
autocycler cluster -a autocycler
At this stage, you can inspect the clustering and, if desired, modify it before continuing with Autocycler. See the Autocycler cluster page for more details.
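As a starting point for inspecting the clustering, it can help to list every cluster together with its QC status and size. The sketch below is illustrative only. The function name `list_clusters` is hypothetical, and the `qc_fail` directory is an assumption made by analogy with the `qc_pass` directory used on this page.

```shell
# Print one line per cluster: <pass|fail> <cluster name> <untrimmed GFA size in bytes>.
# list_clusters is a hypothetical helper; the qc_fail directory name is an assumption.
list_clusters() {
    for status in pass fail; do
        for dir in "$1"/qc_"$status"/cluster_*; do
            [ -f "$dir"/1_untrimmed.gfa ] || continue   # skip unmatched globs
            size=$(wc -c <"$dir"/1_untrimmed.gfa | tr -d ' ')
            printf '%s\t%s\t%s\n' "$status" "$(basename "$dir")" "$size"
        done
    done
}

list_clusters autocycler/clustering
```

File size is only a rough proxy for sequence length, but it is usually enough to tell chromosome-sized clusters from plasmid-sized ones at a glance.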
for c in autocycler/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    if [[ $(wc -c <"$c"/1_untrimmed.gfa) -lt 1000000 ]]; then
        autocycler dotplot -i "$c"/1_untrimmed.gfa -o "$c"/1_untrimmed.png
        autocycler dotplot -i "$c"/2_trimmed.gfa -o "$c"/2_trimmed.png
    fi
    autocycler resolve -c "$c"
done
The above loop also runs Autocycler dotplot on clusters smaller than ~1 Mbp (judged by the size of the untrimmed GFA file), for both the untrimmed and trimmed sequences. The size limit exists because Autocycler dotplot is fast on small sequences (e.g. plasmids) but can take a while to finish on longer sequences (e.g. chromosomes).
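If you want to preview which clusters fall under the size limit before running the loop, or to try a different limit, the size test can be pulled out into a small function. This is a sketch under stated assumptions: `is_small_gfa` and `max_bytes` are hypothetical names, and the default simply mirrors the 1000000-byte test in the loop above.

```shell
max_bytes=1000000   # illustrative default, matching the loop's threshold

# Succeeds when the file at $1 is smaller than $2 bytes (default: $max_bytes).
# is_small_gfa is a hypothetical helper, not an Autocycler command.
is_small_gfa() {
    [ "$(wc -c <"$1" | tr -d ' ')" -lt "${2:-$max_bytes}" ]
}

for c in autocycler/clustering/qc_pass/cluster_*; do
    [ -f "$c"/1_untrimmed.gfa ] || continue   # skip if the glob matched nothing
    if is_small_gfa "$c"/1_untrimmed.gfa; then
        echo "$c: small enough for a quick dotplot"
    else
        echo "$c: large, dotplot may be slow"
    fi
done
```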
After trimming, you can visually inspect each cluster's dotplots, which can show the effects of trimming and reveal potential structural issues. See the Autocycler dotplot page for more information.
After Autocycler resolve has run, you can review how Autocycler has bridged the sequences to form a consensus. This can be useful for identifying regions where sequence ambiguity remains. In particular, it can be helpful to examine each cluster's 4_merged.gfa file to see if there is structural heterogeneity or conflicts between assemblies, which may suggest areas to review or adjust manually.
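A quick first pass over the merged graphs is to count the segments and links in each one, since a graph with several segments is more likely to contain unresolved structure than a graph with a single segment. The sketch below relies only on the GFA convention that segment lines start with `S` and link lines start with `L`. The function name `gfa_counts` is hypothetical, and treating a higher segment count as a sign of ambiguity is a heuristic rather than an Autocycler rule.

```shell
# Print: <file> <segment count> <link count> (tab-separated) for one GFA file.
# gfa_counts is a hypothetical helper, not an Autocycler command.
gfa_counts() {
    awk -F'\t' -v name="$1" '
        $1 == "S" { seg++ }    # segment (sequence) lines
        $1 == "L" { link++ }   # link (edge) lines
        END { printf "%s\t%d\t%d\n", name, seg, link }
    ' "$1"
}

for g in autocycler/clustering/qc_pass/cluster_*/4_merged.gfa; do
    [ -e "$g" ] || continue   # skip if the glob matched nothing
    gfa_counts "$g"
done
```

Clusters flagged this way are good candidates for opening in a graph viewer.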
autocycler combine -a autocycler -i autocycler/clustering/qc_pass/cluster_*/5_final.gfa
The final consensus assembly will be saved as autocycler/consensus_assembly.fasta.
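To confirm what ended up in the final assembly, you can list its sequence headers. This is a minimal sketch using standard tools, and the function name `list_contigs` is hypothetical.

```shell
# Print the header of every sequence in a FASTA file, without the leading ">".
# list_contigs is a hypothetical helper, not an Autocycler command.
list_contigs() {
    grep '^>' "$1" | sed 's/^>//'
}

if [ -f autocycler/consensus_assembly.fasta ]; then
    list_contigs autocycler/consensus_assembly.fasta
fi
```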