Manually curated assembly
This page follows the same steps as the Fully Automated Assembly page but adds optional manual steps where you can inspect intermediate outputs and make adjustments, so that the final consensus assembly is as accurate as possible.
reads=ont.fastq.gz # your read set goes here
threads=16 # set as appropriate for your system
genome_size=$(genome_size_raven.sh "$reads" "$threads") # can set this manually if you know the value
autocycler subsample --reads "$reads" --out_dir subsampled_reads --genome_size "$genome_size"
mkdir assemblies
for assembler in canu flye miniasm necat nextdenovo raven; do
    for i in 01 02 03 04; do
        "$assembler".sh subsampled_reads/sample_"$i".fastq assemblies/"$assembler"_"$i" "$threads" "$genome_size"
    done
done
# Optional step: remove the subsampled reads to save space
rm subsampled_reads/*.fastq
At this stage, you can inspect each input assembly and decide whether you want to delete or modify it before continuing with Autocycler. See the Generating input assemblies page for more details.
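One quick way to spot outlier input assemblies is to tabulate their contig counts and total lengths. The following is a minimal sketch using only `awk`. It assumes each assembly was written as an uncompressed FASTA file named `assemblies/<assembler>_<i>.fasta`, and `summarise_fasta` is a hypothetical helper name, not part of Autocycler.

```bash
# Hypothetical helper (not part of Autocycler): print the file name, contig
# count and total bases for one FASTA file, tab-separated.
summarise_fasta() {
    awk -v name="$1" '
        /^>/ { contigs++; next }      # header line: count a new contig
        { bases += length($0) }       # sequence line: add its length
        END { printf "%s\t%d\t%d\n", name, contigs, bases }
    ' "$1"
}

# Assumption: input assemblies are uncompressed FASTA files in assemblies/.
for f in assemblies/*.fasta; do
    [ -e "$f" ] || continue           # skip if the glob matched nothing
    summarise_fasta "$f"
done
```

An assembly whose total length or contig count differs sharply from the others may be a candidate for closer inspection before continuing.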
autocycler compress -i assemblies -a autocycler_out
autocycler cluster -a autocycler_out
At this stage, you can inspect the clustering and, if desired, modify it before continuing with Autocycler. See the Autocycler cluster page for more details.
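As a first look at the clustering, it can help to count how many clusters there are. The sketch below counts `cluster_*` directories. The `qc_pass` path is the one used by the commands on this page, the sibling `qc_fail` directory is an assumption, and `count_clusters` is a hypothetical helper name.

```bash
# Hypothetical helper: count cluster_* directories inside a given directory.
count_clusters() {
    n=0
    for d in "$1"/cluster_*/; do
        if [ -d "$d" ]; then          # false when the glob matched nothing
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "qc_pass clusters: $(count_clusters autocycler_out/clustering/qc_pass)"
# Assumption: clusters that failed QC are placed in a sibling qc_fail directory.
echo "qc_fail clusters: $(count_clusters autocycler_out/clustering/qc_fail)"
```

If the number of passing clusters does not match the number of replicons you expect, that may be a reason to review the clustering before trimming.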
for c in autocycler_out/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    if [[ $(wc -c <"$c"/1_untrimmed.gfa) -lt 1000000 ]]; then
        autocycler dotplot -i "$c"/1_untrimmed.gfa -o "$c"/1_untrimmed.png
        autocycler dotplot -i "$c"/2_trimmed.gfa -o "$c"/2_trimmed.png
    fi
    autocycler resolve -c "$c"
done
The above loop also runs Autocycler dotplot on clusters smaller than ~1 Mbp, for both the untrimmed and trimmed sequences. The size limit exists because Autocycler dotplot is fast on small sequences (e.g. plasmids) but can take a while on longer sequences (e.g. chromosomes).
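The size check in the loop above compares the byte size of `1_untrimmed.gfa` against 1000000, which serves as a rough proxy for sequence length. If you want to adjust the cutoff, one option is to wrap the check in a small function. This is only a sketch, and `is_small_file` is a hypothetical name.

```bash
# Hypothetical helper: succeed if a file is smaller than a byte limit.
# The default limit of 1000000 is the value used in the loop above.
is_small_file() {
    size=$(( $(wc -c < "$1") ))       # arithmetic expansion strips any padding
    [ "$size" -lt "${2:-1000000}" ]
}

# Example use inside the cluster loop (same dotplot command as above):
#   if is_small_file "$c"/1_untrimmed.gfa; then
#       autocycler dotplot -i "$c"/1_untrimmed.gfa -o "$c"/1_untrimmed.png
#   fi
```

Passing a second argument (e.g. `is_small_file "$c"/1_untrimmed.gfa 5000000`) would raise the cutoff at the cost of longer dotplot runs.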
After trimming, you can visually inspect each cluster's dotplots, which can show the effects of trimming and reveal potential structural issues. See the Autocycler dotplot page for more information.
After Autocycler resolve has run, you can review how Autocycler has bridged the sequences to form a consensus. This can be useful for identifying regions where sequence ambiguity remains. In particular, it can be helpful to examine each cluster's 4_merged.gfa file to see if there is structural heterogeneity or conflicts between assemblies, which may suggest areas to review or adjust manually.
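Before opening graphs one by one, a rough count of segments and links can point to the clusters worth a closer look. The sketch below relies only on the GFA convention that segment lines start with `S` and link lines start with `L`. `gfa_counts` is a hypothetical helper name, not an Autocycler command.

```bash
# Hypothetical helper: count segment (S) and link (L) lines in a GFA file.
gfa_counts() {
    awk -F'\t' '
        $1 == "S" { segments++ }
        $1 == "L" { links++ }
        END { printf "segments=%d links=%d\n", segments, links }
    ' "$1"
}

for g in autocycler_out/clustering/qc_pass/cluster_*/4_merged.gfa; do
    [ -e "$g" ] || continue           # skip if the glob matched nothing
    printf '%s\t%s\n' "$g" "$(gfa_counts "$g")"
done
```

A cluster with noticeably more segments and links than the others may hint at unresolved structure, though the counts alone do not say what the conflict is.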
autocycler combine -a autocycler_out -i autocycler_out/clustering/qc_pass/cluster_*/5_final.gfa
The final consensus assembly will be saved as autocycler_out/consensus_assembly.fasta.
If the consensus assembly is not fully resolved, viewing the assembly graph (consensus_assembly.gfa) in Bandage can reveal any problematic parts of the assembly. It may then be possible to use Autocycler clean to remove unwanted tigs to allow for a fully resolved assembly.
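For a quick text-only check of the result, the sequence names and lengths in the consensus FASTA can be listed with `awk`. This is a minimal sketch. `fasta_lengths` is a hypothetical helper name, and it treats the first word of each header as the sequence name.

```bash
# Hypothetical helper: print each sequence name and its length, tab-separated.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") printf "%s\t%d\n", name, len   # flush previous record
            name = substr($1, 2); len = 0; next
        }
        { len += length($0) }
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1"
}

if [ -e autocycler_out/consensus_assembly.fasta ]; then
    fasta_lengths autocycler_out/consensus_assembly.fasta
fi
```

More sequences than expected, or unexpectedly short ones, may indicate parts of the graph worth inspecting in Bandage.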
- Step 1: Autocycler subsample
- Step 2: Generating input assemblies
- Step 3: Autocycler compress
- Step 4: Autocycler cluster
- Step 5: Autocycler trim
- Step 6: Autocycler resolve
- Step 7: Autocycler combine