Manually curated assembly
This page follows the same steps as the Fully automated assembly page, but adds manual steps that let you examine and curate the results along the way.
```bash
# Set these variables as appropriate for your system and genome:
threads=16
genome_size="5500000"

# Step 1: Autocycler subsample (replace ont.fastq.gz with your long-read file)
autocycler subsample --reads ont.fastq.gz --out_dir subsampled_reads --genome_size "$genome_size"

# Step 2: generating input assemblies. Each assembler gets a different set of
# subsampled read files. The wrapper scripts (canu.sh, flye.sh, etc.) must be
# available on your PATH.
mkdir assemblies
for i in 01 07 13 19; do
    canu.sh subsampled_reads/sample_"$i".fastq assemblies/canu_"$i" "$threads" "$genome_size"
done
for i in 02 08 14 20; do
    flye.sh subsampled_reads/sample_"$i".fastq assemblies/flye_"$i" "$threads"
done
for i in 03 09 15 21; do
    miniasm.sh subsampled_reads/sample_"$i".fastq assemblies/miniasm_"$i" "$threads"
done
for i in 04 10 16 22; do
    necat.sh subsampled_reads/sample_"$i".fastq assemblies/necat_"$i" "$threads" "$genome_size"
done
for i in 05 11 17 23; do
    nextdenovo.sh subsampled_reads/sample_"$i".fastq assemblies/nextdenovo_"$i" "$threads" "$genome_size"
done
for i in 06 12 18 24; do
    raven.sh subsampled_reads/sample_"$i".fastq assemblies/raven_"$i" "$threads"
done

# Optional step: remove the subsampled reads to save space
rm -r subsampled_reads
```
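The interleaved sample numbers in the assembly loops follow a simple pattern: assembler number k (1 to 6, in the order the loops appear) receives subsamples k, k+6, k+12 and k+18, so each of the 24 subsampled read sets is used by exactly one assembler. The short shell sketch below only prints that assignment to make the pattern explicit; the function name `print_assignment` is hypothetical and not part of Autocycler.

```bash
# Hypothetical helper: print which subsample each assembler receives.
# Assembler k (1-based) gets samples k, k+6, k+12 and k+18, zero-padded
# to two digits to match the sample_01 ... sample_24 file names.
print_assignment() {
    local k=1 assembler offset
    for assembler in canu flye miniasm necat nextdenovo raven; do
        for offset in 0 6 12 18; do
            printf '%s_%02d\n' "$assembler" $((k + offset))
        done
        k=$((k + 1))
    done
}

print_assignment
```

Because the offsets step by the number of assemblers (6), no two assemblers share a read set.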
At this stage, you can inspect each input assembly and decide whether you want to delete or modify it before continuing with Autocycler. See this page for more details.
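As a quick first look before any manual inspection, a minimal shell sketch like the one below reports the contig count and total length of each assembly. It assumes the wrapper scripts write one FASTA file ending in `.fasta` per assembly into the `assemblies` directory; adjust the glob if your files are named differently. The function name `summarise_assemblies` is hypothetical.

```bash
# Hypothetical helper: for each *.fasta file in a directory, print the file
# name, the number of contigs (header lines) and the total sequence length.
summarise_assemblies() {
    local dir=$1 f
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        awk -v name="$(basename "$f")" '
            /^>/ { contigs++; next }                    # a header starts a new contig
            { gsub(/[ \t\r]/, ""); total += length($0) } # count bases on sequence lines
            END { printf "%s\t%d contigs\t%d bp\n", name, contigs, total }
        ' "$f"
    done
}

summarise_assemblies assemblies
```

An assembly with a very different total length or contig count from the others is a good candidate for closer inspection.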
```bash
# Steps 3 and 4: Autocycler compress and Autocycler cluster
autocycler compress -i assemblies -a autocycler
autocycler cluster -a autocycler
```
At this stage, you can inspect the clustering and, if desired, modify it before continuing with Autocycler. See this page for more details.
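One simple way to drop a cluster from the rest of this workflow follows from the commands on this page: the trim/resolve loop and the combine command only glob `autocycler/clustering/qc_pass/cluster_*`, so a cluster directory moved out of `qc_pass` is skipped. The sketch below does exactly that. It is an assumption about what you might want, not necessarily the officially recommended way to modify clusters; the destination name `manually_excluded` and the function name `exclude_cluster` are hypothetical.

```bash
# Hypothetical helper: move one cluster out of qc_pass so that later steps
# (which glob clustering/qc_pass/cluster_*) no longer see it.
exclude_cluster() {
    local autocycler_dir=$1 cluster=$2
    local src="$autocycler_dir/clustering/qc_pass/$cluster"
    local dest="$autocycler_dir/clustering/manually_excluded"
    if [ ! -d "$src" ]; then
        echo "No such cluster: $src" >&2
        return 1
    fi
    mkdir -p "$dest"
    mv "$src" "$dest"/
}

# Example usage (hypothetical cluster name):
# exclude_cluster autocycler cluster_003
```

Moving rather than deleting keeps the cluster available in case you change your mind.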
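```bash
# Steps 5 and 6: Autocycler trim and Autocycler resolve for each passing cluster
for c in autocycler/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    # Only make dotplots when the untrimmed GFA file is under ~1 MB
    if [[ $(wc -c <"$c"/1_untrimmed.gfa) -lt 1000000 ]]; then
        autocycler dotplot -i "$c"/1_untrimmed.gfa -o "$c"/1_untrimmed.png
        autocycler dotplot -i "$c"/2_trimmed.gfa -o "$c"/2_trimmed.png
    fi
    autocycler resolve -c "$c"
done
```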
The loop above also runs Autocycler dotplot on the untrimmed and trimmed sequences of each cluster smaller than about 1 Mbp, using the byte size of the cluster's untrimmed GFA file as a rough proxy for sequence length. The limit exists because Autocycler dotplot runs quickly on small sequences (e.g. plasmids) but can take a long time on larger ones (e.g. chromosomes).
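If you want a size estimate closer to sequence length than the file's byte count, a small sketch like the one below sums the lengths of the segment (`S`) lines in a GFA file. It assumes segment sequences are stored in the GFA (not `*`), and for a unitig graph the result is the total length of the graph's segments rather than the exact length of any one sequence. The function name `gfa_segment_length` is hypothetical.

```bash
# Hypothetical helper: total length of all segment (S) sequences in a GFA file.
# GFA segment lines are tab-separated: S <name> <sequence> [optional tags].
gfa_segment_length() {
    awk -F '\t' '$1 == "S" { total += length($3) } END { print total + 0 }' "$1"
}

# Possible drop-in for the wc -c test in the loop above:
# if [[ $(gfa_segment_length "$c"/1_untrimmed.gfa) -lt 1000000 ]]; then ...
```

The `wc -c` check is cheaper and usually good enough as a rough filter, since most of a GFA file's bytes are sequence.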
```bash
# Step 7: Autocycler combine
autocycler combine -a autocycler -i autocycler/clustering/qc_pass/cluster_*/5_final.gfa
```
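Because the combine command relies on the glob `cluster_*/5_final.gfa`, a passing cluster whose resolve step failed or was skipped would simply be left out of the final assembly. A minimal check you could run before combining is sketched below; the function name `check_final_gfas` is hypothetical, and it only tests that each `qc_pass` cluster has a non-empty `5_final.gfa`.

```bash
# Hypothetical helper: report qc_pass clusters lacking a non-empty 5_final.gfa.
# Returns 0 only if every cluster directory has one.
check_final_gfas() {
    local autocycler_dir=$1 c missing=0
    for c in "$autocycler_dir"/clustering/qc_pass/cluster_*; do
        [ -d "$c" ] || continue
        if [ ! -s "$c/5_final.gfa" ]; then
            echo "Missing or empty: $c/5_final.gfa" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Usage sketch: only combine when every cluster has been resolved.
# check_final_gfas autocycler && autocycler combine -a autocycler -i autocycler/clustering/qc_pass/cluster_*/5_final.gfa
```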