Manually curated assembly
Ryan Wick edited this page Sep 4, 2024 · 29 revisions
This page follows the same basic process as the Fully automated assembly page, but it adds manual steps that give the user the opportunity to examine and curate the results.
# Set these variables as appropriate for your system and genome:
threads=16
genome_size="5500000"
# Subsample the long-read set into multiple files:
autocycler subsample --reads ont.fastq --out_dir subsampled_reads --genome_size "$genome_size"
# Assemble each subsampled file:
mkdir -p assemblies
for i in 01 05 09 13 17 21; do
    canu -p canu -d canu_temp_"$i" -fast genomeSize="$genome_size" useGrid=false maxThreads="$threads" -nanopore subsampled_reads/sample_"$i".fastq
    canu_trim.py canu_temp_"$i"/canu.contigs.fasta > assemblies/canu_"$i".fasta
    rm -rf canu_temp_"$i"
done
for i in 02 06 10 14 18 22; do
    flye --nano-hq subsampled_reads/sample_"$i".fastq --threads "$threads" --out-dir flye_temp_"$i"
    cp flye_temp_"$i"/assembly.fasta assemblies/flye_"$i".fasta
    cp flye_temp_"$i"/assembly_graph.gfa assemblies/flye_"$i".gfa
    rm -r flye_temp_"$i"
done
for i in 03 07 11 15 19 23; do
    miniasm_and_minipolish.sh subsampled_reads/sample_"$i".fastq "$threads" > assemblies/miniasm_"$i".gfa
    any2fasta assemblies/miniasm_"$i".gfa > assemblies/miniasm_"$i".fasta
done
for i in 04 08 12 16 20 24; do
    raven --threads "$threads" --disable-checkpoints --graphical-fragment-assembly assemblies/raven_"$i".gfa subsampled_reads/sample_"$i".fastq > assemblies/raven_"$i".fasta
done
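Since this page is about examining results along the way, it can help to look over the input assemblies before compressing them. The following is a minimal sketch, not part of Autocycler: `summarise_assemblies` is a hypothetical helper that prints each FASTA file in a directory with its contig count, so that empty or unexpectedly fragmented assemblies stand out.

```shell
# Hypothetical helper (not an Autocycler command): print one line per
# FASTA file in a directory, giving the file name and its number of contigs.
summarise_assemblies() (
    dir="${1:-assemblies}"
    for f in "$dir"/*.fasta; do
        # Skip the unexpanded glob pattern when no FASTA files exist.
        [ -e "$f" ] || continue
        # grep -c exits non-zero when the count is 0, so ignore its status.
        contigs=$(grep -c '^>' "$f" || true)
        printf '%s\t%s\n' "$(basename "$f")" "$contigs"
    done
)
```

Running `summarise_assemblies assemblies` lists every input assembly. A count of 0 indicates that an assembler produced no contigs for that read subset, which may be worth investigating before continuing.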
# Compress the input assemblies into a unitig graph:
autocycler compress -i assemblies -a autocycler
# Cluster the input contigs:
autocycler cluster -a autocycler
# Trim and resolve each QC-pass cluster, making dotplots for the small ones:
for c in autocycler/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    if [[ $(wc -c <"$c"/1_untrimmed.gfa) -lt 1000000 ]]; then
        autocycler dotplot -i "$c"/1_untrimmed.gfa -o "$c"/1_untrimmed.png
        autocycler dotplot -i "$c"/2_trimmed.gfa -o "$c"/2_trimmed.png
    fi
    autocycler resolve -c "$c"
done
The loop above runs Autocycler dotplot, on both the untrimmed and trimmed sequences, for each cluster whose 1_untrimmed.gfa file is smaller than 1,000,000 bytes (roughly 1 Mbp of sequence). The size limit exists because Autocycler dotplot is fast on small sequences (e.g. plasmids) but can take a while on longer ones (e.g. chromosomes).
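The size check can be pulled out into a small function if you want to adjust the cutoff. This is only a sketch: `is_small_gfa` and its optional second argument are hypothetical names, not part of Autocycler, and the default of 1000000 bytes simply mirrors the value used in the loop.

```shell
# Hypothetical helper (not an Autocycler command): succeeds if the given
# file exists and is smaller than a byte threshold (default 1000000).
is_small_gfa() (
    gfa="$1"
    max_bytes="${2:-1000000}"
    [ -f "$gfa" ] || exit 1
    # File size in bytes, used as a rough proxy for sequence length.
    size=$(wc -c < "$gfa")
    [ "$size" -lt "$max_bytes" ]
)

# Example use inside the cluster loop (commented out so this block has no
# side effects when sourced):
#   if is_small_gfa "$c"/1_untrimmed.gfa 2000000; then
#       autocycler dotplot -i "$c"/1_untrimmed.gfa -o "$c"/1_untrimmed.png
#   fi
```

Raising the threshold produces dotplots for larger clusters at the cost of longer run times. Lowering it restricts dotplots to the smallest plasmids.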
# Combine the resolved clusters into a final assembly:
autocycler combine -a autocycler -i autocycler/clustering/qc_pass/cluster_*/5_final.gfa
- Step 1: Autocycler subsample
- Step 2: Generating input assemblies
- Step 3: Autocycler compress
- Step 4: Autocycler cluster
- Step 5: Autocycler trim
- Step 6: Autocycler resolve
- Step 7: Autocycler combine