Requirements

Skills

This tutorial assumes that you have basic Linux/Unix/Mac command-line skills: you should be able to navigate directories, run tools, use pipes, etc. It also assumes familiarity with bioinformatics file formats such as FASTA and FASTQ.

Here's a one-liner I made up as a skill test:

cat *.fasta | grep '>' | sort > header_lines

That command takes the contents of a bunch of FASTA files (cat *.fasta), filters for the header lines (grep '>'), alphabetises the results (sort) and puts them in a file called header_lines.

Can you follow the logic and understand that command? If so, you should be good to go! However, if that command looks like an incomprehensible foreign language, then you might find this tutorial difficult.

Data

You'll need a good hybrid read set (both Illumina and ONT reads) to assemble. Visit the Sample data page to download suitable S. aureus data. The easy and medium versions of the tutorial assume that you are using the R10.4 and Illumina reads from this sample data set. The hard version is more general and can be done with any good read set.

What do I mean by 'good' and how good must your reads be? In order to get a perfect bacterial genome assembly, your reads should be deep: ideally 200× or more for both Illumina and ONT. And your ONT reads should be long (ideally an N50 of 15 kbp or more). If your data doesn't meet this high standard, you can certainly still follow the tutorial, but it might be harder, and you might not be able to assemble your genome to zero-error perfection.
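You can check your reads against these thresholds before starting. Below is a minimal shell sketch (not part of the tutorial's own workflow) that reports total bases, approximate depth (total bases divided by an estimated genome size you supply) and read N50 (the length L such that reads of length L or longer contain at least half of all bases). The function name `read_stats` is hypothetical, and it assumes standard four-line FASTQ records without wrapped sequence lines.

```shell
# Hypothetical helper: print total bases, approximate depth and read N50.
# Assumes plain 4-line FASTQ records (sequence on the 2nd line of each).
# Arguments: FASTQ path ("-" for stdin), estimated genome size in bp.
read_stats() {
    local fastq="$1" genome_size="$2"
    # Extract each read's length, then sort longest-first so that N50
    # can be found with a single running sum.
    awk 'NR % 4 == 2 {print length($0)}' "$fastq" | sort -rn |
    awk -v g="$genome_size" '
        { len[NR] = $1; total += $1 }
        END {
            running = 0; n50 = 0
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # N50: length of the read at which the cumulative sum of
                # longest-first reads first reaches half of all bases.
                if (running * 2 >= total) { n50 = len[i]; break }
            }
            printf "total_bases\t%d\n", total
            printf "depth\t%.1f\n", total / g
            printf "N50\t%d\n", n50
        }'
}
```

For gzipped reads, decompress into the function's stdin, e.g. `gzip -dc reads.fastq.gz | read_stats - 2800000`, where 2800000 is a rough S. aureus genome size you should replace for other species. Run it once for your ONT reads and once for your Illumina reads.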

Software

You'll need a lot of command-line tools installed to do this tutorial, and as every bioinformatician knows, installing software can often be the hardest part! Here's a list of what you'll need, but I won't provide detailed installation instructions – you'll need to consult each tool's documentation. A lot of these are available on Bioconda, which can make installation much easier.

Read alignment: minimap2, BWA and Samtools
Read QC: Filtlong and fastp
Consensus long-read assembler: Trycycler
- Trycycler has a few other software requirements; see the software requirements page of its documentation.
Required long-read assemblers: Flye, Raven and miniasm/Minipolish
Optional long-read assemblers: Canu, NECAT, NextDenovo/NextPolish, Redbean and Shasta
Long-read polisher: Medaka
Required short-read polishers: Polypolish, POLCA and ropebwt2/FMLRC2
Optional short-read polishers: ntEdit, HyPo and NextPolish
Sequence file manipulation: seqtk
Alignment visualisation: IGV
A text editor that can handle large (multi-megabyte) files:
- Sublime Text and Atom are good choices if you want a GUI editor.
- Command-line editors like Vim and Emacs are also appropriate.
A phylogeny viewer, such as FigTree
Optional tools for reference-free assembly assessment: ALE and Prodigal
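Because the list is long, it can help to confirm that everything is on your `PATH` before starting. This is a minimal sketch, not part of the tutorial: the function name `check_tools` is hypothetical, and the executable names you pass it are assumptions, since some tools install commands under other names (for example, POLCA is typically run as `polca.sh` and Medaka's polishing command is `medaka_consensus`). Check each name against the tool's documentation.

```shell
# Hypothetical helper: report which commands are missing from $PATH.
# Prints one "found"/"MISSING" line per command and returns 1 if any
# command could not be found, 0 otherwise.
check_tools() {
    local tool status=0
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            printf 'found\t%s\n' "$tool"
        else
            printf 'MISSING\t%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_tools minimap2 bwa samtools filtlong fastp trycycler flye raven miniasm minipolish` prints one line per tool and returns a non-zero exit status if any of them are missing.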