Requirements
This tutorial was written with the assumption that you have basic Linux/Unix/Mac CLI skills: you should be able to navigate directories, run tools, use the pipe, etc. It also assumes familiarity with bioinformatics file formats like FASTA and FASTQ.
Here's a one-liner I made up as a skill test:
cat *.fasta | grep '>' | sort > header_lines
That command takes the contents of a bunch of FASTA files (cat *.fasta), filters for the header lines (grep '>'), alphabetises the results (sort) and puts them in a file called header_lines.
Can you follow the logic and understand that command? If so, you should be good to go! However, if that command looks like an incomprehensible foreign language, then you might find this tutorial difficult.
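If you'd like to see that command in action before deciding, here is a small sketch that runs it on two throwaway FASTA files. The file names and sequences are made up for illustration, and everything happens in a temporary directory so none of your real files are touched.

```shell
# Make a temporary directory to hold the made-up example files.
demo_dir=$(mktemp -d)

# Two tiny FASTA files (hypothetical names and sequences).
printf '>seq_b\nACGT\n>seq_a\nGGCC\n' > "$demo_dir/one.fasta"
printf '>seq_c\nTTAA\n' > "$demo_dir/two.fasta"

# The same pipeline as above: concatenate, keep header lines, sort, save.
cat "$demo_dir"/*.fasta | grep '>' | sort > "$demo_dir/header_lines"

# Show the result: the three header lines in alphabetical order.
cat "$demo_dir/header_lines"
```

The sequence lines are dropped because they contain no '>' character, and the headers from both files end up sorted together in one file.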
You'll need a good hybrid read set of both Illumina and ONT reads to assemble. Visit the Sample data page to download suitable S. aureus data. The easy and medium versions of the tutorial assume that you are using the R10.4 and Illumina reads from this sample data set. The hard version is more general and can be done with any good read set.
What do I mean by 'good' and how good must your reads be? In order to get a perfect bacterial genome assembly, your reads should be deep: ideally 200× or more for both Illumina and ONT. And your ONT reads should be long (ideally an N50 of 15 kbp or more). If your data doesn't meet this high standard, you can certainly still follow the tutorial, but it might be harder, and you might not be able to assemble your genome to zero-error perfection.
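If you're not sure whether your reads meet that standard, a rough check is easy to do with standard tools. The sketch below is not part of the tutorial's workflow: read_stats is a hypothetical helper name, it assumes an uncompressed FASTQ with exactly four lines per read, and the genome size is something you supply yourself (S. aureus genomes are roughly 2.8 Mbp, so 2800000 is a reasonable guess for the sample data).

```shell
# Print total bases, mean depth and N50 read length for a FASTQ file.
# Usage: read_stats <reads.fastq> <genome_size_in_bp>
# Assumes uncompressed, four-line-per-record FASTQ.
read_stats() {
    # The sequence is the 2nd line of every four-line record.
    awk 'NR % 4 == 2 { print length($0) }' "$1" |
        sort -rn |
        awk -v genome="$2" '
            { len[NR] = $1; total += $1 }
            END {
                # N50: walk from the longest read down until half of
                # all bases have been covered.
                for (i = 1; i <= NR; i++) {
                    running += len[i]
                    if (running >= total / 2) { n50 = len[i]; break }
                }
                printf "bases=%d depth=%.1f n50=%d\n", total, total / genome, n50
            }'
}

# Example call (hypothetical file name, approximate genome size):
# read_stats ont.fastq 2800000
```

For gzipped reads, you could decompress to a temporary file first. Depth here is simply total bases divided by genome size, so it is only as accurate as your genome size estimate.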
You'll need a lot of command-line tools installed to do this tutorial, and as every bioinformatician knows, installing software can often be the hardest part! Here's a list of what you'll need, but I won't provide detailed installation instructions – you'll need to consult each tool's documentation. A lot of these are available on Bioconda, which can make installation much easier.
- Read alignment: minimap2, BWA and Samtools
- Read QC: Filtlong and fastp
- Consensus long-read assembler: Trycycler
  - Trycycler has a few other software requirements; see the software requirements page of its documentation.
- Required long-read assemblers: Flye, Raven and miniasm/Minipolish
- Optional long-read assemblers: Canu, NECAT, NextDenovo/NextPolish, Redbean and Shasta
- Long-read polisher: Medaka
- Required short-read polishers: Polypolish, POLCA and ropebwt2/FMLRC2
- Optional short-read polishers: ntEdit, HyPo and NextPolish
- Sequence file manipulation: seqtk
- Alignment visualisation: IGV
- A text editor that can handle large (multi-megabyte) files:
  - Sublime Text and Atom are good choices if you want a GUI editor.
  - Command-line editors like Vim and Emacs are also appropriate.
- A phylogeny viewer, such as FigTree
- Optional tools for reference-free assembly assessment: ALE and Prodigal
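Once you've installed things, a quick way to see what's still missing is to ask the shell whether it can find each executable. The sketch below is only a convenience: check_tools is a hypothetical helper, and the executable names in the example call are my assumptions about how each required command-line tool is usually invoked (for example, polca.sh for POLCA), so adjust them if your installation differs.

```shell
# Report whether each named executable can be found on the PATH.
# Returns success only if every tool was found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Assumed executable names for the required command-line tools.
check_tools minimap2 bwa samtools filtlong fastp trycycler flye raven \
    miniasm minipolish medaka polypolish polca.sh ropebwt2 fmlrc2 seqtk ||
    echo "Some tools were not found; consult each tool's documentation."
```

This only confirms that an executable exists on your PATH. It doesn't check versions or that the tool actually runs, so it's still worth trying each tool's help command.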