Tutorial (medium)
Welcome to the MEDIUM version of the tutorial. Here you will be given:
- Moderately detailed instructions on what to do.
- Goals for each step in the process.
- Expected results after each step.
- Tips and guidelines along the way.
If you haven't already, download the sample data, a hybrid Illumina+ONT read set to assemble:
- Paired-end Illumina reads in FASTQ format:
  - S_aureus_JKD6159_Illumina_1.fastq.gz: 3.4 million reads, 499 Mbp
  - S_aureus_JKD6159_Illumina_2.fastq.gz: 3.4 million reads, 499 Mbp
- Basecalled ONT R10.4 reads in FASTQ format:
  - S_aureus_JKD6159_ONT_R10.4_guppy_v6.1.7.fastq.gz: 1.8 million reads, 5.6 Gbp
The goal of read QC is to discard low-quality reads and/or trim off low-quality regions of reads. This will make them easier to use in later steps (assembly and polishing).
For Illumina read QC, use fastp to remove adapters and trim off low-quality bases. Its default settings work well, so you just need to give it input and output files. Note that some paired-end reads become orphaned during QC, i.e. their corresponding read is discarded so they are no longer part of a pair. There shouldn't be very many of these, so I like to save the orphaned reads into a file, confirm that they are a small proportion of the reads, then discard them.
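The Illumina QC step could be scripted roughly as follows. This is a minimal sketch, not the tutorial's official commands: it assumes fastp is installed and on your PATH, that the FASTQ files use four lines per read, and the output file names (illumina_1.fastq.gz, illumina_u1.fastq.gz, etc.) are hypothetical names chosen for illustration.

```python
import gzip
import subprocess


def fastp_command(in1, in2, out1, out2, unpaired1, unpaired2):
    """Build a fastp call that uses default QC settings for paired-end reads."""
    return [
        "fastp",
        "--in1", in1, "--in2", in2,
        "--out1", out1, "--out2", out2,
        # Reads whose mate was discarded (orphans) are written to these files.
        "--unpaired1", unpaired1, "--unpaired2", unpaired2,
    ]


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file, assuming four lines per read."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def orphan_fraction(paired_files, orphan_files):
    """Fraction of all post-QC reads that lost their mate during QC."""
    paired = sum(count_fastq_reads(p) for p in paired_files)
    orphans = sum(count_fastq_reads(p) for p in orphan_files)
    total = paired + orphans
    return orphans / total if total else 0.0


def run_illumina_qc():
    """Run fastp on the sample reads and report the orphan proportion."""
    paired = ["illumina_1.fastq.gz", "illumina_2.fastq.gz"]      # hypothetical names
    orphans = ["illumina_u1.fastq.gz", "illumina_u2.fastq.gz"]   # hypothetical names
    command = fastp_command(
        "S_aureus_JKD6159_Illumina_1.fastq.gz",
        "S_aureus_JKD6159_Illumina_2.fastq.gz",
        *paired, *orphans,
    )
    subprocess.run(command, check=True)  # fails if fastp is not installed
    print(f"Orphaned reads: {orphan_fraction(paired, orphans):.2%}")
```

Calling `run_illumina_qc()` in the directory containing the sample reads would run fastp and print the orphan proportion. If that proportion is small, the two orphan files can be deleted.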
This ONT read set has a poor N50 (4.2 kbp). Throwing out shorter reads will improve the N50 at the cost of depth, but since this read set is so deep (5.6 Gbp), that trade-off is worth it. Run Filtlong with --min_length 6000 to discard reads shorter than 6 kbp. You can then run Filtlong again with --keep_percent 90 to throw out the worst 10% of reads. After these QC steps, you should be left with an ONT read set with a much better N50 (15 kbp) but still plenty of depth (1.8 Gbp).
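The two Filtlong passes described above might be chained like this. It is a sketch under a few assumptions: Filtlong is installed and on your PATH, it writes the kept reads to stdout (so the script compresses that stream itself), and the intermediate and output file names are hypothetical.

```python
import gzip
import shutil
import subprocess


def filtlong_command(reads, *options):
    """Build a Filtlong call: options first, then the input read file."""
    return ["filtlong", *options, reads]


def run_to_gzip(command, out_path):
    """Run a command and gzip-compress its stdout into out_path."""
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        with gzip.open(out_path, "wb") as out:
            shutil.copyfileobj(proc.stdout, out)
    # Leaving the Popen context waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)


def run_ont_qc(raw_reads, final_reads, tmp_reads="ont_min6k.fastq.gz"):
    """Two-pass ONT QC; tmp_reads is a hypothetical intermediate file name."""
    # Pass 1: discard reads shorter than 6 kbp.
    run_to_gzip(filtlong_command(raw_reads, "--min_length", "6000"), tmp_reads)
    # Pass 2: throw out the worst 10% of what remains.
    run_to_gzip(filtlong_command(tmp_reads, "--keep_percent", "90"), final_reads)
```

For example, `run_ont_qc("S_aureus_JKD6159_ONT_R10.4_guppy_v6.1.7.fastq.gz", "ont.fastq.gz")` would leave the post-QC reads in ont.fastq.gz (a hypothetical name). Keeping the two passes separate makes it easy to inspect the read set after the length filter alone.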
At this point you should have post-QC Illumina reads (in two FASTQ files) and post-QC ONT reads (in one FASTQ file).
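To confirm that a post-QC file matches the expected results, you could summarise it with a short script like the one below. This is an illustrative sketch rather than part of the tutorial's toolset: it assumes gzipped FASTQ with four lines per read, and `fastq_stats` is a hypothetical helper name.

```python
import gzip


def n50(lengths):
    """Length L such that reads of length >= L contain at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def fastq_stats(path):
    """Return (read count, total bases, N50) for a gzipped FASTQ file."""
    lengths = []
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                lengths.append(len(line.strip()))
    return len(lengths), sum(lengths), n50(lengths)
```

Running `fastq_stats` on the post-QC ONT file should give a total near 1.8 Gbp and an N50 near 15 kbp if the QC steps went as described. Note that it holds every read length in memory, which is fine for a read set of this size.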