
Major drop in quality score with --trim adapters #53

Open
felipebatalini opened this issue Nov 13, 2024 · 2 comments
Labels: question (Further information is requested)

Comments

@felipebatalini

felipebatalini commented Nov 13, 2024

Why are my q scores dropping so much with --trim adapters?

We are using FLO-MIN114 flow cells (R10 chemistry) for a cDNA library derived from human RNA.
We noticed a high percentage (>50%) of reads flagged as unusable by pychopper in the wf-transcriptome workflow, and found this could be reduced to <10% if we trimmed only the adapters (thereby keeping the primers).

However, I was surprised to see a significant drop in the quality scores when I turned on --trim adapters:
```
nextflow run epi2me-labs/wf-basecalling \
    -profile singularity \
    --sample_name $sample_name \
    --input $pod5_dir \
    --dorado_ext pod5 \
    --basecaller_cfg [email protected] \
    --qscore_filter 10 \
    --basecaller_args "--trim adapters" \
    --output_fmt fastq \
    --out_dir $results_folder
```
While it makes sense for pychopper to work better with the primers present, I can't understand why the basecalling quality drops so much. In the example below, I show the different Q scores from the same sample.
[Screenshot: BC10 read quality distribution without trimming parameters]
[Screenshot: BC10 read quality distribution with --trim adapters on]

I appreciate any help to understand this!
Felipe

@felipebatalini added the question label Nov 13, 2024
@cjw85
Contributor

cjw85 commented Nov 21, 2024

This is because the phred scores for bases in adapter, barcode, and primer regions are typically suppressed compared to bases further into reads. The default in dorado trims all of these components, so when the read quality score is computed by the workflow from the remaining bases, the value is higher than when --trim adapters is enabled and the barcode and primer sequences are left in place.

Dorado itself reports read quality scores having dropped the first 60 quality scores (see e.g. CRFModelConfig.cpp#L41). The workflow component responsible for the data behind these graphs does not do this, as it has no knowledge of whether the basecall has already been trimmed of adapters, barcodes, and primers.
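To illustrate the arithmetic only (a minimal sketch, not dorado's or the workflow's actual code): per-read quality is conventionally the phred-scaled average of the per-base error probabilities, so a short run of suppressed qualities at the read start pulls the mean down disproportionately. The quality values below are invented for illustration, and the 60-base skip mimics the dorado behaviour described above.

```python
import math

def mean_qscore(quals, start=0):
    """Phred-scaled mean error probability over quals[start:].

    Averages in error-probability space, the usual ONT convention,
    rather than averaging the phred values themselves.
    """
    quals = quals[start:]
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)

# Hypothetical read: 60 suppressed-quality bases from an untrimmed
# adapter/primer region, followed by 940 high-quality bases.
quals = [7] * 60 + [20] * 940

print(mean_qscore(quals))            # ~16.7 -- all bases counted
print(mean_qscore(quals, start=60))  # 20.0 -- prefix skipped, as dorado does
```

Because the averaging happens in error-probability space, 60 bases at Q7 are enough to drag a Q20 read down by more than 3 points, which is consistent with the shift between the two histograms above.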

@felipebatalini
Author

@cjw85 Thanks for your answer. So, for an overall assessment of our sequencing quality, it seems the top graph would be more representative of the true basecalling quality, and the drop in Q score is an artifact caused by the adapters and primers (in this case) rather than by poor sequencing quality. We are not multiplexing, so there are no barcodes. Would you agree with this assessment?
