
Performance drop during Read Processing, maxing out "output queue capacity basecalls" but not I/O #46

Closed
AAnnan opened this issue Aug 5, 2020 · 4 comments

Comments


AAnnan commented Aug 5, 2020

Hello,

I experienced a significant drop in read-processing performance with Megalodon (from ~22 reads/s down to ~10 reads/s and still falling) after ~580,000 reads processed (~7 h), at which point the output queue capacity basecalls was maxed out (10000/10000). The output states that this is a sign of an I/O bottleneck, but monitoring tools such as iostat and iotop show the drive (a 2 TB NVMe) is nowhere near fully utilized. What could be going wrong?

Guppy Basecall Server
Guppy Basecall Service Software, (C) Oxford Nanopore Technologies, Limited. Version 4.0.14+8d3226e, client-server API version 2.1.0

Megalodon
Megalodon version: 2.1.1

Megalodon command

megalodon ./final_fast5s/ --guppy-server-path ${GUPPY_DIR}/guppy_basecall_server \
        --guppy-params "-d ./rerio/basecall_models/" \
        --guppy-config res_dna_r941_min_modbases-all-context_v001.cfg \
        --outputs basecalls mod_basecalls mappings mods per_read_mods mod_mappings \
        --output-directory ./mega_results/ \
        --reference $genomeFile \
        --mod-motif Z GCG 1 --mod-motif Z HCG 1 --mod-motif Z GCH 1 \
        --write-mods-text \
        --mod-aggregate-method binary_threshold \
        --mod-binary-threshold 0.875 \
        --mod-output-formats bedmethyl wiggle \
        --mod-map-base-conv C T --mod-map-base-conv Z C \
        --devices 0 --processes 30

IOstat

Device             tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme0n1          87.25         0.73         4.88     439415    2923495
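
For completeness, the same kind of check can be scripted; below is a minimal sketch using psutil (not part of Megalodon), with the device name nvme0n1 taken from the iostat output above and an arbitrary 5 s sampling interval:

```python
# Minimal sketch (not part of Megalodon): sample the NVMe drive's throughput
# and busy time to confirm it is not saturated while a run is in progress.
import time
import psutil

INTERVAL = 5  # seconds between samples
DEVICE = "nvme0n1"  # device name taken from the iostat output above

prev = psutil.disk_io_counters(perdisk=True)[DEVICE]
while True:
    time.sleep(INTERVAL)
    cur = psutil.disk_io_counters(perdisk=True)[DEVICE]
    read_mb_s = (cur.read_bytes - prev.read_bytes) / INTERVAL / 1e6
    write_mb_s = (cur.write_bytes - prev.write_bytes) / INTERVAL / 1e6
    # busy_time is reported in milliseconds on Linux
    busy_pct = (cur.busy_time - prev.busy_time) / (INTERVAL * 1000) * 100
    print(f"read {read_mb_s:6.1f} MB/s  write {write_mb_s:6.1f} MB/s  busy {busy_pct:5.1f}%")
    prev = cur
```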
@marcus1487
Collaborator

I have not experienced a bottleneck in the basecalling queue before, so this is just my best guess, but I suspect the mod_basecalls output is the bottleneck here. This is not a very efficient format for a large dataset: mod basecalls are currently written as a table per read into a single HDF5 file, so that file will hold one dataset per read (~580k in your case). Creating each new dataset in a file of that size may be where the time is going inside HDF5 itself, without the filesystem ever being saturated, which would explain why iostat looks fine. If I am correct, then simply removing the mod_basecalls output from the command should alleviate this issue. This output is intended to be swapped out for an unmapped SAM/BAM/CRAM file as specified by the hts-spec group (see here), which should be much more efficient for storage and retrieval.
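
As a rough illustration (this is not Megalodon's actual code; the file name, dataset names, and per-read table shape are made up), a one-dataset-per-read HDF5 layout can be timed like this, and it is the per-dataset creation cost, not raw disk throughput, that would grow as the file accumulates reads:

```python
# Hypothetical sketch of a one-dataset-per-read HDF5 layout: time how quickly
# new per-read datasets can be created as the file grows.
import time
import numpy as np
import h5py

with h5py.File("mod_basecalls_demo.h5", "w") as h5:
    t0 = time.time()
    for read_idx in range(100_000):
        # one small table per "read", mirroring a one-dataset-per-read layout
        h5.create_dataset(
            f"read_{read_idx:06d}",
            data=np.random.rand(10, 3),  # placeholder per-read mod scores
        )
        if read_idx and read_idx % 10_000 == 0:
            rate = 10_000 / (time.time() - t0)
            print(f"{read_idx} datasets written, {rate:.0f} datasets/s")
            t0 = time.time()
```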


AAnnan commented Aug 5, 2020

Thanks for your reply.

I'll leave the mod_basecalls output out of my next runs, see how it goes, and report back.

Looking forward to the new format from hts-spec.


AAnnan commented Aug 6, 2020

I can confirm that leaving out the mod_basecalls output fixes the problem. Without it, the output queue never filled up and there was no decrease in performance, even after 800k reads processed.

AAnnan closed this as completed Aug 6, 2020
@marcus1487
Collaborator

This should be resolved in the 2.2 release: the mod_basecalls output has been changed to the SAM/BAM/CRAM format. See the README for details.
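
A minimal sketch of reading that output with pysam, assuming the per-read modified-base calls are stored in the hts-spec MM/ML (formerly Mm/Ml) tags of an unmapped BAM; the file name here is a placeholder, so check the README for the actual output path and tag details:

```python
# Sketch: inspect modified-base tags on the first read of an (assumed) unmapped
# BAM produced by the new mod_basecalls output.
import pysam

with pysam.AlignmentFile("mod_basecalls.bam", "rb", check_sq=False) as bam:
    for read in bam.fetch(until_eof=True):
        if read.has_tag("MM") or read.has_tag("Mm"):
            mm = read.get_tag("MM") if read.has_tag("MM") else read.get_tag("Mm")
            ml = read.get_tag("ML") if read.has_tag("ML") else read.get_tag("Ml")
            print(read.query_name, mm, list(ml)[:5])
        break  # just inspect the first read
```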
