OPUS reading is ~3x slower compared to ffmpeg in a subprocess #1994
Comments
Hi @pzelasko, torchaudio does not do anything special to handle OPUS. So my first impression is that this results from sox's implementation.
Probably yes. I briefly looked at the ffmpeg code and the OPUS code that SoX adopts, and they do not seem to share source files. (I recall that xiph.org claims somewhere on their website that some of the libraries they provide are reference implementations and not necessarily optimized.) https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/opusenc.c
Following that, I think what we can practically do (not a promise at the moment) is to bind ffmpeg to provide a native experience. We get requests for streaming and other formats, for which binding ffmpeg is a viable solution.
If you can properly bind ffmpeg into Python, that would be pretty amazing, and also, as I imagine, a lot of effort. Anyway, I’m not expecting a “fix”; I just wanted to make sure you’re aware (and in case I’m doing something obviously wrong).
It would be nice if torchaudio published some benchmarks of realistic audio decoding performance inside a DataLoader (especially in view of the improvements from https://ffcv.io)...
The slowdown is interesting because both sox and ffmpeg seem to use libopus internally for decoding: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/libopusdec.c
This is still an issue, by the way: for a file with 5 minutes of speech, torchaudio is almost 4x slower than the other two.
This might be related to this bug: it seems that in modern libopus, the resampler got changed to a much slower one, and I've got some repro/test in that issue. So if ffmpeg uses a faster built-in resampler and torchaudio uses the opus-tools resampler, torchaudio might be slower.
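One way to sanity-check the resampler hypothesis: OPUS streams are coded internally at 48 kHz, so a decode path that returns the file's original rate (e.g. 16 kHz) must include a resampling step. A minimal sketch, assuming a local file at the placeholder path `audio.opus`:

```python
# Sketch: compare the sample rate reported by torchaudio with the stream
# sample rate reported by ffprobe. "audio.opus" is a placeholder path.
import subprocess

import torchaudio

path = "audio.opus"

# torchaudio's (sox backend) view of the file
info = torchaudio.info(path)
print("torchaudio reports sample_rate =", info.sample_rate)

# ffprobe's view of the same stream
out = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "a:0",
     "-show_entries", "stream=sample_rate", "-of", "csv=p=0", path],
    capture_output=True, text=True, check=True,
)
print("ffprobe reports sample_rate =", out.stdout.strip())
```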
🐛 Describe the bug
Technically it's not a bug, but it was the most fitting category. I benchmarked torchaudio vs ffmpeg for reading a long OPUS file (> 1 h long, from GigaSpeech). It seems that it's much faster to spawn an ffmpeg process and capture its output than to use `torchaudio.load()`. Please see the screenshot below.
You can see the ffmpeg-based reading implementation in Lhotse here (note it's a feature branch, not merged for now): https://github.com/lhotse-speech/lhotse/blob/13500bd742160d556cefbb43e810e1fd5680f906/lhotse/audio.py#L1359-L1411
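A minimal sketch of the ffmpeg-subprocess approach: decode to raw float32 PCM on stdout and wrap the bytes in a tensor. This is a simplified stand-in for the linked Lhotse code, not the exact implementation; the function name and the 16 kHz default are illustrative assumptions.

```python
# Minimal sketch of reading audio by piping raw PCM out of ffmpeg.
# Not the actual Lhotse implementation; names and defaults are illustrative.
import subprocess

import numpy as np
import torch


def read_opus_with_ffmpeg(path: str, sampling_rate: int = 16000) -> torch.Tensor:
    cmd = [
        "ffmpeg", "-i", path,
        "-f", "f32le",            # raw little-endian float32 samples
        "-ac", "1",               # downmix to mono for simplicity
        "-ar", str(sampling_rate),
        "pipe:1",                 # write decoded samples to stdout
    ]
    proc = subprocess.run(cmd, capture_output=True, check=True)
    samples = np.frombuffer(proc.stdout, dtype=np.float32)
    return torch.from_numpy(samples.copy()).unsqueeze(0)  # shape: (1, num_samples)
```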
I wonder whether SoX uses a different OPUS decoder than ffmpeg? I noticed that there is some difference between the audio samples when I read the file with torchaudio vs. with ffmpeg.
(copy-pastable version of the code)
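A minimal timing sketch along the lines of the benchmark described above, reusing the hypothetical `read_opus_with_ffmpeg` helper sketched earlier; `audio.opus` is a placeholder path.

```python
# Rough timing comparison; "audio.opus" is a placeholder path and
# read_opus_with_ffmpeg() is the hypothetical helper sketched above.
import time

import torchaudio

path = "audio.opus"

t0 = time.perf_counter()
waveform, sr = torchaudio.load(path)
print(f"torchaudio.load:   {time.perf_counter() - t0:.2f}s, shape={tuple(waveform.shape)}, sr={sr}")

t0 = time.perf_counter()
waveform = read_opus_with_ffmpeg(path)
print(f"ffmpeg subprocess: {time.perf_counter() - t0:.2f}s, shape={tuple(waveform.shape)}")
```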
Versions
Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 9.13 (stretch) (x86_64)
GCC version: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Clang version: 3.8.1-24 (tags/RELEASE_381/final)
CMake version: version 3.21.3
Libc version: glibc-2.10
Python version: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-4.9.0-15-amd64-x86_64-with-debian-9.13
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
Nvidia driver version: 440.33.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h8f6ccaa_8 conda-forge
[conda] k2 1.9.dev20210919 cuda10.2_py3.7_torch1.9.0 k2-fsa
[conda] mkl 2021.3.0 h06a4308_520
[conda] mkl-service 2.4.0 py37h5e8e339_0 conda-forge
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h219a48f_0 conda-forge
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.21.2 py37h20f2e39_0
[conda] numpy-base 1.21.2 py37h79a1101_0
[conda] pytorch 1.9.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.9.0 pypi_0 pypi