
OPUS reading is ~3x slower compared to ffmpeg in a subprocess #1994

Open
pzelasko opened this issue Nov 8, 2021 · 7 comments

Comments

@pzelasko

pzelasko commented Nov 8, 2021

🐛 Describe the bug

Technically it's not a bug, but this was the most fitting category. I benchmarked torchaudio vs. ffmpeg for reading a long OPUS file (> 1h long, from GigaSpeech). It seems that it's much faster to spawn an ffmpeg process and capture its output than to use torchaudio.load(). Please see the screenshot below:

[screenshot: benchmark timings for torchaudio vs. ffmpeg OPUS reading]

You can see the ffmpeg-based reading implementation in Lhotse here (note it's a feature branch, not merged yet): https://github.com/lhotse-speech/lhotse/blob/13500bd742160d556cefbb43e810e1fd5680f906/lhotse/audio.py#L1359-L1411

I wonder whether SoX uses a different OPUS decoder than ffmpeg? I noticed that there is some difference between the audio samples when I read the file from torchaudio and ffmpeg.
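For reference, the ffmpeg-based approach boils down to piping raw PCM out of an ffmpeg subprocess and viewing the captured bytes as float samples. A minimal stdlib-only sketch (not Lhotse's actual implementation; the default sample rate and the exact ffmpeg flags are assumptions):

```python
import array
import subprocess


def ffmpeg_decode_cmd(path: str, sampling_rate: int = 48000) -> list:
    # Ask ffmpeg to decode the input and write raw 32-bit little-endian
    # float samples to stdout, resampled to `sampling_rate`.
    return [
        "ffmpeg", "-i", path,
        "-f", "f32le", "-acodec", "pcm_f32le",
        "-ar", str(sampling_rate),
        "pipe:1",
    ]


def read_opus_ffmpeg(path: str, sampling_rate: int = 48000):
    # Run ffmpeg, capture its stdout, and wrap the raw PCM bytes
    # in a float array (mono assumed for simplicity).
    proc = subprocess.run(
        ffmpeg_decode_cmd(path, sampling_rate),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
    )
    samples = array.array("f", proc.stdout)
    return samples, sampling_rate
```

The key point is that ffmpeg does all the decoding and resampling in a single child process, so Python only pays for one bulk read of the pipe.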

(copy-pastable version of the code)

%load_ext lab_black

import lhotse
import torchaudio
from lhotse.audio import Recording, AudioSource, read_opus_ffmpeg, read_opus_torchaudio
from pathlib import Path

lhotse.set_caching_enabled(False)

path = "/export/c27/pzelasko/gigaspeech/audio/podcast/P0000/POD1000000040.opus"

%%time
samples2, sr2 = read_opus_ffmpeg(path=path)

%%time
samples, sr = read_opus_torchaudio(path=path)

%%time
samples3, sr3 = torchaudio.load(path)

%%time
_ = torchaudio.info(path)
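Note that %%time is a Jupyter cell magic, so the snippet above only runs in a notebook. Outside Jupyter, the same comparison can be sketched with a small timing harness (a generic sketch; the lambdas in the usage comment assume the decoders and path from the snippet above):

```python
import time


def bench(fn, repeats: int = 3) -> float:
    # Return the best wall-clock time (in seconds) over `repeats` runs,
    # which reduces noise from caches and process startup.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


# Usage with the decoders above would look like:
#   print("ffmpeg    :", bench(lambda: read_opus_ffmpeg(path=path)))
#   print("torchaudio:", bench(lambda: read_opus_torchaudio(path=path)))
```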

Versions

Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 9.13 (stretch) (x86_64)
GCC version: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Clang version: 3.8.1-24 (tags/RELEASE_381/final)
CMake version: version 3.21.3
Libc version: glibc-2.10

Python version: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-4.9.0-15-amd64-x86_64-with-debian-9.13
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti

Nvidia driver version: 440.33.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h8f6ccaa_8 conda-forge
[conda] k2 1.9.dev20210919 cuda10.2_py3.7_torch1.9.0 k2-fsa
[conda] mkl 2021.3.0 h06a4308_520
[conda] mkl-service 2.4.0 py37h5e8e339_0 conda-forge
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h219a48f_0 conda-forge
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.21.2 py37h20f2e39_0
[conda] numpy-base 1.21.2 py37h79a1101_0
[conda] pytorch 1.9.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.9.0 pypi_0 pypi

@mthrok
Collaborator

mthrok commented Nov 8, 2021

Hi @pzelasko

torchaudio does not do anything special to handle OPUS.
SoX's OPUS integration seems to have some rough edges; in the past I have seen OPUS encoding cause a segfault as well.

So my first impression is that this comes from SoX's implementation. However, a 3x slowdown is very large.

I wonder whether SoX uses a different OPUS decoder than ffmpeg? I noticed that there is some difference between the audio samples when I read the file from torchaudio and ffmpeg.

Probably yes. I briefly looked at the ffmpeg code and the OPUS code that SoX adopts, and they do not seem to share source files. (I recall that xiph.org claims somewhere on their website that some of the libraries they provide are reference implementations and not necessarily optimized.)

https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/opusenc.c
https://github.com/xiph/opus/tree/master/src

@mthrok
Collaborator

mthrok commented Nov 8, 2021

Following that, I think what we can practically do (no promise at the moment) is to bind ffmpeg to provide a native experience.

We also get requests for streaming and other formats, for which binding ffmpeg is a viable solution.

@pzelasko
Author

pzelasko commented Nov 8, 2021

If you can properly bind ffmpeg into Python, that would be pretty amazing, and also, as I imagine, a lot of effort.

Anyway, I'm not expecting a "fix" — just wanted to make sure you're aware (and to check in case I'm doing something obviously wrong).

@vadimkantorov

It would be nice if torchaudio published some benchmarks of realistic audio decoding performance inside a DataLoader (especially in view of the improvements from https://ffcv.io)...

@vadimkantorov

vadimkantorov commented Jul 28, 2022

The slowdown is interesting because both SoX and ffmpeg seem to use libopus internally for decoding:

https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/libopusdec.c

https://github.com/dmkrepo/libsox/blob/master/src/opus.c

mthrok pushed a commit to mthrok/audio that referenced this issue Dec 13, 2022
@ozancaglayan

This is still an issue, by the way: for a file with 5 minutes of speech, torchaudio is almost 4x slower than the other two:

opus_48k_32kbps
 > torchaudio per 5 mins (secs) 2.1418089202139527
 > librosa per 5 mins (secs) 0.639855684619397
 > ffmpeg per 5 mins (secs) 0.582485853228718
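Spelling out the ratios implied by those figures (a quick check using the numbers reported above):

```python
# Reported decode times in seconds per 5 minutes of audio (opus_48k_32kbps).
torchaudio_t = 2.1418089202139527
librosa_t = 0.639855684619397
ffmpeg_t = 0.582485853228718

# torchaudio is roughly 3.7x slower than ffmpeg and 3.3x slower than librosa.
vs_ffmpeg = torchaudio_t / ffmpeg_t
vs_librosa = torchaudio_t / librosa_t
```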

@vadimkantorov

vadimkantorov commented Mar 21, 2024

This might be related to this bug:

It seems that in modern libopus, the resampler got changed to a much slower one. I have a repro/test in that issue.

So if ffmpeg uses its faster built-in resampler while torchaudio uses the opus-tools resampler, torchaudio might be slower.
