OPUS reading is ~3x slower compared to ffmpeg in a subprocess #1994
Comments
Hi @pzelasko, torchaudio does not do anything special to handle OPUS. So my first impression is that this results from sox's implementation.
Probably yes. I briefly looked at the ffmpeg code and the OPUS code that SoX adopts, and they do not seem to share source files. (I recall that xiph.org claims somewhere on their website that some of the libraries they provide are reference implementations and not necessarily optimized.) https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/opusenc.c
Following that, I think what we can practically do (not a promise at the moment) is to bind ffmpeg to provide a native experience. We get requests for streaming and other formats, for which binding ffmpeg is a viable solution.
If you can properly bind ffmpeg into Python, that would be pretty amazing, and also, as I imagine, a lot of effort. Anyway, I’m not expecting a “fix”; I just wanted to make sure you’re aware (and in case I’m doing something obviously wrong).
It would be nice if torchaudio published some benchmarks of realistic audio decoding performance inside a DataLoader (especially in view of the improvements from https://ffcv.io)...
The slowdown is interesting because both sox and ffmpeg seem to use libopus internally for decoding: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/libopusdec.c
This is still an issue, by the way: for a file with 5 minutes of speech, torchaudio is almost 4x slower than the other two.
This might be related to this bug: it seems that in modern libopus, the resampler got changed to a much slower one, and I've got some repro/test in that issue. So if ffmpeg uses a faster built-in resampler and torchaudio uses the opus-tools resampler, torchaudio might be slower.
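One way to sanity-check the resampler hypothesis: OPUS streams are coded internally at 48 kHz, so a decode path that returns the file's original rate (e.g. 16 kHz) must include a resampling step. A minimal sketch, assuming a local file at the placeholder path `audio.opus`:

```python
# Sketch: compare the sample rate reported by torchaudio with the stream
# sample rate reported by ffprobe. "audio.opus" is a placeholder path.
import subprocess

import torchaudio

path = "audio.opus"

# torchaudio's (sox backend) view of the file
info = torchaudio.info(path)
print("torchaudio reports sample_rate =", info.sample_rate)

# ffprobe's view of the same stream
out = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "a:0",
     "-show_entries", "stream=sample_rate", "-of", "csv=p=0", path],
    capture_output=True, text=True, check=True,
)
print("ffprobe reports sample_rate =", out.stdout.strip())
```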
🐛 Describe the bug
Technically it's not a bug, but it was the most fitting category. I benchmarked torchaudio vs ffmpeg for reading a long OPUS file (> 1 h long, from GigaSpeech). It seems that it's much faster to spawn an ffmpeg process and capture its output than to use `torchaudio.load()`. Please see the screenshot below.
You can see the ffmpeg-based reading implementation in Lhotse here (note it's a feature branch, not merged for now): https://github.com/lhotse-speech/lhotse/blob/13500bd742160d556cefbb43e810e1fd5680f906/lhotse/audio.py#L1359-L1411
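A minimal sketch of the ffmpeg-subprocess approach: decode to raw float32 PCM on stdout and wrap the bytes in a tensor. This is a simplified stand-in for the linked Lhotse code, not the exact implementation; the function name and the 16 kHz default are illustrative assumptions.

```python
# Minimal sketch of reading audio by piping raw PCM out of ffmpeg.
# Not the actual Lhotse implementation; names and defaults are illustrative.
import subprocess

import numpy as np
import torch


def read_opus_with_ffmpeg(path: str, sampling_rate: int = 16000) -> torch.Tensor:
    cmd = [
        "ffmpeg", "-i", path,
        "-f", "f32le",            # raw little-endian float32 samples
        "-ac", "1",               # downmix to mono for simplicity
        "-ar", str(sampling_rate),
        "pipe:1",                 # write decoded samples to stdout
    ]
    proc = subprocess.run(cmd, capture_output=True, check=True)
    samples = np.frombuffer(proc.stdout, dtype=np.float32)
    return torch.from_numpy(samples.copy()).unsqueeze(0)  # shape: (1, num_samples)
```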
I wonder whether SoX uses a different OPUS decoder than ffmpeg? I noticed that there is some difference between the audio samples when I read the file with torchaudio vs. with ffmpeg.
(copy-pastable version of the code)
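A minimal timing sketch along the lines of the benchmark described above, reusing the hypothetical `read_opus_with_ffmpeg` helper sketched earlier; `audio.opus` is a placeholder path.

```python
# Rough timing comparison; "audio.opus" is a placeholder path and
# read_opus_with_ffmpeg() is the hypothetical helper sketched above.
import time

import torchaudio

path = "audio.opus"

t0 = time.perf_counter()
waveform, sr = torchaudio.load(path)
print(f"torchaudio.load:   {time.perf_counter() - t0:.2f}s, shape={tuple(waveform.shape)}, sr={sr}")

t0 = time.perf_counter()
waveform = read_opus_with_ffmpeg(path)
print(f"ffmpeg subprocess: {time.perf_counter() - t0:.2f}s, shape={tuple(waveform.shape)}")
```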
Versions
Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 9.13 (stretch) (x86_64)
GCC version: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Clang version: 3.8.1-24 (tags/RELEASE_381/final)
CMake version: version 3.21.3
Libc version: glibc-2.10
Python version: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-4.9.0-15-amd64-x86_64-with-debian-9.13
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
Nvidia driver version: 440.33.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h8f6ccaa_8 conda-forge
[conda] k2 1.9.dev20210919 cuda10.2_py3.7_torch1.9.0 k2-fsa
[conda] mkl 2021.3.0 h06a4308_520
[conda] mkl-service 2.4.0 py37h5e8e339_0 conda-forge
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h219a48f_0 conda-forge
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.21.2 py37h20f2e39_0
[conda] numpy-base 1.21.2 py37h79a1101_0
[conda] pytorch 1.9.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.9.0 pypi_0 pypi