C++17 library that implements inference for the Demucs v4 hybrid transformer and Demucs v3 hybrid models, which are high-performance PyTorch neural networks for music source separation.
It uses only the standard library (C++17) and the header-only library Eigen as dependencies, making it suitable to compile and run on many platforms. It was designed for low-memory environments by sacrificing the speed of the Torch implementation.
See my other project umx.cpp for a similar library for Open-Unmix.
The inference library (in src/
) uses the ggml file format to serialize the PyTorch weights of hdemucs_mmi
, htdemucs
, htdemucs_6s
, and htdemucs_ft
(v3, v4 4-source, v4 6-source, v4 fine-tuned) to a binary file format, and Eigen to implement the inference (with OpenMP as a requirement).
The cli programs (in cli-apps/
) additionally use libnyquist to read and write audio files, and the multithreaded cli programs use C++11's std::thread
.
All Hybrid-Transformer weights (4-source, 6-source, fine-tuned) are supported. See the Convert weights section below. Inference for the Demucs v3 Hybrid model weights hdemucs_mmi
is also supported. Demixing quality is practically identical to PyTorch as shown in the SDR scores doc.
src
contains the library for Demucs inference, and cli-apps
contains four driver programs, which compile to:
demucs.cpp.main
: run a single model (4s, 6s, or a single fine-tuned model)demucs_ft.cpp.main
: run all four fine-tuned models forhtdemucs_ft
inference, same as the BagOfModels idea of PyTorch Demucsdemucs_mt.cpp.main
: run a single model, multi-threadeddemucs_ft_mt.cpp.main
: run all four fine-tuned models, multi-threadeddemucs_v3.cpp.main
: run a single model for v3hdemucs_mmi
demucs_v3_mt.cpp.main
: run a single model for v3hdemucs_mmi
, multi-threaded
See the PERFORMANCE doc for time measurements, benchmarks, details on multi-threading, external BLAS libraries, etc.
Clone the repo
Make sure you clone with submodules to get all vendored libraries (e.g. Eigen):
$ git clone --recurse-submodules https://github.com/sevagh/demucs.cpp
Install C++ dependencies, e.g. CMake, gcc, C++/g++, OpenBLAS for your OS (my instructions are for Pop!_OS 22.04):
$ sudo apt-get install gcc g++ cmake clang-tools libopenblas0-openmp libopenblas-openmp-dev
Compile with CMake:
$ mkdir -p build && cd build && cmake .. && make -j16
Pre-converted and ready to use ggml-weights can be downloaded at huggingface. These were created as described below.
Set up a Python env
The first step is to create a Python environment (however you like; I'm a fan of mamba) and install the requirements.txt
file:
$ mamba create --name demucscpp python=3.11
$ mamba activate demucscpp
$ python -m pip install -r ./scripts/requirements.txt
Dump Demucs weights to ggml file, with flag --six-source
for the 6-source variant, all of --ft-drums, --ft-vocals, --ft-bass, --ft-other
for the fine-tuned models, and --v3
for the v3 model:
$ python ./scripts/convert-pth-to-ggml.py ./ggml-demucs
...
Processing variable: crosstransformer.layers_t.4.norm2.bias with shape: (512,) , dtype: float16
Processing variable: crosstransformer.layers_t.4.norm_out.weight with shape: (512,) , dtype: float16
Processing variable: crosstransformer.layers_t.4.norm_out.bias with shape: (512,) , dtype: float16
Processing variable: crosstransformer.layers_t.4.gamma_1.scale with shape: (512,) , dtype: float16
Processing variable: crosstransformer.layers_t.4.gamma_2.scale with shape: (512,) , dtype: float16
Done. Output file: ggml-demucs/ggml-model-htdemucs-4s-f16.bin
All supported models would look like this:
$ ls ./ggml-demucs/
total 613M
160M May 5 14:38 ggml-model-hdemucs_mmi-v3-f16.bin
53M May 5 16:50 ggml-model-htdemucs-6s-f16.bin
81M May 5 16:50 ggml-model-htdemucs_ft_vocals-4s-f16.bin
81M May 5 16:50 ggml-model-htdemucs_ft_bass-4s-f16.bin
81M May 5 16:50 ggml-model-htdemucs_ft_drums-4s-f16.bin
81M May 5 16:50 ggml-model-htdemucs_ft_other-4s-f16.bin
81M May 5 16:51 ggml-model-htdemucs-4s-f16.bin
Run C++ inference on your track with the built binaries:
# build is the cmake build dir from above
$ ./build/demucs.cpp.main ../ggml-demucs/ggml-model-htdemucs-4s-f16.bin /path/to/my/track.wav ./demucs-out-cpp/
...
Loading tensor crosstransformer.layers_t.4.gamma_2.scale with shape [512, 1, 1, 1]
crosstransformer.layers_t.4.gamma_2.scale: [ 512], type = float, 0.00 MB
Loaded model (533 tensors, 80.08 MB) in 0.167395 s
demucs_model_load returned true
Starting demucs inference
...
Freq: decoder 3
Time: decoder 3
Mask + istft
mix: 2, 343980
mix: 2, 343980
mix: 2, 343980
mix: 2, 343980
returned!
Writing wav file "./demucs-out-cpp/target_0_drums.wav"
Encoder Status: 0
Writing wav file "./demucs-out-cpp/target_1_bass.wav"
Encoder Status: 0
Writing wav file "./demucs-out-cpp/target_2_other.wav"
Encoder Status: 0
Writing wav file "./demucs-out-cpp/target_3_vocals.wav"
Encoder Status: 0
For the 6-source model, additional targets 4 and 5 correspond to guitar and piano.