commit bf825e4
Showing 18 changed files with 2,147 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
.* | ||
!/.gitignore | ||
*.wav | ||
|
||
dat/*/ | ||
res/*/ | ||
|
||
*.pyc | ||
__pycache__/ | ||
|
||
src/tmp*.* | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Here we will save the preprocessed files... | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
name: blow | ||
channels: | ||
- pytorch | ||
- defaults | ||
dependencies: | ||
- backcall=0.1.0=py37_0 | ||
- blas=1.0=mkl | ||
- ca-certificates=2019.1.23=0 | ||
- certifi=2019.3.9=py37_0 | ||
- cffi=1.11.5=py37he75722e_1 | ||
- cudatoolkit=10.0.130=0 | ||
- cycler=0.10.0=py37_0 | ||
- dbus=1.13.6=h746ee38_0 | ||
- decorator=4.3.2=py37_0 | ||
- expat=2.2.6=he6710b0_0 | ||
- fontconfig=2.13.0=h9420a91_0 | ||
- freetype=2.9.1=h8a8886c_1 | ||
- glib=2.56.2=hd408876_0 | ||
- gst-plugins-base=1.14.0=hbbd80ab_1 | ||
- gstreamer=1.14.0=hb453b48_1 | ||
- icu=58.2=h9c2bf20_1 | ||
- intel-openmp=2019.1=144 | ||
- ipython=7.2.0=py37h39e3cac_0 | ||
- ipython_genutils=0.2.0=py37_0 | ||
- jedi=0.13.2=py37_0 | ||
- jpeg=9b=h024ee3a_2 | ||
- kiwisolver=1.0.1=py37hf484d3e_0 | ||
- libedit=3.1.20181209=hc058e9b_0 | ||
- libffi=3.2.1=hd88cf55_4 | ||
- libgcc-ng=8.2.0=hdf63c60_1 | ||
- libgfortran-ng=7.3.0=hdf63c60_0 | ||
- libpng=1.6.36=hbc83047_0 | ||
- libstdcxx-ng=8.2.0=hdf63c60_1 | ||
- libtiff=4.0.10=h2733197_2 | ||
- libuuid=1.0.3=h1bed415_2 | ||
- libxcb=1.13=h1bed415_1 | ||
- libxml2=2.9.9=he19cac6_0 | ||
- matplotlib=3.0.2=py37h5429711_0 | ||
- mkl=2019.1=144 | ||
- mkl_fft=1.0.10=py37ha843d7b_0 | ||
- mkl_random=1.0.2=py37hd81dba3_0 | ||
- ncurses=6.1=he6710b0_1 | ||
- ninja=1.8.2=py37h6bb024c_1 | ||
- numpy=1.15.4=py37h7e9f1db_0 | ||
- numpy-base=1.15.4=py37hde5b4d6_0 | ||
- olefile=0.46=py37_0 | ||
- openssl=1.1.1b=h7b6447c_1 | ||
- pandas=0.24.1=py37he6710b0_0 | ||
- parso=0.3.2=py37_0 | ||
- patsy=0.5.1=py37_0 | ||
- pcre=8.42=h439df22_0 | ||
- pexpect=4.6.0=py37_0 | ||
- pickleshare=0.7.5=py37_0 | ||
- pillow=5.4.1=py37h34e0f95_0 | ||
- pip=19.0.1=py37_0 | ||
- prompt_toolkit=2.0.8=py_0 | ||
- ptyprocess=0.6.0=py37_0 | ||
- pycparser=2.19=py37_0 | ||
- pygments=2.3.1=py37_0 | ||
- pyparsing=2.3.1=py37_0 | ||
- pyqt=5.9.2=py37h05f1152_2 | ||
- python=3.7.2=h0371630_0 | ||
- python-dateutil=2.7.5=py37_0 | ||
- pytorch=1.0.1=py3.7_cuda10.0.130_cudnn7.4.2_2 | ||
- pytz=2018.9=py37_0 | ||
- qt=5.9.7=h5867ecd_1 | ||
- readline=7.0=h7b6447c_5 | ||
- scikit-learn=0.20.2=py37hd81dba3_0 | ||
- scipy=1.2.0=py37h7c811a0_0 | ||
- seaborn=0.9.0=py37_0 | ||
- setuptools=40.7.3=py37_0 | ||
- sip=4.19.8=py37hf484d3e_0 | ||
- six=1.12.0=py37_0 | ||
- sqlite=3.26.0=h7b6447c_0 | ||
- statsmodels=0.9.0=py37h035aef0_0 | ||
- tk=8.6.8=hbc83047_0 | ||
- torchvision=0.2.2=py_3 | ||
- tornado=5.1.1=py37h7b6447c_0 | ||
- traitlets=4.3.2=py37_0 | ||
- wcwidth=0.1.7=py37_0 | ||
- wheel=0.32.3=py37_0 | ||
- xz=5.2.4=h14c3975_4 | ||
- zlib=1.2.11=h7b6447c_3 | ||
- zstd=1.3.7=h0b5b093_0 | ||
- pip: | ||
  - adabound==0.0.5 | ||
  - audioread==2.1.6 | ||
  - joblib==0.13.2 | ||
  - librosa==0.6.3 | ||
  - llvmlite==0.27.1 | ||
  - numba==0.42.1 | ||
  - prettytable==0.7.2 | ||
  - resampy==0.2.1 | ||
prefix: /home/jsj/miniconda3/envs/blow | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
# Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion | ||
|
||
#### Abstract | ||
|
||
End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations. Voice conversion, in which a model has to impersonate a speaker in a recording, is one of those situations. In this paper, we propose Blow, a single-scale normalizing flow using hypernetwork conditioning to perform many-to-many voice conversion between raw audio. Blow is trained end-to-end, with non-parallel data, on a frame-by-frame basis using a single speaker identifier. We show that Blow compares favorably to existing flow-based architectures and other competitive baselines, obtaining equal or better performance in both objective and subjective evaluations. We further assess the impact of its main components with an ablation study, and quantify a number of properties such as the necessary amount of training data or the preference for source or target speakers. | ||
|
||
#### Reference | ||
|
||
J. Serrà, S. Pascual, & C. Segura (2019). **Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion**. ArXiv ... | ||
|
||
``` | ||
``` | ||
|
||
#### Links | ||
|
||
Paper: http://xxx (latest version) | ||
|
||
Audio examples: https://blowconversions.github.io | ||
|
||
## Installation | ||
|
||
Suggested steps are: | ||
|
||
1. Clone repository. | ||
1. Create a conda environment (you can use the `environment.yml` file). | ||
1. Cloning produces the following folder structure, relative to the git folder: | ||
   - `src/`: Where all scripts lie. | ||
   - `dat/`: Place to put all preprocessed files (in subfolders). | ||
   - `res/`: Place to save results. | ||
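As a quick sanity check before running anything, the layout above can be verified from Python. This is a minimal sketch, not part of the repository: the helper name `ensure_layout` is hypothetical, and creating missing `dat/` and `res/` folders is an assumption about what is convenient, not something the scripts are known to require.

```python
from pathlib import Path


def ensure_layout(repo_root):
    """Create dat/ and res/ under repo_root if missing and report the layout.

    Hypothetical helper for illustration; src/ is only checked, never created,
    because it must come from the cloned repository.
    """
    root = Path(repo_root)
    for name in ("dat", "res"):
        # exist_ok=True makes the call a no-op when the folder is already there
        (root / name).mkdir(parents=True, exist_ok=True)
    # Map each expected folder name to whether it exists as a directory
    return {name: (root / name).is_dir() for name in ("src", "dat", "res")}
```

Calling `ensure_layout("..")` from the `src` folder should report `True` for all three entries in a complete clone.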
|
||
## Running the code | ||
|
||
All the following instructions assume you run them from the `src` folder. | ||
Also check the arguments and code of the scripts below, as you may want to run them with a different configuration. | ||
|
||
### Preprocessing | ||
|
||
To preprocess the audio files: | ||
``` | ||
python preprocess.py --path_in=/path/to/wav/root/folder/ --extension=.wav --path_out=../dat/pt/vctk | ||
``` | ||
Our code expects audio filenames of the form `<speaker/class_id>_<utterance/track_id>_whatever.extension`, | ||
where the elements inside `<>` do not contain the character `_` and IDs need not be consecutive (example: `s001_u045_xxx.wav`). | ||
If your data is not in this format, run or adapt the script `misc/rename_dataset.py`. | ||
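To check whether a dataset already follows this naming scheme, the convention can be sketched in a few lines of Python. This is only an illustration of the stated format, not the parsing code used by `preprocess.py`: the names `ParsedName` and `parse_audio_filename` are hypothetical, and accepting a filename with no trailing `_whatever` part is an assumption.

```python
from pathlib import Path
from typing import NamedTuple


class ParsedName(NamedTuple):
    speaker_id: str
    utterance_id: str
    rest: str


def parse_audio_filename(filename):
    """Split '<speaker>_<utterance>_whatever.ext' into its parts.

    Hypothetical helper: the IDs are the first two '_'-separated fields,
    so they cannot themselves contain '_'.
    """
    stem = Path(filename).stem  # drop any directory and the extension
    parts = stem.split("_", 2)  # at most: speaker, utterance, remainder
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"unexpected filename format: {filename!r}")
    # Assumption: a missing remainder is tolerated and reported as ''
    rest = parts[2] if len(parts) == 3 else ""
    return ParsedName(parts[0], parts[1], rest)
```

For the example above, `parse_audio_filename("s001_u045_xxx.wav")` yields speaker `s001`, utterance `u045`, and remainder `xxx`. Files that raise `ValueError` are candidates for renaming.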
|
||
### Training | ||
|
||
To train Blow: | ||
``` | ||
python train.py --path_data=../dat/pt/vctk/ --path_out=../res/vctk/blow/ --model=blow | ||
``` | ||
|
||
### Synthesis | ||
|
||
To transform/synthesize audio with a given learnt model: | ||
``` | ||
python synthesize.py --path_model=../res/vctk/blow/ --path_out=../res/vctk/blow/audio/ --convert | ||
``` | ||
|
||
### Other | ||
|
||
To execute the classification script: | ||
``` | ||
python classify.py --mode=train --path_in=../dat/wav/vctk/train/ --fn_cla=../res/vctk/classif/trained_model.pt --fn_res=../res/vctk/classif/res_train.pt | ||
python classify.py --mode=test --path_in=../res/vctk/blow/audio/ --fn_cla=../res/vctk/classif/trained_model.pt --fn_res=../res/vctk/classif/res_test.pt | ||
``` | ||
|
||
To listen to some conversions (using sox's `play` command): | ||
``` | ||
python misc/listening_test.py --path_refs_train=../dat/wav/vctk/train/ --path_refs_test=../dat/wav/vctk/test/ --paths_convs=../res/blow/audio/,../res/test1/audio/ --player=play | ||
``` | ||
|
||
## Notes | ||
|
||
- If using this code, parts of it, or developments from it, please cite the above reference. | ||
- We do not provide any support or assistance for the supplied code, nor do we offer any other compilation/variant of it. | ||
- We assume no responsibility regarding the provided code. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Here we will save logs, trained models, and audio files... | ||
|