DeepMOS

Deep learning approach for automatic MOS prediction.

Our solution is a modified version of the following GitHub repository: https://github.com/nii-yamagishilab/mos-finetune-ssl

The instructions of dataset access can be found in the original repository.

Environment

You can run our solution in a Docker container. Please set the host and container directories in the run_docker.sh file, and run it. In the container, please run install_in_docker.sh before running the scripts.

Training (transfer learning from pretrained Wav2Vec2 models)

The training can be performed for the MAIN track of VoiceMOS Challange 2022 in the following way:

python deepmos.py --datadir <FOLDER FOR DATA> --fairseq_base_model <PRETRAINED WAV2VEC2 MODEL, E.G. libri960_big.pt> --nunits <LSTM SIZE> --nlstm <LSTM LAYERS> --dropout <DROPOUT RATE> --batch_size <BATCH SIZE>

There are additional settings in deepmos.py, which you can see by running python deepmos.py -h.

For the OOD track you can run the same training scripcs. In case of OOD we recommend to train a model with the MAIN track model first, and then train that model further with the OOD data.

If you run out of GPU memory, you may need to decrease the batch_size. E.g. batch_size=4 works with NVIDIA V100 32GB GPUs.

Predictions

The training scripts results in JSON files in the checkpoints folder. In order to run the predictions, the following command should be run:

python predict.py --datadir <FOLDER FOR DATA> --json <JSON FILE OF THE TRAINED MODEL> --outfile <FILE FOR THE PREDICTIONS>

Acknowledgements

If you are using our solution, thank you for citeing the corresponding paper (submitted to Interspeech 2022):

B. Gyires-Tóth, Cs. Zainkó (2022). Improving Self-Supervised Learning-based MOS Prediction Networks. arXiv preprint arXiv:2204.11030.

URL: https://arxiv.org/abs/2204.11030

License

BSD 3-Clause License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data_exploration		data_exploration
LICENSE		LICENSE
README.md		README.md
deepmos.py		deepmos.py
install_in_docker.sh		install_in_docker.sh
predict.py		predict.py
run_docker.sh		run_docker.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DeepMOS

Environment

Training (transfer learning from pretrained Wav2Vec2 models)

Predictions

Acknowledgements

License

About

Releases

Packages

Languages

License

BME-SmartLab/DeepMOS

Folders and files

Latest commit

History

Repository files navigation

DeepMOS

Environment

Training (transfer learning from pretrained Wav2Vec2 models)

Predictions

Acknowledgements

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages