About Speaker Voice #17

Closed
shoegazerstella opened this issue Jul 29, 2020 · 4 comments

@shoegazerstella

shoegazerstella commented Jul 29, 2020

I was playing with the preprocessing parameters and was able to change the sound of the synthesized voice a bit.
I was wondering if there is a clever way to do it in terms of pitch, energy, style, timbre, etc.
Thanks!

@bshall
Owner

bshall commented Jul 29, 2020

Hi @shoegazerstella,

It's fun to mess with the inputs but I think changing the speech characteristics in any systematic way is pretty difficult. I remember the issue in #3 was that changing num_fft resulted in a pitch shift. I think a more principled method would be vocal tract length perturbation (see "Vocal tract length perturbation (VTLP) improves speech recognition" for details). It's relatively easy to mess with the mel filters in librosa so that'd be a simple place to start.
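As a rough illustration of the VTLP idea, here is a minimal NumPy-only sketch that warps the edge frequencies of a triangular mel filterbank with the piecewise-linear warp from the VTLP paper. It is not this repo's preprocessing code. The function names, the `f_hi=4800.0` default, the HTK-style mel formula and the unnormalized triangles are all assumptions made for the example. The output has the same `(n_mels, 1 + n_fft // 2)` shape as a librosa mel basis, but the values will not match librosa's defaults.

```python
import numpy as np


def vtlp_warp(freqs, alpha, sr, f_hi=4800.0):
    """Piecewise-linear VTLP warp of frequencies in Hz by factor alpha.

    Frequencies below a boundary are scaled by alpha; the rest are mapped
    linearly so that the Nyquist frequency stays fixed. Assumes f_hi < sr / 2.
    """
    freqs = np.asarray(freqs, dtype=float)
    nyquist = sr / 2.0
    boundary = f_hi * min(alpha, 1.0) / alpha
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    upper = nyquist - slope * (nyquist - freqs)
    return np.where(freqs <= boundary, freqs * alpha, upper)


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; librosa defaults to the Slaney scale).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def warped_mel_filterbank(sr, n_fft, n_mels, alpha=1.0, f_hi=4800.0):
    """Triangular mel filterbank whose band edges are VTLP-warped.

    Returns an array of shape (n_mels, 1 + n_fft // 2). alpha=1.0 gives the
    unwarped filterbank.
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    edges = vtlp_warp(mel_to_hz(mel_points), alpha, sr, f_hi)

    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb
```

To try it, something like `warped_mel_filterbank(16000, 1024, 80, alpha=0.9)` could be multiplied with a magnitude spectrogram in place of the usual mel basis. The VTLP paper samples the warp factor in a narrow range around 1 (roughly 0.9 to 1.1), so that is a sensible range to start with.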

Otherwise, if you're interested in changing the speaker entirely I've done some work on voice conversion here. There are also a bunch of papers/repos that convert the spectrogram directly and then synthesize with a vocoder (happy to suggest some if you're interested).

@shoegazerstella
Author

shoegazerstella commented Jul 29, 2020

> if you're interested in changing the speaker entirely I've done some work on voice conversion here. There are also a bunch of papers/repos that convert the spectrogram directly and then synthesize with a vocoder (happy to suggest some if you're interested).

Exactly, my aim is to change the speaker entirely.

I was reading more on voice cloning and I did find these two works:

But if I understand correctly, your approach to voice conversion is a little bit different. I'll look into it more!
It would be awesome if you could suggest other approaches too!
Thanks a lot!

@bshall
Owner

bshall commented Jul 29, 2020

No problem!

Well, there are two options:

  1. Voice cloning (as you mentioned) - where you synthesize speech in a specific voice from text.
  2. Voice conversion - where you take audio from one speaker and directly convert it to a target speaker.

I think Real-Time-Voice-Cloning is the best available open-source project for voice cloning. For voice conversion, there are https://github.com/liusongxiang/StarGAN-Voice-Conversion and https://github.com/auspicious3000/autovc, for example.

Hope that helps!

@shoegazerstella
Author

So yes, there are indeed two approaches.
For the TTS part I was using an implementation of FastSpeech2 and, to be honest, I didn't want to change that because it's super fast on CPU.
So I might try both approaches and decide based on both the quality of the results and the speed.
Again thanks a lot! :)
