About Speaker Voice #17

shoegazerstella · 2020-07-29T10:20:47Z

I was playing with the preprocessing parameters and was able to change the sound of the synthesized voice a bit.
I was wondering if there is a clever way to do it in terms of pitch, energy, style, timbre, etc.
Thanks!

bshall · 2020-07-29T14:55:04Z

Hi @shoegazerstella,

It's fun to mess with the inputs, but I think changing the speech characteristics in any systematic way is pretty difficult. I remember the issue in #3 was that changing num_fft resulted in a pitch shift. I think a more principled method would be vocal tract length perturbation (see "Vocal tract length perturbation (VTLP) improves speech recognition" for details). It's relatively easy to mess with the mel filters in librosa, so that'd be a simple place to start.
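As a rough illustration of the "mess with the mel filters" idea, here is a minimal sketch that warps the edge frequencies of a triangular mel filterbank. It uses the piecewise-linear VTLP warp as I understand it from the paper cited above: frequencies below a boundary are scaled by a factor `alpha`, and a second linear segment keeps the Nyquist frequency fixed. All of the following are my assumptions, not anything from this repo: the function names (`vtlp_warp`, `warped_mel_filterbank`), the boundary `f_hi=4800` Hz, and the use of the HTK mel formula. To keep the sketch dependency-free, it builds the filters with NumPy instead of calling librosa.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (librosa defaults to the Slaney scale instead)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def vtlp_warp(freqs, alpha, sr, f_hi=4800.0):
    """Piecewise-linear VTLP frequency warp (hypothetical helper).

    Below the boundary, frequencies are scaled by alpha. Above it, a second
    linear segment maps the rest of the band so that sr / 2 maps to itself.
    Assumes f_hi < sr / 2.
    """
    freqs = np.asarray(freqs, dtype=float)
    nyquist = sr / 2.0
    knee = f_hi * min(alpha, 1.0)        # warped value at the boundary
    boundary = knee / alpha              # unwarped boundary frequency
    upper = nyquist - (nyquist - knee) / (nyquist - boundary) * (nyquist - freqs)
    return np.where(freqs <= boundary, freqs * alpha, upper)


def warped_mel_filterbank(sr, n_fft, n_mels, alpha, fmin=0.0, fmax=None):
    """Triangular mel filterbank whose edge frequencies are VTLP-warped.

    Returns an array of shape (n_mels, n_fft // 2 + 1), like a mel basis.
    """
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 edge frequencies, equally spaced on the mel scale, then warped
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    edges = vtlp_warp(edges, alpha, sr)
    bins = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # FFT bin frequencies in Hz
    fb = np.zeros((n_mels, bins.size))
    for m in range(n_mels):
        lo, centre, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - lo) / (centre - lo)
        falling = (hi - bins) / (hi - centre)
        # Triangle peaking at 1 at the (warped) centre frequency
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb
```

With `alpha=1.0` the warp is the identity. Values slightly above or below 1 (the VTLP paper samples a narrow range around 1) move the filters toward higher or lower frequencies. To use it, you would multiply the filterbank with a power spectrogram of shape `(n_fft // 2 + 1, frames)`, e.g. `np.abs(librosa.stft(y, n_fft=n_fft)) ** 2`. The filters are not area-normalised, whereas librosa's `filters.mel` applies Slaney normalisation by default. So the output won't match a vocoder's training features exactly unless its preprocessing is mirrored. Whether this gives a convincing voice change with a given vocoder is untested here.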

Otherwise, if you're interested in changing the speaker entirely, I've done some work on voice conversion here. There are also a bunch of papers/repos that convert the spectrogram directly and then synthesize with a vocoder (happy to suggest some if you're interested).

shoegazerstella · 2020-07-29T15:14:53Z

> if you're interested in changing the speaker entirely I've done some work on voice conversion here. There are also a bunch of papers/repos that convert the spectrogram directly and then synthesize with a vocoder (happy to suggest some if you're interested).

Exactly, my aim is to change the speaker entirely.

I was reading more about voice cloning and found these two works:

But if I understand correctly, your approach to voice conversion is a bit different. I'll look into it more!
It would be awesome if you could suggest other approaches too!
Thanks a lot!

bshall · 2020-07-29T15:54:57Z

No problem!

Well, there are two options:

Voice cloning (as you mentioned) - where you synthesize speech in a specific voice from text.
Voice conversion - where you take audio from one speaker and convert it directly to a target speaker.

I think Real-Time-Voice-Cloning is the best available open-source project for voice cloning. For voice conversion, there are, for example, https://github.com/liusongxiang/StarGAN-Voice-Conversion and https://github.com/auspicious3000/autovc.

Hope that helps!

shoegazerstella · 2020-07-29T16:08:56Z

So yes, there are indeed two approaches.
For the TTS part I was using an implementation of FastSpeech2, and to be honest I didn't want to change that because it's super fast on CPU.
So I might try both approaches and decide based on both the quality of the results and speed.
Again thanks a lot! :)

shoegazerstella closed this as completed Jul 29, 2020
