I assume you are referring to some of the preprocessing done in WIS around the TTS functionality? That logic is fairly rudimentary. It covers basic number cases (a temperature, for instance, or reading out the value of a numeric entity) but wasn't really fleshed out for more complex ones. The TTS itself isn't very sophisticated (we are currently using SpeechT5), and we are aware it has trouble with numbers like the ones in your examples, among other things. I've tried other TTS engines and noticed that they all have their quirks. For instance, I tried your cases above on Coqui: it handled 30-Minutes just fine, but it pronounced GPT4 as "upt" and didn't pronounce the number at all.
In my own case I have been a bit more explicit about how I generate the text fed into TTS (for instance, I have time converters that turn the text into more natural speech, such as changing 02:05PM into "two oh five in the afternoon"). I believe we may look into using some NLU/preprocessing in the future to help reduce these issues. :)
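For anyone wanting to do something similar, here is a minimal sketch of that kind of time converter in Python. It is not the converter used in WIS or by the author. The names `times_to_speech` and `minutes_to_words`, the regex, and the 5 PM cutoff between "afternoon" and "evening" are all illustrative assumptions.

```python
import re

# Words for 0-19 plus the tens needed for clock minutes (up to 59).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

# Matches 12-hour times such as "02:05PM" or "7:30 am" (hypothetical pattern).
TIME_RE = re.compile(r"\b(\d{1,2}):([0-5]\d)\s*([AaPp][Mm])\b")


def minutes_to_words(m: int) -> str:
    """Spell out minutes 1-59 the way they are usually spoken ("oh five")."""
    if m < 10:
        return f"oh {ONES[m]}"
    if m < 20:
        return ONES[m]
    tens, ones = divmod(m, 10)
    # Spaces instead of hyphens, since dashes are exactly what trips up the TTS.
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def _speak_time(match: re.Match) -> str:
    hour = int(match.group(1))
    minute = int(match.group(2))
    suffix = match.group(3).upper()
    hour24 = hour % 12 + (12 if suffix == "PM" else 0)
    # Assumed cutoffs: AM is "morning", PM before 17:00 is "afternoon", later is "evening".
    if suffix == "AM":
        period = "in the morning"
    elif hour24 < 17:
        period = "in the afternoon"
    else:
        period = "in the evening"
    hour_words = ONES[hour % 12 or 12]
    minute_words = "o'clock" if minute == 0 else minutes_to_words(minute)
    return f"{hour_words} {minute_words} {period}"


def times_to_speech(text: str) -> str:
    """Replace every 12-hour time in the text with a spoken form."""
    return TIME_RE.sub(_speak_time, text)


print(times_to_speech("02:05PM"))  # two oh five in the afternoon
```

Doing this conversion before the text reaches the TTS means the model only ever sees plain words, so it behaves the same regardless of which TTS engine is used.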
As I can see in the WIS log, there is a number conversion step, but it only works when the number stands alone, i.e. when it is neither attached to other text nor joined to it by a dash.
Examples that do not work: 30-minutes, GPT4.
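One possible way to handle these two cases is to split digits away from adjacent letters and dashes before converting numbers to words. The Python sketch below assumes that is acceptable for the input text. `normalize_numbers` and `int_to_words` are hypothetical helpers, not part of WIS, and reading numbers above 999 digit by digit is a deliberate simplification.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Zero-width match at a letter/digit boundary: "GPT4" -> "GPT 4".
LETTER_DIGIT = re.compile(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])")
# A dash between a number and a word: "30-minutes" -> "30 minutes".
DIGIT_DASH = re.compile(r"(?<=\d)-(?=[A-Za-z])|(?<=[A-Za-z])-(?=\d)")
# Plain runs of digits (no decimals or signs in this sketch).
NUMBER = re.compile(r"\d+")


def int_to_words(n: int) -> str:
    """Spell out 0-999; larger numbers are read digit by digit."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = f"{ONES[hundreds]} hundred"
        return head if rest == 0 else f"{head} {int_to_words(rest)}"
    return " ".join(ONES[int(d)] for d in str(n))


def normalize_numbers(text: str) -> str:
    """Separate numbers from surrounding text, then spell them out."""
    text = LETTER_DIGIT.sub(" ", text)
    text = DIGIT_DASH.sub(" ", text)
    return NUMBER.sub(lambda m: int_to_words(int(m.group())), text)


print(normalize_numbers("Set a timer for 30-minutes"))  # Set a timer for thirty minutes
print(normalize_numbers("GPT4"))  # GPT four
```

This only fixes the number part. How an acronym like "GPT" is read still depends on the TTS engine, and decimals, negative numbers and ordinals would need their own rules.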