Improve the responsiveness of voices by trimming the leading silence #17614
Labels: component/speech, p4, performance, triaged
Is your feature request related to a problem? Please describe.
This is related to #13284.
#17592 closed that issue by making SAPI5 voices output audio via WASAPI. That did improve responsiveness, but we can improve it even further by trimming the leading silence.
Take Microsoft Zira Desktop (SAPI5) as an example. When speaking at 1X speed, the leading silence is about 100 ms long; at the maximum rate (3X speed), it shrinks to about 30 ms. If we can remove the leading silence, the voice will respond even faster. Other voices, such as OneCore voices, also have a few milliseconds of leading silence.
Describe the solution you'd like
We can detect and remove the silent part of the audio in `WavePlayer`, either on the Python side or on the C++ side. As eSpeak, OneCore and SAPI5 (plus MSSP) voices all use `WavePlayer` now, they can all benefit from this. The synthesizer may need to tell `WavePlayer` when the audio will start or end, so that `WavePlayer` can locate the "leading silence" part more easily. A rough sketch of the detection follows.
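As an illustration only, here is a minimal Python sketch that strips leading near-silent frames from a buffer of 16-bit signed PCM audio. The function name and the threshold value are hypothetical, not part of any existing NVDA API; a real implementation inside `WavePlayer` would also need to handle chunked input, other sample formats, and probably keep a few milliseconds of silence so the speech onset doesn't sound clipped.

```python
import struct

# Hypothetical amplitude threshold below which a sample counts as silence.
# 16-bit signed PCM is assumed.
SILENCE_THRESHOLD = 100

def trimLeadingSilence(data: bytes, channels: int = 1) -> bytes:
    """Return data with leading near-silent 16-bit PCM frames removed."""
    frameSize = 2 * channels  # 2 bytes per 16-bit sample, one sample per channel
    for offset in range(0, len(data) - frameSize + 1, frameSize):
        frame = struct.unpack(f"<{channels}h", data[offset:offset + frameSize])
        # Keep everything from the first frame that is audibly non-silent.
        if any(abs(sample) > SILENCE_THRESHOLD for sample in frame):
            return data[offset:]
    return b""  # the whole buffer was silence
```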
Describe alternatives you've considered
Create a stand-alone module for detecting and removing the silent part of the audio, either in Python or in C++. The synthesizers should pass the audio data through this module before feeding it to `WavePlayer`.
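Under this alternative, the wiring in a synth driver might look like the sketch below. `WavePlayer.feed` is the existing method for queuing audio; the callback shape and `trimLeadingSilence` are assumptions carried over from the sketch above.

```python
# Hypothetical wiring inside a synth driver's audio callback.
# `player` is an nvwave.WavePlayer; only the first chunk of each
# utterance needs trimming, since silence after speech onset is real.
def onAudioChunk(player, chunk: bytes, isFirstChunk: bool) -> None:
    if isFirstChunk:
        chunk = trimLeadingSilence(chunk)
    if chunk:
        player.feed(chunk)
```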
Additional context
I'm not sure which of these approaches is the best way to implement this.