Performance improvements to sapi5 speech synthesizer #17524
Comments
My friend @shenguangrong (cc) is interested in contributing this. Would love to hear community feedback.
I noticed that Usually clients should use If you launch multiple SAPI TTS client apps at the same time, you will notice that they cannot speak simultaneously. They are different processes, but when one of them is speaking, others must wait. So SAPI must have implemented some kind of cross-process synchronization, which might increase the delay.
So I think that this method of using the SAPI voices can be tried if you want to decrease latency. Pros:
Cons:
@gexgd0419 Thank you for your valuable comment:
I have some concerns about this: it may break compatibility with some TTS engines. cc @LeonarddeR
Regarding the performance improvements for the SAPI5 speech synthesizer, I've attempted a solution to directly obtain audio data:
There's I also found that writing the voice to a wave file through SpFileStream does not need to wait for other SAPI clients to complete speaking, so maybe synchronization doesn't happen when outputting to a file/memory stream.
Duplicate of #13284
Is your feature request related to a problem? Please describe.
The SAPI5 synthesizer in NVDA has noticeable latency between keypress and speech feedback, primarily due to unnecessary silence at the beginning and end of speech segments. This significantly impacts user experience, especially during typing and rapid navigation.
@gexgd0419 (cc) has previously measured the SAPI5 synthesizer latency, which could provide valuable baseline metrics for this optimization effort. Including their measurement data and methodology would help quantify the improvements.
Describe the solution you'd like
Optimize the existing SAPI5 synthesizer by implementing audio stream preprocessing within the current driver. The solution will:
Modify the speech output process to:
Add a silence detection algorithm that:
Integrate with the existing SAPI5 driver:
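
To make the silence detection idea concrete, here is a minimal sketch of trimming leading and trailing silence from a buffer of synthesized audio. It is not the proposed driver code: the function name `trim_silence`, the `threshold` default, and the assumption of 16-bit signed mono PCM in native (little-endian on Windows) byte order are all illustrative choices, and a real implementation would need to read the actual wave format reported by the voice.

```python
from array import array


def trim_silence(pcm: bytes, threshold: int = 64) -> bytes:
    """Return pcm with leading and trailing near-silent samples removed.

    Assumes 16-bit signed mono PCM in native byte order (hypothetical
    format; the real driver would have to check the voice's wave format).
    A sample counts as silent when its absolute amplitude is <= threshold.
    """
    samples = array("h")
    # Ignore a dangling odd byte, if any, so frombytes() does not raise.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])

    start, end = 0, len(samples)
    # Advance past silent samples at the beginning.
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    # Step back over silent samples at the end.
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[start:end].tobytes()
```

A fixed amplitude threshold is the simplest possible detector. Some voices may produce low-level noise rather than true digital silence, so the threshold would likely need tuning per engine, and trimming too aggressively could clip soft consonants at the start of speech.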
Describe alternatives you've considered
Existing implementation
Optimization implementation
Additional context