how can I know when the audio start #24
Hi @yijinsheng - you can do this by subclassing Add a When the algorithm determines the user started talking, set the In Make sure that in
Please let me know how it goes and if you share your demo, I can add it to the cookbook: https://freddyaboulton.github.io/gradio-webrtc/cookbook/
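For readers following along, here is a minimal sketch of the general idea described above: keep a flag that is set at the moment voice activity is first detected. It uses a simple energy threshold rather than the library's own pause-detection algorithm, and every name in it (`SpeechStartDetector`, `threshold`, `min_voiced_frames`, `started_talking`) is hypothetical, not part of gradio-webrtc.

```python
import math
from typing import Sequence


class SpeechStartDetector:
    """Hypothetical helper: reports the frame where sustained speech begins."""

    def __init__(self, threshold: float = 0.05, min_voiced_frames: int = 3):
        # threshold: RMS level above which a frame counts as voiced
        # (assumes samples are floats normalized to [-1.0, 1.0]).
        self.threshold = threshold
        # min_voiced_frames: consecutive voiced frames required before
        # declaring that the user started talking (filters short clicks).
        self.min_voiced_frames = min_voiced_frames
        self.voiced_run = 0
        self.started_talking = False

    @staticmethod
    def rms(frame: Sequence[float]) -> float:
        """Root-mean-square energy of one audio frame."""
        if not frame:
            return 0.0
        return math.sqrt(sum(s * s for s in frame) / len(frame))

    def process(self, frame: Sequence[float]) -> bool:
        """Return True only on the frame where speech is first detected."""
        if self.rms(frame) >= self.threshold:
            self.voiced_run += 1
        else:
            self.voiced_run = 0
        if self.voiced_run >= self.min_voiced_frames and not self.started_talking:
            self.started_talking = True
            return True
        return False

    def reset(self) -> None:
        """Call after the turn has been handled to re-arm the detector."""
        self.voiced_run = 0
        self.started_talking = False
```

A handler could call `process()` on each incoming frame and trigger the interruption logic when it returns `True`, then call `reset()` once the reply has been produced. A real application would likely swap the energy check for a proper voice activity detection model.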
Were you able to figure this out, @yijinsheng?
I wrote the following class based on your inputs, and it seems to handle interruption detection and replying to the interruption on voice activity well. (Adapted from the ReplyOnPause function before the async support changes.)
The only issue is that, although the LLM function (ReplyFnGenerator) is called again instantly on interruption, the audio stream from TTS takes 3-4 seconds to stop. It seems to keep emitting the audio chunks that ReplyFnGenerator yielded and queued before the interruption, which delays stopping the audio stream. Ideally, the audio stream should stop the moment the user starts talking. Any advice or input on how to deal with this? My intuition is that the stream handler functionality needs to be modified for this, but I'm not too sure. Please let me know. Thanks!
@freddyaboulton, please let me know. Any high-level advice would work too.
Hi @duhtapioca! Thanks for your patience, I was on holiday break. Yes I think you would need to clear the actual output audio queue when the No way to do that now. I think one thing we can do is store a reference to the output queue in Can you see if that fixes the issue? Happy to merge a PR in if so.
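As an illustration of what clearing the output queue could look like, here is a minimal sketch. It assumes the pending audio chunks sit in a standard `asyncio.Queue`, which this thread does not confirm for gradio-webrtc; `clear_queue` is a hypothetical helper, not a library function.

```python
import asyncio


def clear_queue(queue: asyncio.Queue) -> int:
    """Drop every item currently waiting in the queue.

    Returns the number of discarded items. Assumption: the output audio
    chunks are buffered in an asyncio.Queue that the handler can reach.
    """
    dropped = 0
    while True:
        try:
            # get_nowait() raises QueueEmpty instead of blocking.
            queue.get_nowait()
        except asyncio.QueueEmpty:
            break
        dropped += 1
    return dropped
```

Calling something like this at the moment the interruption is detected would discard chunks that were yielded before the user started talking. Audio already handed to the transport or buffered by the client would not be affected.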
Exactly. Following @freddyaboulton's latest response, you can easily empty the audio_callback queue. For example, I am developing a real-time OpenAI application using the WebRTC library. In my case, to stop the audio after an interruption, I simply need to:
Hope it helps!
Hi again! While the previous implementation for managing interruptions generally works, I've encountered an issue where, after some interruptions, the assistant remains silent for several seconds before responding with the new answer, despite the audio queue being cleared and new audio chunks being processed for the next response. Additionally, there are instances where the assistant does not respond at all after an interruption, failing to play any further audio. What do you think could be causing this? @freddyaboulton
@albertofh98 Are we able to fix it?
Hello!
I want to develop an LLM-based voice-to-voice app, so I need to know when the user starts to talk in order to interrupt the LLM and TTS output. However, I can only find the ReplyOnPause function. What I need is a function that tells me when the user starts to talk.