Feature request - [X]Make VADIterator work like get_speech_timestamps function #405

Closed
Simon-chai opened this issue Dec 13, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@Simon-chai

🚀 Feature

When we use the get_speech_timestamps function, we can assign parameters such as min_speech_duration_ms. As I see it, the two are actually doing the same thing. Does that mean VADIterator can, in theory, work just the same as the get_speech_timestamps function?

Motivation

When I run detection on an audio stream with VADIterator (using code like the example offered by silero-vad), I find that outputs with end - start <= 0.1s are mostly just environmental noise, and I think it would be better to filter them out inside VADIterator.
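Until such filtering exists inside VADIterator, one workaround is to post-filter its output. The following is a minimal sketch, not part of silero-vad: the helper name `filter_short_segments` is hypothetical, and it assumes events shaped like `{'start': t}` / `{'end': t}` with times in seconds (or `None` when a chunk has no boundary), as in the streaming example.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def filter_short_segments(
    events: Iterable[Optional[Dict[str, float]]],
    min_speech_duration_s: float = 0.1,
) -> Iterator[Tuple[float, float]]:
    """Pair 'start'/'end' events and yield only segments longer than the minimum.

    Hypothetical helper: `events` is assumed to be the per-chunk output of a
    streaming VAD, i.e. None, {'start': seconds} or {'end': seconds}.
    """
    start = None
    for event in events:
        if not event:
            continue  # no boundary detected in this chunk
        if "start" in event:
            start = event["start"]
        elif "end" in event and start is not None:
            end = event["end"]
            # Drop segments of 0.1 s or less, which were mostly noise.
            if end - start > min_speech_duration_s:
                yield (start, end)
            start = None


events = [None, {"start": 1.0}, None, {"end": 1.05}, {"start": 2.0}, None, {"end": 3.5}]
print(list(filter_short_segments(events)))
```

The trade-off is latency: a segment can only be accepted or rejected once its 'end' has arrived, so the caller no longer sees 'start' in real time.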

Pitch

When I initialize the VADIterator instance, I can assign the same parameters as the get_speech_timestamps function, so I can make sure the results are under control.

Alternatives

Additional context

@Simon-chai added the enhancement label Dec 13, 2023
@snakers4
Owner

> VADIterator can work just the same as get_speech_timestamps function in theory

This is not possible, because get_speech_timestamps "looks into the future" to improve the results.

@snakers4 closed this as not planned Mar 26, 2024
@Simon-chai
Author

Recently I tried to solve this. I found that:

  1. You don't need to "look into the future" to limit the max speech length, but you have to accept the consequence of more fragmented speech segments. For example, when you limit the max speech length to 5 seconds, a 6-second speech will most likely be cut into a 5s segment and a 1s segment.
  2. Limiting the min speech length is kind of meaningless: whether or not we can "look into the future", we just can't truly limit it. Instead, we can make the min speech length more controllable with the min_silence_duration_ms parameter.
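To illustrate the first point, here is a minimal sketch of a streaming segmenter that enforces a max speech length without any lookahead. It is not the silero-vad implementation: the class name `MaxLengthSegmenter`, the millisecond bookkeeping, and the assumption that the caller supplies one speech probability per fixed-size chunk are all hypothetical.

```python
from typing import Dict, List, Optional


class MaxLengthSegmenter:
    """Hypothetical streaming segmenter: one speech probability per chunk in,
    a list of {'start': ms} / {'end': ms} events out. Uses only past chunks."""

    def __init__(self, threshold: float = 0.5, chunk_ms: int = 32,
                 min_silence_duration_ms: int = 100,
                 max_speech_duration_ms: int = 5000) -> None:
        self.threshold = threshold
        self.chunk_ms = chunk_ms
        self.min_silence_ms = min_silence_duration_ms
        self.max_speech_ms = max_speech_duration_ms
        self.now_ms = 0                               # end of the last processed chunk
        self.start_ms: Optional[int] = None           # start of the open segment
        self.silence_start_ms: Optional[int] = None   # start of the current silence run

    def __call__(self, speech_prob: float) -> List[Dict[str, int]]:
        chunk_start = self.now_ms
        self.now_ms += self.chunk_ms
        events: List[Dict[str, int]] = []

        # Forced cut: including this chunk would exceed the max length, so the
        # open segment is closed at the chunk boundary, with no lookahead.
        if self.start_ms is not None and self.now_ms - self.start_ms > self.max_speech_ms:
            events.append({"end": chunk_start})
            self.start_ms = None
            self.silence_start_ms = None

        if speech_prob >= self.threshold:
            self.silence_start_ms = None
            if self.start_ms is None:
                self.start_ms = chunk_start
                events.append({"start": chunk_start})
        elif self.start_ms is not None:
            if self.silence_start_ms is None:
                self.silence_start_ms = chunk_start
            # Close the segment only after enough continuous silence.
            if self.now_ms - self.silence_start_ms >= self.min_silence_ms:
                events.append({"end": self.silence_start_ms})
                self.start_ms = None
                self.silence_start_ms = None
        return events


# Six seconds of speech followed by one second of silence, in 1 s chunks.
seg = MaxLengthSegmenter(chunk_ms=1000, min_silence_duration_ms=1000,
                         max_speech_duration_ms=5000)
events = []
for prob in [0.9] * 6 + [0.1]:
    events.extend(seg(prob))
print(events)
```

With a 5-second limit, the 6-second utterance comes out as a 5s segment and a 1s segment, which is the fragmentation described in point 1. A forced cut lands wherever the limit happens to fall, not at a natural pause.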

By the way, I think get_speech_timestamps may fail to trigger the final 'end' if the last few chunks of audio are all speech. Since we now "look into the future", we can force-trigger an 'end'.
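Whichever API produced the events, a forced final 'end' can be sketched as a post-processing step once the stream is known to be over. The helper name `close_open_segment` and the event shape (`{'start': t}` / `{'end': t}`, or `None` for chunks without a boundary) are assumptions for illustration, not part of silero-vad.

```python
from typing import Dict, Iterable, List, Optional


def close_open_segment(
    events: Iterable[Optional[Dict[str, float]]],
    stream_end: float,
) -> List[Dict[str, float]]:
    """Drop empty events and append an 'end' if the last boundary is a 'start'.

    Hypothetical helper: `stream_end` is the timestamp at which the audio
    stopped, in the same unit as the event values.
    """
    boundaries = [event for event in events if event]
    if boundaries and "start" in boundaries[-1]:
        # The stream stopped mid-speech, so close the segment explicitly.
        boundaries.append({"end": stream_end})
    return boundaries


events = [{"start": 1.0}, None, {"end": 2.0}, {"start": 3.0}, None]
print(close_open_segment(events, stream_end=4.5))
```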

I implemented a simple max speech limit on VADIterator, inspired by get_speech_timestamps, which makes VADIterator work more like get_speech_timestamps to some degree. I will post the code after it is fully tested; maybe someone will be interested.

@varrerohit

Did you manage to implement this code?
