🚀 Feature
When we use the get_speech_timestamps function, we can assign parameters like min_speech_duration_ms. As I see it, VADIterator and get_speech_timestamps are actually doing the same thing. Does that mean VADIterator could, in theory, work just the same as get_speech_timestamps?
Motivation
When I run streaming detection with VADIterator (using code like the example provided by silero-vad), I often get segments where end - start <= 0.1 s. These are mostly just environmental noise, and I think it would be better to filter them out inside VADIterator.
Pitch
When I create a VADIterator instance, I would like to be able to pass the same kind of parameters that get_speech_timestamps accepts, so that I can keep the results under control.
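Until something like this exists inside VADIterator, the short segments can be dropped on the caller's side. The following is only a rough sketch, not part of silero-vad: `filter_short_segments` and its `min_speech_duration_s` argument are hypothetical names, and the events are hand-written stand-ins for the per-chunk output of VADIterator with return_seconds=True (None, {'start': t} or {'end': t}).

```python
from typing import Iterable, Iterator, Optional, Tuple


def filter_short_segments(
    events: Iterable[Optional[dict]],
    min_speech_duration_s: float = 0.25,
) -> Iterator[Tuple[float, float]]:
    """Pair 'start'/'end' events and yield only segments that are long enough.

    Hypothetical helper: `events` is assumed to look like the per-chunk
    output of VADIterator, i.e. None, {'start': t} or {'end': t}.
    """
    start = None
    for event in events:
        if not event:
            continue  # no speech boundary in this chunk
        if "start" in event:
            start = event["start"]
        elif "end" in event and start is not None:
            end = event["end"]
            if end - start >= min_speech_duration_s:
                yield (start, end)
            start = None  # wait for the next 'start'


# Hand-written events standing in for real VADIterator output.
events = [None, {"start": 1.0}, {"end": 1.1}, None, {"start": 2.0}, None, {"end": 3.5}]
segments = list(filter_short_segments(events, min_speech_duration_s=0.25))
print(segments)  # [(2.0, 3.5)]
```

The trade-off is that a segment is only reported once its 'end' has arrived, so the 'start' is no longer available in real time.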
Alternatives
Additional context
Recently I tried to solve this. Here is what I found:
You don't need to "look into the future" to limit the max speech length, but you have to accept the consequence of more fragmented speech segments. For example, when you limit the max speech length to 5 seconds, a 6-second speech will most likely be cut into a 5 s segment and a 1 s segment.
Limiting the min speech length is kind of meaningless. Whether or not we can "look into the future", we just can't truly enforce this limit. Instead, we can make the min speech length more controllable through the min_silence_duration_ms parameter.
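The 6-second example above can be sketched as follows. This is a naive hard cut written purely for illustration, not the code mentioned in this comment and not how get_speech_timestamps is implemented; `split_by_max_length` is a hypothetical helper name.

```python
from typing import List, Tuple


def split_by_max_length(
    start: float, end: float, max_speech_duration_s: float
) -> List[Tuple[float, float]]:
    """Cut one (start, end) segment into pieces no longer than the limit."""
    if max_speech_duration_s <= 0:
        raise ValueError("max_speech_duration_s must be positive")
    pieces = []
    cursor = start
    # Emit full-length pieces while more than one limit's worth remains.
    while end - cursor > max_speech_duration_s:
        pieces.append((cursor, cursor + max_speech_duration_s))
        cursor += max_speech_duration_s
    pieces.append((cursor, end))  # the remainder, possibly very short
    return pieces


pieces = split_by_max_length(0.0, 6.0, max_speech_duration_s=5.0)
print(pieces)  # [(0.0, 5.0), (5.0, 6.0)]
```

The short 1 s remainder is exactly the fragmentation described above: the cut only needs the current position in the stream, so no lookahead is required.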
By the way, I think get_speech_timestamps may fail to trigger the final 'end' if the last few chunks of audio are all speech. Since in that case we can "look into the future", we can force an 'end' to be triggered.
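For a stream that stops while speech is still active, forcing the final 'end' could look roughly like this. It is a sketch under assumptions: `close_open_segment` and `stream_end_s` are hypothetical names, and the events are assumed to use the {'start': t} / {'end': t} dict shape.

```python
from typing import List, Optional


def close_open_segment(
    events: List[Optional[dict]], stream_end_s: float
) -> List[Optional[dict]]:
    """Append a synthetic 'end' if the last boundary event was a 'start'."""
    # Find the most recent non-empty event, if any.
    last_boundary = next((e for e in reversed(events) if e), None)
    if last_boundary is not None and "start" in last_boundary:
        return events + [{"end": stream_end_s}]
    return list(events)  # already closed, or no speech at all


# The stream ends at 6.0 s while speech that started at 4.0 s is still active.
open_events = [None, {"start": 4.0}, None, None]
closed = close_open_segment(open_events, stream_end_s=6.0)
print(closed[-1])  # {'end': 6.0}
```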
I implemented a simple max speech length limit on VADIterator, inspired by get_speech_timestamps, which makes VADIterator work more like get_speech_timestamps to some degree. I will post the code once it is fully tested; maybe someone will be interested.