I'm transcribing a relatively long video and I often get a bunch of duplicated words for the same timestamp, e.g.:
talk about the frameworks that entrepreneurs can use to think about how the ad value, and how it's the balance balance balance balance balance balance balance balance balance balance balance balance. what most entrepreneurs do wrong.
I use the distil-large-v2 model with the faster-whisper standalone executable. Here are the arguments I'm passing into faster-whisper.
I saw a relevant discussion (#716), but the fix proposed there did not resolve the issue for me.
I made sure I'm on the latest version as of today. I also tried playing around with the beam_size setting, but it had no effect other than slower transcription. I need the one_word setting; it might be causing the issue, but I haven't tested that yet (I might test it later). The video I'm testing with is this one: https://www.youtube.com/watch?v=q3xN1iYeTNI (downloaded with youtube-dl)
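To check a transcript for the kind of repeated-word runs described above, a small post-processing helper can be useful. This is only a sketch: `find_repeated_runs` and its `min_repeats` parameter are hypothetical names and are not part of faster-whisper or the standalone executable. It flags the symptom and does not fix the underlying cause.

```python
import re


def find_repeated_runs(text, min_repeats=3):
    """Return (word, count) for every run of one word repeated back to back.

    Hypothetical helper for inspecting transcript text; it only reports
    runs of at least `min_repeats` identical consecutive words.
    """
    # Lowercase and drop punctuation so "balance." matches "balance".
    words = re.findall(r"[\w']+", text.lower())
    runs = []
    i = 0
    while i < len(words):
        j = i
        # Extend j while the next word equals the word at the run start.
        while j + 1 < len(words) and words[j + 1] == words[i]:
            j += 1
        count = j - i + 1
        if count >= min_repeats:
            runs.append((words[i], count))
        i = j + 1
    return runs
```

Running it on the text of each segment shows which segments contain the repetition, which makes it easier to compare models or settings on the same audio.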
Then you are posting in the wrong repo.
Try a standard model (medium or large-v2), or --hallucination_silence_threshold 2.
IMO, the distil models are not good for long-form transcription.
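For anyone using the faster-whisper Python library rather than the standalone executable, the suggestion above might look roughly like the sketch below. It assumes the library's `hallucination_silence_threshold` argument behaves like the executable's flag of the same name, and that it only takes effect with `word_timestamps=True`; the `transcribe_long_audio` wrapper and the `TRANSCRIBE_OPTIONS` name are illustrative, not from this thread.

```python
# Options mirroring the suggestion: a standard model plus the
# hallucination threshold (in seconds).
TRANSCRIBE_OPTIONS = {
    "word_timestamps": True,  # assumed to be required for the threshold to apply
    "hallucination_silence_threshold": 2,
}


def transcribe_long_audio(audio_path, model_name="large-v2"):
    """Illustrative wrapper: transcribe with a standard (non-distil) model."""
    # Imported here so the options can be inspected without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name)
    segments, _info = model.transcribe(audio_path, **TRANSCRIBE_OPTIONS)
    # `segments` is a generator; transcription runs as it is consumed.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Swapping `model_name` between "medium" and "large-v2" is a quick way to check whether the repetition is specific to the distil model.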
> I need the one_word setting, though it might be causing the issue
It can't cause this issue: it is only an srt/vtt writing setting, and it has no effect in your example because the output there is json.