Model gets stuck in some words #172
Comments
I believe this is a known limitation of the model; see this discussion for more info. There are various strategies that can be added to reduce the occurrence of this behaviour (e.g. beam search decoding, temperature fallbacks, VAD, etc.). Some of these are already available in the original implementation from OpenAI, so you can try running it and see if this resolves your issue.
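For anyone wondering what a temperature fallback looks like in practice, here is a minimal sketch, not the code of either implementation. It assumes a hypothetical `decode(temperature)` callable that returns the text for one audio window. The repetition check is the gzip-style compression ratio of the decoded text, which is the signal the original OpenAI implementation uses (its default threshold is 2.4, and it also checks average log-probability, which is left out here).

```python
import zlib
from typing import Callable, Sequence


def compression_ratio(text: str) -> float:
    """Size of the raw text divided by its zlib-compressed size.

    Highly repetitive text compresses very well, so a looping
    transcript produces a large ratio.
    """
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], str],  # hypothetical: decodes one window at a temperature
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    max_ratio: float = 2.4,
) -> str:
    """Retry decoding at increasing temperatures while the output looks repetitive."""
    text = ""
    for temperature in temperatures:
        text = decode(temperature)
        if compression_ratio(text) <= max_ratio:
            break  # output no longer looks like a repetition loop
    # If every temperature failed the check, the last attempt is returned.
    return text
```

The idea is that greedy decoding (temperature 0) is deterministic, so once it falls into a loop it stays there; sampling at a higher temperature gives the decoder a chance to escape.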
I've run into this issue as well, but I see a difference between the output of Whisper (Python) and whisper.cpp. While there are some repeated words in the Python version of Whisper, with whisper.cpp there are pretty long sections where a phrase is repeated (up to 8 minutes or so). I wonder if there is anything that can be done to improve the behavior. Do you think this difference is due to beam search decoding or something similar in the original implementation? If so, I wonder how difficult it would be to implement that in C++? I've attached the output from both versions of Whisper for comparison. I ran it on this podcast episode with the tiny model used for both runs.
@szeidner |
I also keep having this problem, which is why I keep having to discard tasks, unfortunately. A workaround would be great. 👍 |
@ggerganov Thanks for looking into this! I do seem to run into this issue on most podcasts I've tried, so an implementation of |
I'm def having this issue as well. I'm having it with Having SAID that, the output of cpp is so much faster than whisper, it's worth it to try it on a show to see if it works and if it doesn't, restart or run in whisper - cos where it DOES work, it is so much faster on my M1 MBP 13" that it's worth the time. Thanks for the work, @ggerganov! I'll keep following (and updating my repo) to see if things get better. If you need a sample, please let me know.
You can't say stuff like this and just expect someone is not gonna give the obvious reply - which as I am a programmer myself - I absolutely WILL NOT say... 😂 I respect all the work you do too much to do that! |
I'm seeing the same issue. For instance, I send 10 seconds of audio that has simply the number "six" repeated six times, and Whisper gets to work on it and takes a half minute to come back with 100 sixes. During the time it's cranking on it, the CPU is really loaded, which is not good. Issue #29 talks about silence gaps causing this behaviour, but saying "six" six times in 10 seconds is not a whole lot of silence. Maybe after the 3rd "six" it's the devil's number and this is hanging it up :) Also, the NULL pointer problem in issue #344 occurs often when it gets stuck in this loop.
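Until the decoder handles this itself, one possible workaround is to flag suspicious output after the fact and re-run only those files. The sketch below is an assumption-laden heuristic, not part of whisper.cpp: the function name `longest_repeat_run` and the `max_phrase_len` parameter are made up for illustration. It finds the back-to-back repeated phrase that covers the most words in a transcript.

```python
from typing import List, Tuple


def longest_repeat_run(words: List[str], max_phrase_len: int = 8) -> Tuple[int, int, int]:
    """Find the back-to-back repeated phrase covering the most words.

    Returns (start_index, phrase_length, repeat_count). If nothing
    repeats, returns (0, 0, 1).
    """
    best_start, best_size, best_repeats = 0, 0, 1
    n = len(words)
    for size in range(1, max_phrase_len + 1):
        run_len = 0  # consecutive positions i with words[i] == words[i + size]
        # The final iteration (i == n - size) is a sentinel that closes the last run.
        for i in range(max(n - size, 0) + 1):
            if i < n - size and words[i] == words[i + size]:
                run_len += 1
                continue
            # run_len matches means run_len + size words repeat with period `size`.
            repeats = run_len // size + 1
            if repeats > 1 and repeats * size > best_repeats * best_size:
                best_start, best_size, best_repeats = i - run_len, size, repeats
            run_len = 0
    return best_start, best_size, best_repeats


# Example: a transcript fragment with a looped word.
words = "we talked about six six six six six six six six and then".split()
start, size, repeats = longest_repeat_run(words)
print(words[start:start + size], "repeated", repeats, "times")
```

What counts as "too many" repeats is a judgment call: genuinely repeated speech (like saying "six" six times) is legitimate, so any threshold should be tuned against real transcripts rather than used to delete text automatically.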
Latest whisper.cpp version on a Mac M1
Model: ggml-medium.en.bin
Additional parameters: -t 8 -ml 1
Mono audio file
It seems that the model gets stuck in some words and misses the actual conversation.