Model gets stuck in some words #172

CarlitosDev · 2022-11-23T12:10:48Z

Latest whisper.cpp version on Mac M1
Model ggml-medium.en.bin
Additional parameters: -t 8 -ml 1
Mono audio file

It seems that the model gets stuck in some words and misses the actual conversation.

ggerganov · 2022-11-23T20:25:41Z

I believe this is a known limitation of the model - see this discussion for more info:

There are various strategies that can be added to reduce the occurrence of this behaviour (i.e. beam search decoding, temperature fallbacks, VAD, etc.). Some of these are already available in the original implementation from OpenAI, so you can try running it and see if this resolves your issue.

szeidner · 2022-12-09T15:08:09Z

I've run into this issue as well, but I see a difference between the output of Whisper (Python) and whisper.cpp. The Python version of Whisper repeats some words, but with whisper.cpp there are pretty long sections (up to 8 minutes or so) where a single phrase is repeated. I wonder if there is anything that can be done to improve the behavior. Do you think this difference is due to the original implementation using beam search decoding or something similar? If so, I wonder how difficult it would be to implement that in C++?

I've attached the output from both versions of whisper for comparison. I ran it on this podcast episode with the tiny model used for both runs.

whisper.python.txt
whisper.cpp.txt
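
For anyone who wants to locate the stuck sections in a transcript like the ones attached, here is a minimal Python sketch. It is not part of whisper.cpp or Whisper; the function name `find_repeated_runs` and the `min_repeats` parameter are made up for illustration, and it assumes the transcript has already been split into one text string per segment. It only flags runs of identical consecutive segments, it does not recover the missed speech.

```python
from itertools import groupby
from typing import List, Tuple


def find_repeated_runs(segments: List[str], min_repeats: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start_index, run_length, normalized_text) for each run of
    identical consecutive segments that is at least `min_repeats` long.

    Hypothetical helper: segments are compared after lowercasing and
    collapsing whitespace, so small formatting differences are ignored.
    """
    runs = []
    index = 0
    normalize = lambda s: " ".join(s.lower().split())
    for text, group in groupby(segments, key=normalize):
        length = sum(1 for _ in group)
        # Skip empty segments; report only runs that are long enough.
        if text and length >= min_repeats:
            runs.append((index, length, text))
        index += length
    return runs


example = ["Hello there.", "and so on", "And so on", "and  so on", "and so on", "Goodbye."]
print(find_repeated_runs(example))  # [(1, 4, 'and so on')]
```

The start index and run length can be mapped back to segment timestamps to see how much audio was lost to the loop, for example to decide which part to re-run with the Python version.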

ggerganov · 2022-12-16T16:38:57Z

@szeidner
Yes, it's likely due to the inferior decoding strategy in whisper.cpp.
I've made some improvements lately - you might give it another try, but probably your case is still going to fail.
I think we need the temperature feature from the OpenAI decoding method to fix this.
Implementation is not very difficult, but I keep prioritising other stuff.
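
For context, the temperature fallback idea can be sketched in a few lines of Python. This is an illustration of the general technique, not code from whisper.cpp or from OpenAI's repository. The `decode` callable is hypothetical (it stands in for whatever runs the decoder at a given temperature and returns the text plus its average log probability), and the default thresholds and temperature schedule are my understanding of the defaults in the OpenAI Python implementation, so treat them as assumptions.

```python
import zlib
from typing import Callable, Sequence, Tuple


def compression_ratio(text: str) -> float:
    """Raw size divided by zlib-compressed size. Highly repetitive text
    compresses very well, so it gets a high ratio."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], Tuple[str, float]],  # hypothetical: temperature -> (text, avg_logprob)
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # assumed schedule
    compression_ratio_threshold: float = 2.4,  # assumed default
    logprob_threshold: float = -1.0,  # assumed default
) -> Tuple[str, float, float]:
    """Retry decoding at increasing temperatures until the result looks sane.

    Returns (text, avg_logprob, temperature_used). If every temperature
    fails the checks, the last attempt is returned.
    """
    result = ("", float("-inf"), temperatures[0])
    for t in temperatures:
        text, avg_logprob = decode(t)
        result = (text, avg_logprob, t)
        too_repetitive = compression_ratio(text) > compression_ratio_threshold
        too_uncertain = avg_logprob < logprob_threshold
        if not (too_repetitive or too_uncertain):
            break  # accept the first result that passes both checks
    return result
```

The point of the design is that greedy decoding (temperature 0) is kept when it works, and sampling with more randomness is only used for the segments where the output looks like a repetition loop or the model is very unsure.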

geimist · 2022-12-16T16:44:07Z

I also keep having this problem, which is why I keep having to discard tasks, unfortunately. A workaround would be great. 👍

szeidner · 2022-12-16T18:30:45Z

@ggerganov Thanks for looking into this! I do seem to run into this issue on most podcasts I've tried, so an implementation of temperature as a potential fix would be awesome. Thank you!

janngobble · 2022-12-20T11:45:42Z

I'm def having this issue as well. I'm having it with -l it (I'm transcribing Italian, then using an external engine to translate to EN; colloquialisms are so hard to deal with in some translators, and this is a detective TV series, "Murders at Barlume"). It gets stuck for ~1-15 minutes on one random phrase. (Audio format PCM/WAV, 1 channel, 16 bits, ~1 hr 30 min long.)

Having SAID that, the output of cpp is so much faster than whisper, it's worth it to try it on a show to see if it works and if it doesn't, restart or run in whisper - cos where it DOES work, it is so much faster on my M1 MBP 13" that it's worth the time.

Thanks for the work, @ggerganov! I'll keep following (and updating my repo) to see if things get better. If you need a sample, please let me know.

janngobble · 2022-12-20T19:25:06Z

> I think we need the temperature feature from the OpenAI decoding method to fix this.
> Implementation is not very difficult, but I keep prioritising other stuff.

You can't say stuff like this and just expect someone is not gonna give the obvious reply - which as I am a programmer myself - I absolutely WILL NOT say... 😂

I respect all the work you do too much to do that!

RndyP · 2022-12-30T20:34:37Z

I'm seeing the same issue. For instance, I send 10 seconds of audio that has simply the number "six" repeated six times, and Whisper gets to work on it and takes a half minute to come back with 100 sixes. During the time it's cranking on it, the CPU is really loaded, which is not good.

Issue #29 talks about silence gaps causing this behaviour, but saying "six" six times in 10 seconds is not a whole lot of silence. Maybe after the 3rd "six" it's the devil's number and this is hanging it up :)

Also, the NULL pointer problem in issue #344 occurs often when it gets stuck in this loop.

CarlitosDev changed the title ~~Model get _stuck_ in some words~~ Model gets _stuck_ in some words Nov 23, 2022

CarlitosDev changed the title ~~Model gets _stuck_ in some words~~ Model gets stuck in some words Nov 23, 2022

ggerganov closed this as completed Dec 4, 2022

szeidner mentioned this issue Dec 9, 2022

Last segment is repeated in output after the end of input #244

Closed

ggerganov reopened this Dec 16, 2022

ggerganov added the enhancement New feature or request label Dec 16, 2022

ggerganov mentioned this issue Dec 18, 2022

Improve decoding #291

Merged

ggerganov linked a pull request Jan 8, 2023 that will close this issue

Improve decoding #291

Merged

ggerganov closed this as completed in #291 Jan 15, 2023
