
Strange behavior of "stream" example (Linux, amd64) #354

Open
kha84 opened this issue Jan 1, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@kha84 commented Jan 1, 2023

Hello there,

After doing some smoke tests of whisper.cpp using ./main (all of which worked perfectly with different language models), I moved on to the "stream" example: https://github.com/ggerganov/whisper.cpp/tree/master/examples/stream

The thing is, no matter what parameters I use (number of threads, different models, different step sizes/lengths), I cannot get it to recognize anything at anywhere near real-time speed.

The closest I can get is to use the tiny.en model while leaving all other parameters unspecified, like this:

./stream -m ./models/ggml-tiny.en.bin

If I add any parameters to the above, or deviate from the tiny.en model, I start getting unpredictable results: garbage output containing just a single word or a few words, empty lines printed to stdout over and over, or the last displayed line repeated over and over.

One example: if I just add the -vth 0.6 parameter to the above, I start getting these lines:

whisper_full: failed to generate timestamp token - skipping one second

If I set --step 0, as in the "Sliding window mode with VAD" example, it just fails with "Floating point exception (core dumped)":

$ ./stream -m ./models/ggml-tiny.en.bin --step 0 -vth 0.6
init: found 1 capture devices:
init:    - Capture device #0: 'BT600 Mono'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init:     - sample rate:       16000
init:     - format:            33056 (required: 33056)
init:     - channels:          1 (required: 1)
init:     - samples per frame: 1024
whisper_model_load: loading model from './models/ggml-tiny.en.bin'
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  =  390.00 MB
whisper_model_load: ggml ctx size =   73.58 MB
whisper_model_load: memory size   =   11.41 MB
whisper_model_load: model size    =   73.54 MB

main: processing 0 samples (step = 0.0 sec / len = 10.0 sec / keep = 0.0 sec), 4 threads, lang = en, task = transcribe, timestamps = 1 ...
Floating point exception (core dumped)

If I switch to any heavier model, all allocated CPU threads just max out at 100% and the printed results are mostly garbage.

Ubuntu 22.10, AMD Ryzen 5 3400G (4 cores / 8 threads)

I'd appreciate any direction for troubleshooting. I can probably profile the execution to see where most of the CPU time is spent, if that helps. I just cannot believe that my CPU cannot handle all that :)

@ggerganov (Owner)

The Floating point exception (core dumped) is strange.
Try getting the latest master, then make clean + make stream and try again.

The larger models are quite heavy for real-time processing.
You can try, for example, the base or small models, but increase the step, e.g. --step 5000, or even --step 10000 --length 20000, and see if that helps.
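For example, invocations along these lines (model paths assumed to follow the layout produced by the models/download-ggml-model.sh script):

./stream -m ./models/ggml-base.en.bin --step 5000 --length 10000
./stream -m ./models/ggml-small.en.bin --step 10000 --length 20000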

@kha84 (Author) commented Jan 5, 2023

Sure, will try that out. Thanks a lot. I already stumbled on another thread suggesting to set the step size to at least twice the encoding time that bench reports on my own hardware.
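For reference, the bench tool ships with the repo and prints the per-stage timings; an invocation along these lines (model path assumed) gives the encoding time to compare the step size against:

./bench -m ./models/ggml-tiny.en.bin -t 4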

@benaclejames

Strange, I'm getting the same very slow transcription results on Windows too. I downloaded the latest release and also tried the artifacts from the latest commit, and ran into the same slow and inaccurate transcriptions on both builds. Very weird...

ggerganov added the bug label Jan 15, 2023
@ggerganov (Owner)

There was a bug in the stream example: a6dbd91

I think this fixes both the garbage results and the floating point exception.
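For readers hitting the crash: with --step 0 the step size is zero samples, so any integer division by it raises SIGFPE, which the shell reports as "Floating point exception". A guard along these lines (a sketch based on the stream example's structure, not a quote of the commit) avoids the division:

// --step 0 selects the VAD sliding-window mode; skip the division so a
// zero step size can no longer trigger an integer divide-by-zero
const bool use_vad = n_samples_step <= 0;
const int n_new_line = !use_vad ? std::max(1, params.length_ms / params.step_ms - 1) : 1;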

@meakbiyik (Contributor)

@ggerganov there seems to have been a problem with stream for the last few weeks, since the big overhaul that added the VAD and high-pass filters. Even with them disabled I cannot find the culprit for this bug, so I have been using the version of this repo at 385236d. I just tried that fix, and sadly there is no improvement.

@ggerganov (Owner)

@meakbiyik
Thanks for reporting this.
I think I see the issue: here we incorrectly override the no_context parameter, so the --keep_context argument does nothing because of this line:

params.no_context = use_vad;
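One possible repair, sketched here with the same variable names and not necessarily matching the eventual commit, is to honor the user's choice whenever the VAD mode does not force context clearing:

// only force no_context in VAD mode; otherwise keep the value set by --keep_context
params.no_context = use_vad || params.no_context;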

Let me know the exact command / parameters that you are using.
Btw, the VAD and high-pass filter are not used when --step > 0; they are used only in the "sliding window" mode, which is enabled by setting --step to 0.

@meakbiyik (Contributor)

@ggerganov I am not sure that is the issue, since I do not actually use the "-kc" argument anyway; it hallucinates a bit too much :) But you are right, I do set the "--step" argument, so the issue is probably not the VAD/high-pass filter.

@meakbiyik (Contributor) commented Jan 16, 2023

Here's a small update on my side: apparently there was an issue in my own code that caused some absurd stutters. I resolved that, and now the master branch works perfectly, but I can still see a clear difference in performance on low-quality sound between master and 385236d. I am now guessing that some optimizations in the matrix multiplications somehow reduced the model's robustness against noise. For all other purposes, everything works well :)

@ggerganov (Owner)

This is very likely related to the new temperature fallback strategy that is enabled by default.
For real-time streaming, it is recommended to disable it like this:

// disable temperature fallback
wparams.temperature_inc = -1.0f;
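For context, a minimal sketch of where this sits when driving the C API directly; the surrounding settings are illustrative, not prescriptive:

whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

// disable temperature fallback: never retry at higher temperatures,
// trading some robustness for predictable real-time latency
wparams.temperature_inc = -1.0f;

// typical choices for streaming (illustrative)
wparams.single_segment = true;  // emit one segment per audio chunk
wparams.no_context     = true;  // do not condition on previous output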

@meakbiyik (Contributor)

I suspect that the main issue is not the temperature (since I started experiencing it pretty much immediately after the above-referenced commit). My bet would be on either the loss of precision from the 32-to-16-bit conversions, or some bug related to them, since that could directly hurt noise robustness (and possibly the overall quality of the tiny models) without causing problems for high-SNR data and bigger models.
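A toy illustration of the kind of fp16 round-off being suggested here, using ggml's conversion helpers (the input value is arbitrary):

#include <cstdio>
#include "ggml.h"

int main() {
    // round-trip a value through fp16 and print the introduced error
    const float x = 0.10001f;
    const float y = ggml_fp16_to_fp32(ggml_fp32_to_fp16(x));
    printf("%.8f -> %.8f (error %.2e)\n", x, y, x - y);
    return 0;
}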

@meakbiyik (Contributor) commented Feb 28, 2023

This is very likely related to the new temperature fallback strategy that is enabled by default. For real-time streaming, it is recommended to disable it like this:

// disable temperature fallback
wparams.temperature_inc = -1.0f;

Hey @ggerganov! Now that the temperature handling is fixed, can we re-enable the fallback in the stream example as well, to stay as close as possible to the original Whisper model? Ideally we would also update the stream parameters to align with the main example, as you described here: #256 (comment). I can create a PR if you want.

@ggerganov (Owner)

The problem with the fallback is that, when it triggers, it increases the decoding time significantly.
I think this is not desirable for real-time purposes.
