
Strange behavior of "stream" example (Linux, amd64) #354

Open
kha84 opened this issue Jan 1, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@kha84 commented Jan 1, 2023

Hello there,

After doing some smoke tests of whisper.cpp using ./main (all of which worked perfectly with different language models), I moved on to the "stream" example: https://github.com/ggerganov/whisper.cpp/tree/master/examples/stream

The thing is, no matter what parameters I use (number of threads, different models, different step sizes/lengths), I cannot get it to recognize anything at anywhere near real-time speed.

The closest I can get is to use the tiny.en model while leaving all other parameters unspecified, like this:

./stream -m ./models/ggml-tiny.en.bin

If I add any parameters to the above, or deviate from the tiny.en model, I start getting unpredictable results: garbage output containing just a single word or a few words, empty lines printed to stdout over and over, or the last displayed line repeated over and over.

One example: if I just add the -vth 0.6 parameter to the above, I start getting these lines:

whisper_full: failed to generate timestamp token - skipping one second

If I set --step 0, as in the "Sliding window mode with VAD" example, it just fails with "Floating point exception (core dumped)":

$ ./stream -m ./models/ggml-tiny.en.bin --step 0 -vth 0.6
init: found 1 capture devices:
init:    - Capture device #0: 'BT600 Mono'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init:     - sample rate:       16000
init:     - format:            33056 (required: 33056)
init:     - channels:          1 (required: 1)
init:     - samples per frame: 1024
whisper_model_load: loading model from './models/ggml-tiny.en.bin'
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: mem_required  =  390.00 MB
whisper_model_load: ggml ctx size =   73.58 MB
whisper_model_load: memory size   =   11.41 MB
whisper_model_load: model size    =   73.54 MB

main: processing 0 samples (step = 0.0 sec / len = 10.0 sec / keep = 0.0 sec), 4 threads, lang = en, task = transcribe, timestamps = 1 ...
Floating point exception (core dumped)

If I switch to any heavier model, all allocated CPU threads just max out at 100% and the printed results are mostly garbage.

Ubuntu 22.10, AMD Ryzen 5 3400G (4 cores / 8 threads)

I'd appreciate any direction for troubleshooting. I can probably profile the execution to see where most of the CPU time is spent, if that helps. I just cannot believe that my CPU cannot handle all that :)

@ggerganov (Owner)

The Floating point exception (core dumped) is strange.
Try getting the latest master, then make clean + make stream and try again.

The larger models are quite heavy for real-time processing.
You can try, for example, the base or small models, but increase the step, e.g. --step 5000, or even --step 10000 --length 20000, and see if that helps.
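For example, invocations along these lines (model paths assumed to follow the layout produced by the models/download-ggml-model.sh script):

./stream -m ./models/ggml-base.en.bin --step 5000 --length 10000
./stream -m ./models/ggml-small.en.bin --step 10000 --length 20000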

@kha84 (Author) commented Jan 5, 2023

Sure, will try that out. Thanks a lot. I already stumbled on another thread suggesting to set the step size to at least twice the encoding time that bench reports on my own hardware.
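For reference, the bench tool ships with the repo and prints the per-stage timings; an invocation along these lines (model path assumed) gives the encoding time to compare the step size against:

./bench -m ./models/ggml-tiny.en.bin -t 4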

@benaclejames

Strange, I'm getting the same very slow transcription results on Windows too. I downloaded the latest release and also tried the artifacts from the latest commit, and ran into the same slow and inaccurate transcriptions on both builds. Very weird...

ggerganov added the bug label Jan 15, 2023
@ggerganov (Owner)

There was a bug in the stream example: a6dbd91

I think this fixes both the garbage results and the floating point exception.
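For readers hitting the crash: with --step 0 the step size is zero samples, so any integer division by it raises SIGFPE, which the shell reports as "Floating point exception". A guard along these lines (a sketch based on the stream example's structure, not a quote of the commit) avoids the division:

// --step 0 selects the VAD sliding-window mode; skip the division so a
// zero step size can no longer trigger an integer divide-by-zero
const bool use_vad = n_samples_step <= 0;
const int n_new_line = !use_vad ? std::max(1, params.length_ms / params.step_ms - 1) : 1;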

@meakbiyik (Contributor)

@ggerganov there seems to have been a problem with stream for the last few weeks, since the big overhaul that added the VAD and high-pass filters. Even with them disabled I cannot find the culprit for this bug, so I have been using the version of this repo at 385236d. I just tried that fix, and sadly there is no improvement.

@ggerganov (Owner)

@meakbiyik
Thanks for reporting this.
I think I see the issue: here we incorrectly override the no_context parameter, so the --keep_context argument does nothing because of this line:

params.no_context = use_vad;
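One possible repair, sketched here with the same variable names and not necessarily matching the eventual commit, is to honor the user's choice whenever the VAD mode does not force context clearing:

// only force no_context in VAD mode; otherwise keep the value set by --keep_context
params.no_context = use_vad || params.no_context;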

Let me know the exact command / parameters that you are using.
Btw, the VAD and high-pass filter are not used when --step > 0; they are used only in the "sliding window" mode, which is enabled by setting --step to 0.

@meakbiyik (Contributor)

@ggerganov I am not sure that is the issue, since I do not actually use the "-kc" argument anyway; it hallucinates a bit too much :) But you are right, I do set the "--step" argument, so the issue is probably not the VAD/high-pass filter.

@meakbiyik (Contributor) commented Jan 16, 2023

Here's a small update on my side: apparently there was an issue in my own code that caused some absurd stutters. I resolved that, and now the master branch works perfectly, but I can still see a clear difference in performance on low-quality sound between master and 385236d. I am now guessing that some optimizations in the matrix multiplications somehow reduced the model's robustness against noise. For all other purposes, everything works well :)

@ggerganov (Owner)

This is very likely related to the new temperature fallback strategy that is enabled by default.
For real-time streaming, it is recommended to disable it like this:

// disable temperature fallback
wparams.temperature_inc = -1.0f;
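For context, a minimal sketch of where this sits when driving the C API directly; the surrounding settings are illustrative, not prescriptive:

whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

// disable temperature fallback: never retry at higher temperatures,
// trading some robustness for predictable real-time latency
wparams.temperature_inc = -1.0f;

// typical choices for streaming (illustrative)
wparams.single_segment = true;  // emit one segment per audio chunk
wparams.no_context     = true;  // do not condition on previous output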

@meakbiyik (Contributor)

I suspect that the main issue is not the temperature (since I started experiencing it pretty much immediately after the above-referenced commit). My bet would be on either the loss of precision from the 32-to-16-bit conversions, or some bug related to them, since that could directly hurt noise robustness (and possibly the overall quality of the tiny models) without causing problems for high-SNR data and bigger models.
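A toy illustration of the kind of fp16 round-off being suggested here, using ggml's conversion helpers (the input value is arbitrary):

#include <cstdio>
#include "ggml.h"

int main() {
    // round-trip a value through fp16 and print the introduced error
    const float x = 0.10001f;
    const float y = ggml_fp16_to_fp32(ggml_fp32_to_fp16(x));
    printf("%.8f -> %.8f (error %.2e)\n", x, y, x - y);
    return 0;
}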

@meakbiyik (Contributor) commented Feb 28, 2023

This is very likely related to the new temperature fallback strategy that is enabled by default. For real-time streaming, it is recommended to disable it like this:

// disable temperature fallback
wparams.temperature_inc = -1.0f;

Hey @ggerganov! Now that the temperature handling is fixed, can we re-enable the fallback in the stream example as well, to stay as close as possible to the original Whisper model? Ideally we would also update the stream parameters to align with the main example, as you described here: #256 (comment). I can create a PR if you want.

@ggerganov (Owner)

The problem with the fallback is that, when it triggers, it increases the decoding time significantly.
I think this is not desirable for real-time purposes.
