Current Behavior
The performance is good in the beginning: answers are written out fast and 4 CPU cores are fully utilized. Over time, though, the speed degrades until it slows to about one word every 30 seconds while the CPU cores sit idle.
Environment and Context
Apple M1 Mac Mini 16GB RAM. Ventura 13.3.
Python 3.8.13
GNU Make 3.81
Apple clang version 14.0.0 (clang-1400.0.29.202)
numpy 1.23.4
rotary-embedding-torch 0.2.1
sentencepiece 0.1.97
torch 2.1.0.dev20230307
torchaudio 2.0.0.dev20230307
torchvision 0.15.0.dev20230307
Steps to Reproduce
Ask questions for a while. The speed should degrade after about 10 questions that require longer answers.
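To put numbers on the slowdown instead of judging it by eye, something like the following could help. It is only a rough sketch: `timed` and `window_rates` are hypothetical helper names, not part of llama.cpp, and how the token stream is obtained (for example by reading the program's output word by word) is left open as an assumption.

```python
import time
from typing import Callable, Iterable, List


def timed(tokens: Iterable[str],
          clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Consume a token stream and record the arrival time of each token."""
    return [clock() for _ in tokens]


def window_rates(timestamps: List[float], window: int) -> List[float]:
    """Return tokens per second for each consecutive group of `window` tokens."""
    rates = []
    for start in range(0, len(timestamps) - window, window):
        elapsed = timestamps[start + window] - timestamps[start]
        # Guard against a zero interval from a coarse clock.
        rates.append(window / elapsed if elapsed > 0 else float("inf"))
    return rates
```

A falling sequence from `window_rates` across a session would confirm the degradation and show roughly at which point it starts.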
vashat changed the title from "[User] Insert summary of your issue or enhancement.." to "Performance degrading over time" on Apr 7, 2023
Expected Behavior
When running this command:
./main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --ignore-eos --repeat_penalty 1.2 --threads 4 --instruct -m models/ggml-vicuna-13b-4bit.bin
I expect the performance to be the same over time when the model is answering my questions.