llava : fix tokenization to not add bos after system prompt #3645
Conversation
Genius discovery!! This should explain why it complained about not seeing any image.
Yes, I expect this patch to fix this, but haven't tested yet.
Tested with the llama.cpp logo and it's much better:
"The image features a white background with a large orange and black logo in the center. The logo is the name "LlamaC++" written in bold letters, likely representing a company or product related to programming or software development. The contrast between the white background and the vibrant colors of the logo makes it stand out prominently in the scene."
Previously, it mentioned people gathering for an activity.
Thanks for the fix. Now we can expect better performance with 7B as well, I guess 🚀
I am trying to replicate the previous results by manually specifying the old …
@Green-Sky Try with the last commit.
@monatis thanks, will also have to patch master though.
I thought it was necessary every time after the system prompt was evaluated.
Did some testing, but the original "I don't see an image" issue was already fixed, likely by the system prompt updates @monatis did.
The responses do look better now.
Yea, maybe, but before this PR it was clearly wrong, since it inserted a BOS token between the image embeddings and the user prompt.
@FSSRepo 13B in f16 (temperature = 0.2): "In the image, I see a hairless mole with long nails."
I'm using the …
This fix is wonderful. Now it works fine even with vicuna-13b-v1.5-Q8_0.gguf instead of the finetuned LLaVA model. With the naked mole rat image above, it gives,
And with the DL image and the JSON prompt,
@haotian-liu how much do the projection matrix and the finetuned CLIP ViT contribute in comparison to the LLM?
CLIP itself is not finetuned in LLaVA. I think the model @jxy is referring to is this model, which undergoes further finetuning for instruction following to become LLaVA v1.5, to my knowledge. Only the LLM part is finetuned, and it is bridged to the frozen CLIP layers via a two-layer MLP (the multimodal projector) during pretraining.
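For intuition about how small that bridge is: the projector is just a two-layer MLP applied to each CLIP patch embedding. Below is a minimal, self-contained C++ sketch; the dimensions (e.g. 1024-dim CLIP ViT-L/14 features, 5120-dim embeddings for a 13B LLM) and the GELU activation are assumptions based on the LLaVA-1.5 design, not code from this repo.

```cpp
#include <cmath>
#include <vector>

// Minimal sketch of a two-layer MLP projector: maps one CLIP patch embedding
// into the LLM embedding space. All sizes are taken from the input vectors,
// so the concrete dimensions are purely illustrative.
static std::vector<float> project_patch(const std::vector<float> & clip_embd, // [n_clip]
                                        const std::vector<float> & w1,        // [n_hidden * n_clip]
                                        const std::vector<float> & b1,        // [n_hidden]
                                        const std::vector<float> & w2,        // [n_llm * n_hidden]
                                        const std::vector<float> & b2) {      // [n_llm]
    const size_t n_clip   = clip_embd.size();
    const size_t n_hidden = b1.size();
    const size_t n_llm    = b2.size();

    // first linear layer + GELU (tanh approximation)
    std::vector<float> h(n_hidden);
    for (size_t i = 0; i < n_hidden; ++i) {
        float acc = b1[i];
        for (size_t j = 0; j < n_clip; ++j) {
            acc += w1[i*n_clip + j] * clip_embd[j];
        }
        h[i] = 0.5f*acc*(1.0f + std::tanh(0.7978845608f*(acc + 0.044715f*acc*acc*acc)));
    }

    // second linear layer
    std::vector<float> out(n_llm);
    for (size_t i = 0; i < n_llm; ++i) {
        float acc = b2[i];
        for (size_t j = 0; j < n_hidden; ++j) {
            acc += w2[i*n_hidden + j] * h[j];
        }
        out[i] = acc;
    }
    return out;
}
```

Each image contributes a grid of such patch embeddings (roughly 576 for a 336px ViT-L/14 at patch size 14), and each one is projected independently and fed into the LLM's context like an ordinary token embedding.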
I used the actual Vicuna v1.5, from https://huggingface.co/TheBloke/vicuna-13B-v1.5-GGUF/resolve/main/vicuna-13b-v1.5.Q8_0.gguf. You may try it yourself. I also tried other 13B models, and they all kind of work.
Fascinating that the embedding projection matrix does so much by itself.
@ggerganov Cool! Thanks for identifying and fixing this. The model now works much better. After the first-stage pretraining, LLaVA can actually answer some basic questions quite well, even following the format instructions (Table 6 in the LLaVA-1.5 paper). But there are some tasks that it is not yet good at, like reasoning about the relationships between the objects in the scene. For these tasks, it would probably require the finetuned LLM as well.
* 'master' of github.com:ggerganov/llama.cpp:
  fix embeddings when using CUDA (ggml-org#3657)
  llama : avoid fprintf in favor of LLAMA_LOG (ggml-org#3538)
  readme : update hot-topics & models, detail windows release in usage (ggml-org#3615)
  CLBlast: Fix temporary buffer size for f16 conversion (wsize)
  train-text-from-scratch : fix assert failure in ggml-alloc (ggml-org#3618)
  editorconfig : remove trailing spaces
  server : documentation of JSON return value of /completion endpoint (ggml-org#3632)
  save-load-state : fix example + add ci test (ggml-org#3655)
  readme : add Aquila2 links (ggml-org#3610)
  tokenizer : special token handling (ggml-org#3538)
  k-quants : fix quantization ranges (ggml-org#3646)
  llava : fix tokenization to not add bos between image embeddings and user prompt (ggml-org#3645)
  MPT : support GQA for replit-code-v1.5 (ggml-org#3627)
  Honor -ngl option for Cuda offloading in llava (ggml-org#3621)
We had a bug in adding a BOS token unconditionally on every `eval_string` call:

https://github.com/ggerganov/llama.cpp/blob/11bff290458f12f020b588792707f76ec658a27a/examples/llava/llava-utils.h#L52-L57
https://github.com/ggerganov/llama.cpp/blob/11bff290458f12f020b588792707f76ec658a27a/examples/llava/llava.cpp#L128-L132
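Roughly, the old helper tokenized every string with BOS enabled. A paraphrased sketch of the pre-fix pattern is below (the signature and helper names such as `eval_tokens` are approximations of the llava example, not verbatim from the permalinks above):

```cpp
// Paraphrased sketch of the pre-fix eval_string(): every call tokenizes with
// add_bos = true, so the user prompt that follows the image embeddings also
// gets a BOS token inserted in the middle of the context.
static bool eval_string(struct llama_context * ctx_llama, const char * str, int n_batch, int * n_past) {
    std::string str2 = str;
    std::vector<llama_token> embd_inp = ::llama_tokenize(ctx_llama, str2, true); // <-- unconditional BOS
    eval_tokens(ctx_llama, embd_inp, n_batch, n_past);
    return true;
}
```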
This PR fixes that by adding BOS only for the system prompt:
https://github.com/ggerganov/llama.cpp/blob/e0fb74c6ee343f7a4e4a1ca349004c7bb2db99f0/examples/llava/llava.cpp#L129-L131
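In rough terms, the call sites now pass an explicit `add_bos` flag so that only the system prompt gets a BOS token. A paraphrased sketch follows (function and variable names are assumptions patterned on the llava example; see the permalink above for the actual code):

```cpp
// Paraphrased sketch of the post-fix call sequence:
// BOS is added once, in front of the system prompt, and never again.
eval_string(ctx_llama, system_prompt.c_str(), params.n_batch, &n_past, /* add_bos */ true);
eval_image_embd(ctx_llama, image_embd, n_img_pos, params.n_batch, &n_past); // image embeddings, no BOS
eval_string(ctx_llama, user_prompt.c_str(), params.n_batch, &n_past, /* add_bos */ false);
```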
Some anecdotal testing shows that the performance with the 13B model is now much better: