llava : fix tokenization to not add bos after system prompt #3645
Conversation
Genius discovery!! This should explain why it complained about not seeing any image.
Yes, I expect this patch to fix this, but haven't tested yet.
Tested with the llama.cpp logo and it's much better:
"The image features a white background with a large orange and black logo in the center. The logo is the name "LlamaC++" written in bold letters, likely representing a company or product related to programming or software development. The contrast between the white background and the vibrant colors of the logo makes it stand out prominently in the scene."
Previously, it mentioned people gathering for an activity.
Thanks for the fix. Now we can expect better performance with 7B as well, I guess 🚀
I am trying to replicate the previous results by manually specifying the old …
@Green-Sky Try with the last commit.
@monatis thanks, will also have to patch master though.
I thought it was necessary every time after the system prompt was evaluated.
Did some testing, but the original "I don't see an image" issue was already fixed, likely by the system prompt updates @monatis did.
The responses do look better now.
Yea, maybe, but before this PR it was clearly wrong, since it inserted a BOS token between the image embeddings and the user prompt.
@FSSRepo 13B in f16 (temperature = 0.2): "In the image, I see a hairless mole with long nails."
I'm using the …
This fix is wonderful. Now it works fine even with vicuna-13b-v1.5-Q8_0.gguf instead of the finetuned LLaVA model. With the naked mole rat image above, it gives,
And with the DL image and the JSON prompt,
@haotian-liu how much do the projection matrix and the finetuned CLIP ViT contribute in comparison to the LLM?
CLIP itself is not finetuned in LLaVA. I think the model @jxy is referring to is this model, which undergoes further finetuning for instruction following to become LLaVA v1.5, to my knowledge. Only the LLM part is finetuned, and it is bridged to the frozen CLIP layers via a two-layer MLP (the multimodal projector) during pretraining.
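For intuition about how small that bridge is: the projector is just a two-layer MLP applied to each CLIP patch embedding. Below is a minimal, self-contained C++ sketch; the dimensions (e.g. 1024-dim CLIP ViT-L/14 features, 5120-dim embeddings for a 13B LLM) and the GELU activation are assumptions based on the LLaVA-1.5 design, not code from this repo.

```cpp
#include <cmath>
#include <vector>

// Minimal sketch of a two-layer MLP projector: maps one CLIP patch embedding
// into the LLM embedding space. All sizes are taken from the input vectors,
// so the concrete dimensions are purely illustrative.
static std::vector<float> project_patch(const std::vector<float> & clip_embd, // [n_clip]
                                        const std::vector<float> & w1,        // [n_hidden * n_clip]
                                        const std::vector<float> & b1,        // [n_hidden]
                                        const std::vector<float> & w2,        // [n_llm * n_hidden]
                                        const std::vector<float> & b2) {      // [n_llm]
    const size_t n_clip   = clip_embd.size();
    const size_t n_hidden = b1.size();
    const size_t n_llm    = b2.size();

    // first linear layer + GELU (tanh approximation)
    std::vector<float> h(n_hidden);
    for (size_t i = 0; i < n_hidden; ++i) {
        float acc = b1[i];
        for (size_t j = 0; j < n_clip; ++j) {
            acc += w1[i*n_clip + j] * clip_embd[j];
        }
        h[i] = 0.5f*acc*(1.0f + std::tanh(0.7978845608f*(acc + 0.044715f*acc*acc*acc)));
    }

    // second linear layer
    std::vector<float> out(n_llm);
    for (size_t i = 0; i < n_llm; ++i) {
        float acc = b2[i];
        for (size_t j = 0; j < n_hidden; ++j) {
            acc += w2[i*n_hidden + j] * h[j];
        }
        out[i] = acc;
    }
    return out;
}
```

Each image contributes a grid of such patch embeddings (roughly 576 for a 336px ViT-L/14 at patch size 14), and each one is projected independently and fed into the LLM's context like an ordinary token embedding.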
I used the actual Vicuna v1.5, from https://huggingface.co/TheBloke/vicuna-13B-v1.5-GGUF/resolve/main/vicuna-13b-v1.5.Q8_0.gguf. You may try it yourself. I also tried other 13B models, and they all kind of work.
Fascinating that the embedding projection matrix does so much by itself.
@ggerganov Cool! Thanks for identifying and fixing this. The model now works much better. After the first-stage pretraining, LLaVA can actually answer some basic questions quite well, even following the format instructions (Table 6 in the LLaVA-1.5 paper). But there are some tasks that it is not yet good at, like reasoning about the relationships between the objects in the scene. For these tasks, it would probably require the finetuned LLM as well.
* 'master' of github.com:ggerganov/llama.cpp:
  fix embeddings when using CUDA (ggml-org#3657)
  llama : avoid fprintf in favor of LLAMA_LOG (ggml-org#3538)
  readme : update hot-topics & models, detail windows release in usage (ggml-org#3615)
  CLBlast: Fix temporary buffer size for f16 conversion (wsize)
  train-text-from-scratch : fix assert failure in ggml-alloc (ggml-org#3618)
  editorconfig : remove trailing spaces
  server : documentation of JSON return value of /completion endpoint (ggml-org#3632)
  save-load-state : fix example + add ci test (ggml-org#3655)
  readme : add Aquila2 links (ggml-org#3610)
  tokenizer : special token handling (ggml-org#3538)
  k-quants : fix quantization ranges (ggml-org#3646)
  llava : fix tokenization to not add bos between image embeddings and user prompt (ggml-org#3645)
  MPT : support GQA for replit-code-v1.5 (ggml-org#3627)
  Honor -ngl option for Cuda offloading in llava (ggml-org#3621)
We had a bug in adding a BOS token unconditionally on every `eval_string` call:

https://github.com/ggerganov/llama.cpp/blob/11bff290458f12f020b588792707f76ec658a27a/examples/llava/llava-utils.h#L52-L57
https://github.com/ggerganov/llama.cpp/blob/11bff290458f12f020b588792707f76ec658a27a/examples/llava/llava.cpp#L128-L132
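Roughly, the old helper tokenized every string with BOS enabled. A paraphrased sketch of the pre-fix pattern is below (the signature and helper names such as `eval_tokens` are approximations of the llava example, not verbatim from the permalinks above):

```cpp
// Paraphrased sketch of the pre-fix eval_string(): every call tokenizes with
// add_bos = true, so the user prompt that follows the image embeddings also
// gets a BOS token inserted in the middle of the context.
static bool eval_string(struct llama_context * ctx_llama, const char * str, int n_batch, int * n_past) {
    std::string str2 = str;
    std::vector<llama_token> embd_inp = ::llama_tokenize(ctx_llama, str2, true); // <-- unconditional BOS
    eval_tokens(ctx_llama, embd_inp, n_batch, n_past);
    return true;
}
```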
This PR fixes that by adding BOS only for the system prompt:
https://github.com/ggerganov/llama.cpp/blob/e0fb74c6ee343f7a4e4a1ca349004c7bb2db99f0/examples/llava/llava.cpp#L129-L131
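In rough terms, the call sites now pass an explicit `add_bos` flag so that only the system prompt gets a BOS token. A paraphrased sketch follows (function and variable names are assumptions patterned on the llava example; see the permalink above for the actual code):

```cpp
// Paraphrased sketch of the post-fix call sequence:
// BOS is added once, in front of the system prompt, and never again.
eval_string(ctx_llama, system_prompt.c_str(), params.n_batch, &n_past, /* add_bos */ true);
eval_image_embd(ctx_llama, image_embd, n_img_pos, params.n_batch, &n_past); // image embeddings, no BOS
eval_string(ctx_llama, user_prompt.c_str(), params.n_batch, &n_past, /* add_bos */ false);
```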
Some anecdotal testing shows that the performance with the 13B model is now much better: