Bug: b3028 breaks mixtral 8x22b #7969
Comments
Could you point to the exact model file you used?
I tested on 3 different files; the third was my own quants, made with the latest version on the model downloaded straight from mistralai. The b3028 tokenizer patch breaks all of them. I plan to run IQ4_XS once the bug is corrected, as it appears to work better than Q4_K_S on other models and is also 2 GB smaller for this big model.
I added b3028 revert patches to my github here: https://github.com/steampunque/llama.cpp-patches , covering the span from the original patch to the latest release as of today (b3173). I don't recommend applying the reverts unless you want to run this model, since other desirable changes in newer versions may be erased.

Without the patch, b3173:

Apply revert_b3028_from_3173.patch:
Post the …
b3181 with no revert. Use any of the model files I summarized earlier.
Aborted conversation:
b3181 with b3028 reverted:
I don't have 8x22B handy, but with 8x7B I am not able to reproduce using that command:

```
make -j && ./llama-cli -m ./models/mixtral-instruct-8x7b-fast/ggml-model-q4_k.gguf --color -n -1 --multiline-input --interactive-first -ngl 7 -c 8192 -ctk f16 -ctv f16 -b 128 -fa -n 8192 --keep 0 --temp 0.0 --dynatemp-range 0.0 --dynatemp-exp 1.0 --top-k 40 --top-p 0.95 --typical 1.0 --min-p 0.00 --repeat-last-n 64 --repeat-penalty 1.0 --presence-penalty 0.0 --frequency-penalty 0.0 --tfs 1.0 --mirostat 0 --mirostat-lr 0.1 --mirostat-ent 5.0 -p "" --in-prefix "[INST] " --in-suffix " [/INST]" --verbose-prompt
```
You can try to remove the whitespace from the instruction suffix/prefix:
I downloaded the model and indeed there is a regression for the Mixtral 8x22B models.

Before:

```
./tokenize -m Mixtral-8x22B-Instruct-v0.1.IQ4_XS-00001-of-00002.gguf -p "[INST]"
3 -> '[INST]'
```

Now:

```
1501 -> ' ['
17057 -> 'INST'
29561 -> ']'
```

cc @jaime-m-p
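The same check can be made against the HF tokenizer as a reference (a minimal sketch; the local tokenizer path is an assumption, mirroring the one used in the snippets further down this thread):

```python
from transformers import AutoTokenizer

# Illustrative local path to the model's HF tokenizer files.
tok = AutoTokenizer.from_pretrained("./models/tokenizers/mixtral8x22b")

# With special-token handling intact, "[INST]" should map to a single id (3);
# the regression instead splits it into ' [' + 'INST' + ']'.
print(tok.encode("[INST]", add_special_tokens=False))
```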
Below appears to be the list of special tokens the model wants from tokenizer.json. I think Mistral Instruct v0.3 also moved to special tokens for [INST] [/INST] (around the same time the function-calling stuff was added and the vocabulary expanded).
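For reference, that list can be pulled straight out of tokenizer.json; a sketch, assuming the tokenizer files are available locally (path is illustrative):

```python
import json

# Read the added_tokens list from the model's tokenizer.json (illustrative path)
# to see which entries are flagged "special".
with open("./models/tokenizers/mixtral8x22b/tokenizer.json") as f:
    added = json.load(f).get("added_tokens", [])

for t in added:
    print(t["id"], t["content"], "(special)" if t.get("special") else "")
```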
I'm trying to find the root of the problem. Found some differences while loading special tokens:

```python
from transformers import AutoTokenizer

dir_tokenizer = './models/tokenizers/mixtral8x22b'
tokenizer = AutoTokenizer.from_pretrained(dir_tokenizer)
tokenizer.added_tokens_encoder
# {'<unk>': 0, '<s>': 1, '</s>': 2, '[INST]': 3, '[/INST]': 4, '[TOOL_CALLS]': 5, '[AVAILABLE_TOOLS]': 6, '[/AVAILABLE_TOOLS]': 7, '[TOOL_RESULTS]': 8, '[/TOOL_RESULTS]': 9}
```

```python
import gguf

dir_tokenizer = './models/tokenizers/mixtral8x22b'
special_vocab = gguf.SpecialVocab(dir_tokenizer)
special_vocab.special_token_ids
# {'bos': 1, 'eos': 2, 'unk': 0}
```

I have to look closer, but I'm confused by the method …
Probably this: 938cb49. I think the correct option is to try to fix …
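To make the discrepancy concrete, the two listings above can be diffed directly (a sketch reusing the same illustrative path; it only reports what one side sees and the other misses):

```python
from transformers import AutoTokenizer
import gguf

dir_tokenizer = './models/tokenizers/mixtral8x22b'  # illustrative path

hf_added = AutoTokenizer.from_pretrained(dir_tokenizer).added_tokens_encoder
gguf_ids = set(gguf.SpecialVocab(dir_tokenizer).special_token_ids.values())

# Added tokens the HF tokenizer reports but SpecialVocab did not pick up,
# e.g. [INST] and [/INST] in the listings above.
for content, token_id in hf_added.items():
    if token_id not in gguf_ids:
        print(token_id, content)
```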
In my opinion, having to regenerate the GGUF should not be an issue. It has happened multiple times in the past with BPE updates and will no doubt continue into the future as models evolve with new tokenizers, new vocabularies, and other needs. It would be good to output a warning message saying that the model should be updated, if it is at all possible to detect the condition. It might also be possible to create a utility that just updates the metadata, to avoid the pain of re-converting and re-quantizing the larger models. This particular issue might wipe out a lot of models leaning on special-token adds, though.
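For the inspection half of that idea, the gguf-py package can already read the tokenizer metadata out of an existing file without touching the tensors (a sketch; the file name is illustrative, and this only lists fields rather than rewriting anything):

```python
from gguf import GGUFReader

# Open an existing quant (illustrative file name) and list the
# tokenizer metadata fields, which is where the special-token info lives.
reader = GGUFReader("Mixtral-8x22B-Instruct-v0.1.IQ4_XS-00001-of-00002.gguf")

for name in reader.fields:
    if name.startswith("tokenizer.ggml."):
        print(name)
```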
If the …
I reproduced this in …

- b3027: no problems
- b3028-b3086: strange prefix/suffix tokens
- b3087-b3140: random mentions of IELTS
It's still broken as of b3233. It is necessary to revert the b3027->b3028 update if you want to run the model.
This also broke Yi-1.5-34b.
Input: hi
b3028
In b3028, …
Update: MistralAI updated the tokenizer configs for 8x22b a couple of days ago, so I ran a new convert/quant and all looks OK now with `SPECIAL=1 tokenize " lm Hello"`. I tested with 3 versions: b3266 with the b3028 revert patch, b3266 with no revert patch, and the latest b3324 with no revert patch, and all were good on the new convert. So it looks like the Mistral guys fixed the problem on their side; however, it will require re-converting from their new update as of a couple of days ago. I will leave this issue open for now since the bug breaks other models too, but Mixtral 8x22b is now good thanks to the mistralai 8x22b update.
Will there be proper support for special tokens, such that the model would see the difference between, for example, the special token …
There is already support - see the …
Right. What I meant is support in the examples.
There will be, but I can't say when - it does not seem very high prio IMO.
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
Mixtral 8x22b model running with the server.

b3027: good

```
lm hi
Hello! How can I help you today? Is there something specific you would like to talk about or ask about? I'm here to provide information and answer questions to the best of my ability.
```

b3028: garbage

```
lm hi
👋
[INST] I'm here to help you with your questions about the [/INST] 🤓
[INST] I can provide information on a variety of topics, such as [/INST] 📚
[INST] - [/INST] 🏫
```
Name and Version
b3027 for good run
b3028 for broken run
What operating system are you seeing the problem on?
Linux
Relevant log output