Error with special tokens tokenization #838
@abetlen I second the above. I'm not sure where to share this given that it impacts a few different things. In particular, it affects the proper handling of special characters, especially special tokens such as `</s>`.
I think porting the change to llama-cpp-python will solve a few other problems that have been reported elsewhere, e.g. chat formats don't seem to work (#711). Note: it might already be on your radar! Sorry if this is stating the obvious.
Ok, after checking out the latest version of the repo, we do have the right bindings (per this commit) but we don't have the correct default behaviour. As a result, llama-cpp-python behaves differently (and doesn't treat the special characters as it should). I've opened PR #850 to propose a fix.
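For anyone hitting this, here is a minimal sketch of the difference the default makes, assuming a llama-cpp-python build where `Llama.tokenize` exposes the `special` flag from the underlying llama.cpp binding (the model path is just the one from the report below):

```python
from llama_cpp import Llama

# vocab_only loads just the vocabulary, which is enough for tokenization tests.
llm = Llama(
    model_path="./models/vicuna-13b-v1.5.Q8_0.gguf",
    vocab_only=True,
)

text = b"My name is AI-asisstant</s>"

# special=False (the old default): "</s>" is treated as plain text and
# split into several ordinary tokens.
print(llm.tokenize(text, add_bos=False, special=False))

# special=True: "</s>" should be parsed as the EOS special token
# (id 2 for LLaMA-family models).
print(llm.tokenize(text, add_bos=False, special=True))
```

If the second call does not end with a single EOS id for the trailing `</s>`, you are on a version that still tokenizes special tokens as plain text.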
I'm trying to use the OpenAI-like API with a Vicuna model:

```shell
python3 -m llama_cpp.server --n_gpu_layers 43 --model ./models/vicuna-13b-v1.5.Q8_0.gguf --port 8010 --host 0.0.0.0 --chat_format vicuna
```
Then I send a request to the `/v1/chat/completions` endpoint:
```json
{
  "max_tokens": 1024,
  "temperature": 0.1,
  "messages": [
    {
      "content": "Hello, what is your name?",
      "role": "user"
    },
    {
      "content": "My name is AI-asisstant",
      "role": "assistant"
    },
    {
      "content": "Can you repeat your name please?",
      "role": "user"
    }
  ]
}
```
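For reference, a minimal way to send that same request from Python (the host and port match the server command above; `requests` is assumed to be installed):

```python
import requests

# Same payload as above, sent to the locally running llama_cpp.server.
payload = {
    "max_tokens": 1024,
    "temperature": 0.1,
    "messages": [
        {"role": "user", "content": "Hello, what is your name?"},
        {"role": "assistant", "content": "My name is AI-asisstant"},
        {"role": "user", "content": "Can you repeat your name please?"},
    ],
}

resp = requests.post("http://localhost:8010/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```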
Then I check the final prompt and the final tokens, and I see this:
```
PROMPT: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hello, what is your name? ASSISTANT: My name is AI-asisstant</s>
USER: Can you repeat your name please? ASSISTANT:

PROMPT TOKENS: [1, 319, 13563, 1546, 263, 12758, 1404, 322, 385, 23116, 21082, 20255, 29889, 450, 20255, 4076, 8444, 29892, 13173, 29892, 322, 1248, 568, 6089, 304, 278, 1404, 29915, 29879, 5155, 29889, 3148, 1001, 29901, 15043, 29892, 825, 338, 596, 1024, 29973, 319, 1799, 9047, 13566, 29901, 1619, 1024, 338, 319, 29902, 29899, 25101, 303, 424, 829, 29879, 29958, 11889, 29901, 1815, 366, 12312, 596, 1024, 3113, 29973, 319, 1799, 9047, 13566, 29901]
```
I see that the special token `</s>` was not correctly converted to its token id (it should be token id 2) but was instead tokenized as [29879, 29958]. Is this a bug? I saw a similar issue discussed in the llama.cpp GitHub repo: ggml-org/llama.cpp#1812
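A quick way to check what those ids actually carry, and what the EOS id should be, is sketched below; it only loads the vocabulary, and the model path is the one from the server command above:

```python
from llama_cpp import Llama

# vocab_only avoids loading the full weights just to inspect tokens.
llm = Llama(model_path="./models/vicuna-13b-v1.5.Q8_0.gguf", vocab_only=True)

# The EOS id the model expects for </s>; 2 for LLaMA-family models.
print("eos id:", llm.token_eos())

# Decode the ids flagged above. If they come back as plain text pieces
# such as b"s" and b">", the string "</s>" was tokenized as ordinary
# text rather than mapped to the single EOS special token.
for tok in [29879, 29958]:
    print(tok, llm.detokenize([tok]))
```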