
Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types #10572

Merged Dec 1, 2024 (10 commits)

Conversation

@jukofyork (Contributor) commented Nov 29, 2024


This PR adds the mistral-v1, mistral-v2, mistral-v3 and mistral-v3-tekken chat template types based on the "Demystifying Mistral's Instruct Tokenization & Chat Templates" documentation provided by MistralAI themselves:

https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md
https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/templates.md

1. I have deliberately ignored the "prepend it to the last user message" part for the v2/v3 system prompts:

<s>[INST] user message[/INST] assistant message</s>[INST] system prompt

new user message[/INST]

as this will constantly invalidate the KV-cache, and from their own wording I'm not convinced the model has even been trained like this or expects it there:

This is how mistral_common and the templates implement system prompts, but this can easily be customized. Feel free to use system prompts in different places, such as the second from the last or simply as the first user message, as before.
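
To make the KV-cache point concrete, here is how two consecutive turns would render under the reference placement (illustrative, written with \n escapes as in the test strings further down):

Turn 1: <s>[INST] system prompt\n\nHello[/INST]
Turn 2: <s>[INST] Hello[/INST] Hi there</s>[INST] system prompt\n\nWho are you[/INST]

The two prompts diverge immediately after the first [INST], so virtually the whole KV-cache has to be recomputed on every turn; with the system prompt pinned at the start, each turn's prompt is a strict extension of the previous one and the cache can be reused.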

2. I can't really add any specific templates/expected_output tests:

I've left the old logic intact as a fall-through:

    } else if (tmpl == "llama2" || tmpl == "mistral" || tmpl_contains("[INST]")) {
        // llama2 template and its variants
        // [variant] support system message
        bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "mistral";
        // [variant] space before + after response
        bool space_around_response = tmpl_contains("' ' + eos_token");
        // [variant] add BOS inside history
        bool add_bos_inside_history = tmpl_contains("bos_token + '[INST]");
        // [variant] trim spaces from the input message
        bool strip_message = tmpl_contains("content.strip()");
        // construct the prompt
        bool is_inside_turn = true; // skip BOS at the beginning
        ss << "[INST] ";
        for (auto message : chat) {
            std::string content = strip_message ? trim(message->content) : message->content;
            std::string role(message->role);
            if (!is_inside_turn) {
                is_inside_turn = true;
                ss << (add_bos_inside_history ? "<s>[INST] " : "[INST] ");
            }
            if (role == "system") {
                if (support_system_message) {
                    ss << "<<SYS>>\n" << content << "\n<</SYS>>\n\n";
                } else {
                    // if the model does not support system message, we still include it in the first message, but without <<SYS>>
                    ss << content << "\n";
                }
            } else if (role == "user") {
                ss << content << " [/INST]";
            } else {
                ss << (space_around_response ? " " : "") << content << (space_around_response ? " " : "") << "</s>";
                is_inside_turn = false;
            }
        }
        // llama2 templates seem to not care about "add_generation_prompt"

as this looks way too brittle to even try to touch and could end up affecting lots of existing models that use slight variations of the [INST] tags.

I did see one bug in the old code though:

bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "mistral";
.
.
.
if (support_system_message) {
    ss << "<<SYS>>\n" << content << "\n<</SYS>>\n\n";
}

No mistral model has ever used <<SYS>> tags AFAIK (they are purely for llama2 and codellama only; see here).

There is also a lot of confusion, with MistralAI's own repos having had wrong templates posted at one time or another, mistral-community keeping up the old/wrong templates, and so on... So there isn't really much point in trying to do the "2. Use the following python script to generate a test conversation" part, as this PR is an attempt to finally "Demystify" the prompts from their own documentation, rather than go by whatever broken Jinja2 templates are floating around...

So instead I just added the mistral-v1, mistral-v2, mistral-v3 and mistral-v3-tekken options for the user to specify manually:

| Open Model | Tokenizer |
| --- | --- |
| Mistral 7B Instruct v0.1 | v1 |
| Mistral 7B Instruct v0.2 | v1 |
| Mistral 7B Instruct v0.3 | v3 |
| Mixtral 8x7B Instruct v0.1 | v1 |
| Mixtral 8x22B Instruct v0.1 | v3 |
| Mixtral 8x22B Instruct v0.3 | v3 |
| Codestral 22B v0.1 | v3 |
| Codestral Mamba 7B v0.1 | v3 |
| Mathstral 7B v0.1 | v3 |
| Nemo 12B 2407 | v3 - Tekken |
| Large 123B 2407 | v3 |

| Endpoint Model | Tokenizer |
| --- | --- |
| mistral-embed | v1 |
| open-mistral-7b | v3 |
| open-mixtral-8x7b | v1 |
| open-mixtral-8x22b | v3 |
| mistral-small-latest | v2 |
| mistral-large-latest | v3 |
| codestral-22b | v3 |
| open-codestral-mamba | v3 |
| open-mistral-nemo | v3 - Tekken |

but if unspecified, the old fall-through logic gets used instead.
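
(These names would presumably be selected via llama.cpp's existing `--chat-template` flag, e.g. `llama-cli -m mistral-nemo.gguf --chat-template mistral-v3-tekken`; the model path here is just a placeholder.)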

I have added to the fmt_sys and fmt_single tests at the bottom of test-chat-template.cpp, so I will check back tomorrow to see whether they passed.

3. If accepted then somebody with wiki access might want to update this page:

https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

as it doesn't seem possible to edit via a PR?

@github-actions github-actions bot added the testing Everything test related label Nov 29, 2024
@jukofyork (Contributor Author)

LOL, they've added a new (undocumented) v7 system-prompt now for the new mistralai/Mistral-Large-Instruct-2411:

Basic Instruct Template (V7)

<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]

Be careful with subtle missing or trailing white spaces!

I'll try and add this tomorrow too if I get a chance (what happened to v4, v5 and v6??? 😁).
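
For reference, a minimal sketch of what rendering this v7 format could look like (illustrative only, not the code in this PR; the leading <s> BOS is assumed to be added by the tokenizer rather than the template function):

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch of the quoted v7 format: system prompts get
// [SYSTEM_PROMPT]...[/SYSTEM_PROMPT], user turns get [INST]...[/INST],
// and assistant turns get a leading space plus a trailing </s>.
std::string format_mistral_v7(const std::vector<std::pair<std::string, std::string>> & chat) {
    std::stringstream ss;
    for (const auto & [role, content] : chat) {
        if (role == "system") {
            ss << "[SYSTEM_PROMPT] " << content << "[/SYSTEM_PROMPT]";
        } else if (role == "user") {
            ss << "[INST] " << content << "[/INST]";
        } else { // assistant
            ss << " " << content << "</s>";
        }
    }
    return ss.str();
}
```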

@jukofyork (Contributor Author) commented Nov 29, 2024

So I've now:

1. Refactored the if (tmpl == "llama2" || tmpl.find("mistral") == 0 || tmpl_contains("[INST]")) case; a sketch of the resulting auto-detection follows the list:

  • Uses tmpl_contains("[SYSTEM_PROMPT]") to auto-detect the v7 template used by the new mistralai/Mistral-Large-Instruct-2411 model.
  • Uses tmpl_contains("' [INST] ' + system_message") to auto-detect the official v1 template models.
  • Uses tmpl_contains("[AVAILABLE_TOOLS]") to auto-detect the official v3 and v3-tekken template models.
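
A minimal sketch of what this auto-detection amounts to (illustrative only; the actual code lives inside llama_chat_apply_template_internal() and is structured differently, but it checks the same needles):

```cpp
#include <string>

// Illustrative sketch of the variant detection described above.
static std::string detect_mistral_variant(const std::string & tmpl) {
    auto tmpl_contains = [&](const char * needle) {
        return tmpl.find(needle) != std::string::npos;
    };
    if (tmpl_contains("[SYSTEM_PROMPT]")) {
        return "mistral-v7";   // Mistral-Large-Instruct-2411
    }
    if (tmpl_contains("' [INST] ' + system_message")) {
        return "mistral-v1";   // official v1 templates
    }
    if (tmpl_contains("[AVAILABLE_TOOLS]")) {
        return "mistral-v3";   // v3 and v3-tekken (spacing differs)
    }
    return "";                 // unknown: fall through to the old logic
}
```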

2. Fixed a bug in the old fall-through case:

bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "mistral";

which should have been:

bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "llama2";

as before it had the exact opposite logic (i.e. llama2 models got no <<SYS>> tags and mistral models did; see here).

I've otherwise left this code untouched for fear of breaking some old "mistral-esque" models people might still use...

I have added tests at the bottom of test-chat-template.cpp for these cases now though:

assert(fmt_sys("llama2") == "[INST] <<SYS>>\nYou are a helpful assistant\n<</SYS>>\n\n");
assert(fmt_sys("mistral") == "[INST] You are a helpful assistant\n"); // for old pre-v1 templates
.
.
.
assert(fmt_single("llama2") == "[INST] How are you [/INST]");
assert(fmt_single("mistral") == "[INST] How are you [/INST]"); // for old pre-v1 templates

3. Added tests to test-chat-template.cpp:

// mistralai/Mistral-7B-Instruct-v0.2 (NOTE: Old pre-v1 without a system prompt)
"{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
.
.
.
// mistralai/Mistral-7B-Instruct-v0.2 (mistralai 'v1' template with a system prompt)
"{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content'] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n    {%- endif %}\n    {%- if message['role'] == 'user' %}\n        {%- if loop.first and system_message is defined %}\n            {{- ' [INST] ' + system_message + '\\n\\n' + message['content'] + ' [/INST]' }}\n        {%- else %}\n            {{- ' [INST] ' + message['content'] + ' [/INST]' }}\n        {%- endif %}\n    {%- elif message['role'] == 'assistant' %}\n        {{- ' ' + message['content'] + eos_token}}\n    {%- else %}\n        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n    {%- endif %}\n{%- endfor %}\n",
// Mistral-Large-Instruct-2407 (mistralai 'v2'/'v3' template)
"{%- if messages[0][\"role\"] == \"system\" %}\n    {%- set system_message = messages[0][\"content\"] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n{%- if not tools is defined %}\n    {%- set tools = none %}\n{%- endif %}\n{%- set user_messages = loop_messages | selectattr(\"role\", \"equalto\", \"user\") | list %}\n\n{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}\n{%- set ns = namespace() %}\n{%- set ns.index = 0 %}\n{%- for message in loop_messages %}\n    {%- if not (message.role == \"tool\" or message.role == \"tool_results\" or (message.tool_calls is defined and message.tool_calls is not none)) %}\n        {%- if (message[\"role\"] == \"user\") != (ns.index % 2 == 0) %}\n            {{- raise_exception(\"After the optional system message, conversation roles must alternate user/assistant/user/assistant/...\") }}\n        {%- endif %}\n        {%- set ns.index = ns.index + 1 %}\n    {%- endif %}\n{%- endfor %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if message[\"role\"] == \"user\" %}\n        {%- if tools is not none and (message == user_messages[-1]) %}\n            {{- \"[AVAILABLE_TOOLS] [\" }}\n            {%- for tool in tools %}\n                {%- set tool = tool.function %}\n                {{- '{\"type\": \"function\", \"function\": {' }}\n                {%- for key, val in tool.items() if key != \"return\" %}\n                    {%- if val is string %}\n                        {{- '\"' + key + '\": \"' + val + '\"' }}\n                    {%- else %}\n                        {{- '\"' + key + '\": ' + val|tojson }}\n                    {%- endif %}\n                    {%- if not loop.last %}\n                        {{- \", \" }}\n                    {%- endif %}\n                {%- endfor %}\n                {{- \"}}\" }}\n                {%- if not loop.last %}\n                    {{- \", \" }}\n                {%- else %}\n                    {{- \"]\" }}\n                {%- endif %}\n            {%- endfor %}\n            {{- \"[/AVAILABLE_TOOLS]\" }}\n            {%- endif %}\n        {%- if loop.last and system_message is defined %}\n            {{- \"[INST] \" + system_message + \"\\n\\n\" + message[\"content\"] + \"[/INST]\" }}\n        {%- else %}\n            {{- \"[INST] \" + message[\"content\"] + \"[/INST]\" }}\n        {%- endif %}\n    {%- elif message.tool_calls is defined and message.tool_calls is not none %}\n        {{- \"[TOOL_CALLS] [\" }}\n        {%- for tool_call in message.tool_calls %}\n            {%- set out = tool_call.function|tojson %}\n            {{- out[:-1] }}\n            {%- if not tool_call.id is defined or tool_call.id|length != 9 %}\n                {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n            {%- endif %}\n            {{- ', \"id\": \"' + tool_call.id + '\"}' }}\n            {%- if not loop.last %}\n                {{- \", \" }}\n            {%- else %}\n                {{- \"]\" + eos_token }}\n            {%- endif %}\n        {%- endfor %}\n    {%- elif message[\"role\"] == \"assistant\" %}\n        {{- \" \" + message[\"content\"]|trim + eos_token}}\n    {%- elif message[\"role\"] == \"tool_results\" or message[\"role\"] == \"tool\" %}\n        {%- if message.content is defined and message.content.content is defined %}\n            {%- set content = message.content.content %}\n        {%- else %}\n            {%- set content = message.content %}\n        {%- endif %}\n        {{- '[TOOL_RESULTS] {\"content\": ' + content|string + \", \" }}\n        {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}\n            {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n        {%- endif %}\n        {{- '\"call_id\": \"' + message.tool_call_id + '\"}[/TOOL_RESULTS]' }}\n    {%- else %}\n        {{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message!\") }}\n    {%- endif %}\n{%- endfor %}\n",
// Mistral-Nemo-Instruct-2407 (mistralai 'v3-tekken' template)
"{%- if messages[0][\"role\"] == \"system\" %}\n    {%- set system_message = messages[0][\"content\"] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n{%- if not tools is defined %}\n    {%- set tools = none %}\n{%- endif %}\n{%- set user_messages = loop_messages | selectattr(\"role\", \"equalto\", \"user\") | list %}\n\n{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}\n{%- set ns = namespace() %}\n{%- set ns.index = 0 %}\n{%- for message in loop_messages %}\n    {%- if not (message.role == \"tool\" or message.role == \"tool_results\" or (message.tool_calls is defined and message.tool_calls is not none)) %}\n        {%- if (message[\"role\"] == \"user\") != (ns.index % 2 == 0) %}\n            {{- raise_exception(\"After the optional system message, conversation roles must alternate user/assistant/user/assistant/...\") }}\n        {%- endif %}\n        {%- set ns.index = ns.index + 1 %}\n    {%- endif %}\n{%- endfor %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if message[\"role\"] == \"user\" %}\n        {%- if tools is not none and (message == user_messages[-1]) %}\n            {{- \"[AVAILABLE_TOOLS][\" }}\n            {%- for tool in tools %}\n                {%- set tool = tool.function %}\n                {{- '{\"type\": \"function\", \"function\": {' }}\n                {%- for key, val in tool.items() if key != \"return\" %}\n                    {%- if val is string %}\n                        {{- '\"' + key + '\": \"' + val + '\"' }}\n                    {%- else %}\n                        {{- '\"' + key + '\": ' + val|tojson }}\n                    {%- endif %}\n                    {%- if not loop.last %}\n                        {{- \", \" }}\n                    {%- endif %}\n                {%- endfor %}\n                {{- \"}}\" }}\n                {%- if not loop.last %}\n                    {{- \", \" }}\n                {%- else %}\n                    {{- \"]\" }}\n                {%- endif %}\n            {%- endfor %}\n            {{- \"[/AVAILABLE_TOOLS]\" }}\n            {%- endif %}\n        {%- if loop.last and system_message is defined %}\n            {{- \"[INST]\" + system_message + \"\\n\\n\" + message[\"content\"] + \"[/INST]\" }}\n        {%- else %}\n            {{- \"[INST]\" + message[\"content\"] + \"[/INST]\" }}\n        {%- endif %}\n    {%- elif (message.tool_calls is defined and message.tool_calls is not none) %}\n        {{- \"[TOOL_CALLS][\" }}\n        {%- for tool_call in message.tool_calls %}\n            {%- set out = tool_call.function|tojson %}\n            {{- out[:-1] }}\n            {%- if not tool_call.id is defined or tool_call.id|length != 9 %}\n                {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n            {%- endif %}\n            {{- ', \"id\": \"' + tool_call.id + '\"}' }}\n            {%- if not loop.last %}\n                {{- \", \" }}\n            {%- else %}\n                {{- \"]\" + eos_token }}\n            {%- endif %}\n        {%- endfor %}\n    {%- elif message[\"role\"] == \"assistant\" %}\n        {{- message[\"content\"] + eos_token}}\n    {%- elif message[\"role\"] == \"tool_results\" or message[\"role\"] == \"tool\" %}\n        {%- if message.content is defined and message.content.content is defined %}\n            {%- set content = message.content.content %}\n        {%- else %}\n            {%- set content = message.content %}\n        {%- endif %}\n        {{- '[TOOL_RESULTS]{\"content\": ' + content|string + \", \" }}\n        {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}\n            {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n        {%- endif %}\n        {{- '\"call_id\": \"' + message.tool_call_id + '\"}[/TOOL_RESULTS]' }}\n    {%- else %}\n        {{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message!\") }}\n    {%- endif %}\n{%- endfor %}\n",
// mistralai/Mistral-Large-Instruct-2411 (mistralai 'v7' template)
"{{ bos_token }}{% for message in messages %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + '[/INST]' }}{% elif message['role'] == 'system' %}{{ '[SYSTEM_PROMPT] ' + message['content'] + '[/SYSTEM_PROMPT]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + eos_token }}{% else %}{{ raise_exception('Only user, system and assistant roles are supported!') }}{% endif %}{% endfor %}",
// mistralai/Mistral-7B-Instruct-v0.2 (NOTE: Old pre-v1 without a system prompt)
"[INST] You are a helpful assistant\nHello [/INST]Hi there</s>[INST] Who are you [/INST]   I am an assistant   </s>[INST] Another question [/INST]",
.
.
.
// mistralai/Mistral-7B-Instruct-v0.2 (mistralai 'v1' template with a system prompt)
" [INST] You are a helpful assistant\n\nHello [/INST] Hi there</s> [INST] Who are you [/INST]    I am an assistant   </s> [INST] Another question [/INST]",
// Mistral-Large-Instruct-2407 (mistralai 'v2'/'v3' template; modified to have system prompt at start)
"[INST] You are a helpful assistant\n\nHello[/INST] Hi there</s>[INST] Who are you[/INST] I am an assistant</s>[INST] Another question[/INST]",
// Mistral-Nemo-Instruct-2407 (mistralai 'v3-tekken' template; modified to have system prompt at start)
"[INST]You are a helpful assistant\n\nHello[/INST]Hi there</s>[INST]Who are you[/INST]   I am an assistant   </s>[INST]Another question[/INST]",
// mistralai/Mistral-Large-Instruct-2411 (mistralai 'v7' template)
"[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST]    I am an assistant   </s>[INST] Another question[/INST]",

using:

from transformers import AutoTokenizer

VARIANTS_TO_TEST = [
    'mistralai/Mistral-7B-Instruct-v0.2',     # v1
    'mistralai/Mistral-Large-Instruct-2407',  # v3
    'mistralai/Mistral-Nemo-Instruct-2407',   # v3-tekken
    'mistralai/Mistral-Large-Instruct-2411',  # v7
]

HISTORY = [
    { 'role': 'system', 'content': 'You are a helpful assistant' },
    { 'role': 'user', 'content': 'Hello' },
    { 'role': 'assistant', 'content': 'Hi there' },
    { 'role': 'user', 'content': 'Who are you' },
    { 'role': 'assistant', 'content': '   I am an assistant   ' },
    { 'role': 'user', 'content': 'Another question' },
]

for variant in VARIANTS_TO_TEST:
    history = [m for m in HISTORY] # copy
    print("\n----- " + variant + " -----")
    tokenizer = AutoTokenizer.from_pretrained(variant)
    output = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
    print(output)
    print("\n[Test String]\n// " + variant)
    print('"' + output.replace("\n", "\\n") + '",')

which produces:

----- mistralai/Mistral-7B-Instruct-v0.2 -----
<s> [INST] You are a helpful assistant

Hello [/INST] Hi there</s> [INST] Who are you [/INST]    I am an assistant   </s> [INST] Another question [/INST]

[Test String]
// mistralai/Mistral-7B-Instruct-v0.2
"<s> [INST] You are a helpful assistant\n\nHello [/INST] Hi there</s> [INST] Who are you [/INST]    I am an assistant   </s> [INST] Another question [/INST]",

----- mistralai/Mistral-Large-Instruct-2407 -----
<s>[INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST] I am an assistant</s>[INST] You are a helpful assistant

Another question[/INST]

[Test String]
// mistralai/Mistral-Large-Instruct-2407
"<s>[INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST] I am an assistant</s>[INST] You are a helpful assistant\n\nAnother question[/INST]",

----- mistralai/Mistral-Nemo-Instruct-2407 -----
<s>[INST]Hello[/INST]Hi there</s>[INST]Who are you[/INST]   I am an assistant   </s>[INST]You are a helpful assistant

Another question[/INST]

[Test String]
// mistralai/Mistral-Nemo-Instruct-2407
"<s>[INST]Hello[/INST]Hi there</s>[INST]Who are you[/INST]   I am an assistant   </s>[INST]You are a helpful assistant\n\nAnother question[/INST]",

----- mistralai/Mistral-Large-Instruct-2411 -----
<s>[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST]    I am an assistant   </s>[INST] Another question[/INST]

[Test String]
// mistralai/Mistral-Large-Instruct-2411
"<s>[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST]    I am an assistant   </s>[INST] Another question[/INST]",

The only thing I did to "cheat" the tests slightly is to move the system prompt to the start, as I mentioned above; appending it to the last user message would have disastrous consequences for KV-cache invalidation, and according to MistralAI's own docs:

This is how mistral_common and the templates implement system prompts, but this can easily be customized. Feel free to use system prompts in different places, such as the second from the last or simply as the first user message, as before.

it is fine to move it back to the start.


So in general I have tried to keep the old functionality for llama2, codellama and older "mistral-esque" models, but now allow the auto-detection and/or manual specification of the "mistral-v1", "mistral-v3", "mistral-v3-tekken" and "mistral-v7" templates.

Hopefully this should improve the results we are getting from the newer models.

@jukofyork jukofyork changed the title Add mistral-v1, mistral-v2, mistral-v3 and mistral-v3-tekken chat template types Add mistral-v1, mistral-v2, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types Nov 29, 2024
@jukofyork (Contributor Author) commented Nov 29, 2024

If this does get merged, then including this table in the "Templates supported by llama_chat_apply_template" wiki page:

| MistralAI Model | Chat Template Name |
| --- | --- |
| Mistral 7B Instruct v0.1 | mistral-v1 |
| Mistral 7B Instruct v0.2 | mistral-v1 |
| Mixtral 8x7B Instruct v0.1 | mistral-v1 |
| miqu-1-70b | mistral-v1 |
| Mistral 7B Instruct v0.3 | mistral-v3 |
| Mixtral 8x22B Instruct v0.1 | mistral-v3 |
| Mixtral 8x22B Instruct v0.3 | mistral-v3 |
| Codestral 22B v0.1 | mistral-v3 |
| Codestral Mamba 7B v0.1 | mistral-v3 |
| Mathstral 7B v0.1 | mistral-v3 |
| Large 123B 2407 | mistral-v3 |
| Nemo 12B 2407 | mistral-v3-tekken |
| Large 123B 2411 | mistral-v7 |

(I've added miqu-1-70b and Large 123B 2411 rows)

along with a link to the "Demystifying Mistral's Instruct Tokenization & Chat Templates" document might be a good idea.

@jukofyork jukofyork changed the title Add mistral-v1, mistral-v2, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types Nov 29, 2024
@jukofyork (Contributor Author)

@slaren @ggerganov this is ready for review if possible :)

If you don't like me leaving the old "mistral" stuff in the "llama2" block then I can try to fix this (and possibly rename "mistral-v1"), but I fear that there are so many broken variants of this type of Jinja2 template floating around in different fine-tunes and GGUF files that it's probably best to just leave it be: have the "v1", "v3", "v3-tekken" and "v7" auto-detection work off the current (hopefully final) MistralAI-created Jinja2 templates, and let the user pass "mistral-v1" if they know their particular fine-tune or GGUF really uses the "v1" style prompt.

@jukofyork (Contributor Author)

Actually, looking over the code again, I've seen a bug that got past the tests:

std::string content = (trim_assistant_message? trim(message->content) : message->content);

According to the current templates this should only be applied to the assistant messages (likely to be sure the space gets re-prepended to the first token output by the assistant).
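
Presumably the fix is something along these lines (a sketch; the actual committed change may differ):

```cpp
// sketch: apply trim() only when the message role is "assistant"
std::string role(message->role);
std::string content = (trim_assistant_message && role == "assistant")
    ? trim(message->content)
    : message->content;
```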

I'll try and fix it using the GitHub editor as I'm away from home atm.

Fixed `trim_assistant_message` bug
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Dec 1, 2024
@jukofyork (Contributor Author)

So I've been thinking about the use of llama_chat_apply_template_internal() in general:

The main brittleness comes from the auto-detection via the Jinja2 template stored in the GGUF header, and looking at the repos on Hugging Face it's very common that:

  1. The Jinja2 templates are getting regularly corrected and fiddled about with, but the GGUFs never get updated unless there is a big problem identified with the tokeniser, etc.
  2. Fine-tuners often just copy the official tokenizer_config.json into their repo without considering whether they actually used the official Jinja2 template and/or many of the special tokens.

Is there any way we could pass/access some structure that contains the list of tokens for a model to/in llama_chat_apply_template_internal()?

If it were possible, then we could use it to write much more robust heuristics than the current code allows for:

  • There are probably many small "degradations" due to spaces in the other templates causing out-of-distribution tokenisation, but it's often not all that noticeable unless you need to use the model for coding or with long contexts.
  • Many of the problems caused by fine-tuners and GGUF quantisers including the wrong Jinja2 template could be sidestepped.

I'd be happy to take over the job of making sure all the templates are correct and maintaining the llama_chat_apply_template_internal() code if there is some way to make them more robust like this.
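
As a purely hypothetical sketch of what such a heuristic could look like if the vocabulary were accessible (has_special_token is an assumed helper, not an existing API):

```cpp
#include <functional>
#include <string>

// Hypothetical: pick a template family from the model's special tokens
// instead of trusting the Jinja2 string stored in the GGUF header.
static std::string guess_mistral_template(
        const std::function<bool(const std::string &)> & has_special_token) {
    if (has_special_token("[SYSTEM_PROMPT]")) {
        return "mistral-v7";
    }
    if (has_special_token("[AVAILABLE_TOOLS]")) {
        return "mistral-v3"; // or mistral-v3-tekken, depending on the tokenizer
    }
    if (has_special_token("[INST]")) {
        // [INST] as a real control token implies v3+; in v1 it is plain text
        return "mistral-v3";
    }
    return ""; // unknown: fall back to the Jinja2-based detection
}
```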


Also, would it be worthwhile investigating the possibility of replacing all the hard-coded substitution logic inside llama_chat_apply_template_internal() with a very minimal string template substitution parser?

I think a stack-based version would be very easy to implement (i.e. using postfix notation for any conditionals, string variables for role and content, and a handful of boolean variables for is_first and is_last, etc), but I'm not sure people without a CS background would understand it enough to write their own like this (e.g. Reverse Polish notation likely seems very confusing to most people now).
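
To illustrate the stack-based idea, here is a toy postfix evaluator (a hypothetical design sketch, nothing to do with the existing llama.cpp code): string literals and $variables get pushed, cat concatenates the top two entries, and if pops a condition and two branches:

```cpp
#include <map>
#include <stack>
#include <string>
#include <vector>

// Toy postfix template evaluator: "cat" joins the top two strings,
// "if" pops {cond, then, else}, "$name" looks up a variable, and
// anything else is pushed as a literal.
std::string eval_postfix(const std::vector<std::string> & prog,
                         const std::map<std::string, std::string> & vars) {
    std::stack<std::string> st;
    for (const auto & tok : prog) {
        if (tok == "cat") {
            std::string b = st.top(); st.pop();
            std::string a = st.top(); st.pop();
            st.push(a + b);
        } else if (tok == "if") {
            std::string els  = st.top(); st.pop();
            std::string thn  = st.top(); st.pop();
            std::string cond = st.top(); st.pop(); // "1" or "0"
            st.push(cond == "1" ? thn : els);
        } else if (!tok.empty() && tok[0] == '$') {
            auto it = vars.find(tok.substr(1)); // variable lookup, e.g. $content
            st.push(it != vars.end() ? it->second : "");
        } else {
            st.push(tok); // string literal
        }
    }
    return st.empty() ? "" : st.top();
}

// e.g. {"[INST] ", "$content", "cat", "[/INST]", "cat"} with
// vars = {{"content", "Hello"}} evaluates to "[INST] Hello[/INST]".
```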

A non-stack-based solution would be quite a lot harder though (I've just looked through some "minimal" templating libraries written in C/C++ and the AST parsing is quite a significant increase in complexity).

@slaren (Collaborator) commented Dec 1, 2024

It may be worth taking a look at the mini jinja engine that @ochafik is working on in #9639. If that works well, it would allow us to remove all the pre-defined templates.

@slaren slaren requested a review from ngxson December 1, 2024 17:46
@jukofyork (Contributor Author)

It may be worth taking a look at the mini jinja engine that @ochafik is working on in #9639. If that works well, it would allow us to remove all the pre-defined templates.

Oh, I missed that and agree that looks the best solution long term.

Not sure what the best plan to do with this is then:

  • The bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "mistral" bug probably wants fixing regardless.
  • I could try to rework it to not add the new mistral-v names and auto-detect only, but I'm not sure that would catch many of the Jinja2 templates in the GGUFs in the wild.

@ngxson (Collaborator) left a comment

Thanks for fixing this. Tbh, the mistral template is quite messy and I almost gave up on it for now.

Btw, we should find a way to reduce the number of string compares (==) and needle-in-haystack tmpl_contains() calls. This is not very important, so I'll make a PR for that later.
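
One possible shape for this (purely a hypothetical sketch, not a commitment to any particular approach): resolve the template name to an enum once, then switch on it, instead of repeated comparisons and substring searches.

```cpp
#include <map>
#include <string>

// Hypothetical: map known template names to an enum in one lookup.
enum class llm_chat_tmpl {
    LLAMA_2,
    MISTRAL_V1,
    MISTRAL_V3,
    MISTRAL_V3_TEKKEN,
    MISTRAL_V7,
    UNKNOWN,
};

static llm_chat_tmpl resolve_chat_tmpl(const std::string & name) {
    static const std::map<std::string, llm_chat_tmpl> k_map = {
        { "llama2",            llm_chat_tmpl::LLAMA_2 },
        { "mistral-v1",        llm_chat_tmpl::MISTRAL_V1 },
        { "mistral-v3",        llm_chat_tmpl::MISTRAL_V3 },
        { "mistral-v3-tekken", llm_chat_tmpl::MISTRAL_V3_TEKKEN },
        { "mistral-v7",        llm_chat_tmpl::MISTRAL_V7 },
    };
    auto it = k_map.find(name);
    return it == k_map.end() ? llm_chat_tmpl::UNKNOWN : it->second;
}
```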

@ngxson ngxson merged commit 917786f into ggerganov:master Dec 1, 2024
50 checks passed
@jukofyork jukofyork deleted the fix-mistralai-chat-templates branch December 1, 2024 22:39
ochafik added a commit to google/minja that referenced this pull request Dec 5, 2024
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Dec 7, 2024
…chat template types (ggerganov#10572)

* Templates: `mistral-v1`, `mistral-v2`, `mistral-v3`, `mistral-v3-tekken`

* Changed system message logic and added tests for all 4

* Invalid `system_message` instead of `content` fixed

* Removed tab-indented lines

* Added template code and test for `mistral-v7`

* Added all tests. Fixed bug with `tmpl == "llama2"` test.

* Replaced tabs with spaces.

* Removed `'mistral-v2'` option as no (open) models ever used it

* Removed all references to 'v2' template from comments

* Update llama.cpp

Fixed `trim_assistant_message` bug
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024