
Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types #10572

Merged Dec 1, 2024 (10 commits)

Conversation

@jukofyork (Contributor) commented Nov 29, 2024


This PR adds the mistral-v1, mistral-v2, mistral-v3 and mistral-v3-tekken chat template types based on the "Demystifying Mistral's Instruct Tokenization & Chat Templates" documentation provided by MistralAI themselves:

https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md
https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/templates.md

1. I have deliberately ignored the "prepend it to the last user message" part for the v2/v3 system prompts:

<s>[INST] user message[/INST] assistant message</s>[INST] system prompt

new user message[/INST]

as this will constantly invalidate the KV-cache, and from their own wording I'm not convinced the model has even been trained like this or expects it there:

This is how mistral_common and the templates implement system prompts, but this can easily be customized. Feel free to use system prompts in different places, such as the second from the last or simply as the first user message, as before.
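
To make the KV-cache point concrete, here is how two consecutive turns would render under the reference placement (illustrative, written with \n escapes as in the test strings further down):

Turn 1: <s>[INST] system prompt\n\nHello[/INST]
Turn 2: <s>[INST] Hello[/INST] Hi there</s>[INST] system prompt\n\nWho are you[/INST]

The two prompts diverge immediately after the first [INST], so virtually the whole KV-cache has to be recomputed on every turn; with the system prompt pinned at the start, each turn's prompt is a strict extension of the previous one and the cache can be reused.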

2. I can't really add any specific templates/expected_output tests:

I've left the old logic intact as a fall-through:

    } else if (tmpl == "llama2" || tmpl == "mistral" || tmpl_contains("[INST]")) {
        // llama2 template and its variants
        // [variant] support system message
        bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "mistral";
        // [variant] space before + after response
        bool space_around_response = tmpl_contains("' ' + eos_token");
        // [variant] add BOS inside history
        bool add_bos_inside_history = tmpl_contains("bos_token + '[INST]");
        // [variant] trim spaces from the input message
        bool strip_message = tmpl_contains("content.strip()");
        // construct the prompt
        bool is_inside_turn = true; // skip BOS at the beginning
        ss << "[INST] ";
        for (auto message : chat) {
            std::string content = strip_message ? trim(message->content) : message->content;
            std::string role(message->role);
            if (!is_inside_turn) {
                is_inside_turn = true;
                ss << (add_bos_inside_history ? "<s>[INST] " : "[INST] ");
            }
            if (role == "system") {
                if (support_system_message) {
                    ss << "<<SYS>>\n" << content << "\n<</SYS>>\n\n";
                } else {
                    // if the model does not support system message, we still include it in the first message, but without <<SYS>>
                    ss << content << "\n";
                }
            } else if (role == "user") {
                ss << content << " [/INST]";
            } else {
                ss << (space_around_response ? " " : "") << content << (space_around_response ? " " : "") << "</s>";
                is_inside_turn = false;
            }
        }
        // llama2 templates seem to not care about "add_generation_prompt"

as this looks way too brittle to even try to touch and could end up affecting lots of existing models that use slight variations of the [INST] tags.

I did see one bug in the old code though:

bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "mistral";
.
.
.
if (support_system_message) {
    ss << "<<SYS>>\n" << content << "\n<</SYS>>\n\n";
}

No mistral model has ever used <<SYS>> tags AFAIK (they are purely for llama2 and codellama only; see here).

There is also a lot of confusion, with MistralAI's own repos having had wrong templates posted at one time or another, mistral-community keeping up the old/wrong templates, and so on... So there isn't really much point in trying to do the "2. Use the following python script to generate a test conversation" part, as this PR is an attempt to finally "Demystify" the prompts from their own documentation, rather than go by whatever broken Jinja2 templates are floating around...

So instead I just added the mistral-v1, mistral-v2, mistral-v3 and mistral-v3-tekken options for the user to specify manually:

| Open Model | Tokenizer |
| --- | --- |
| Mistral 7B Instruct v0.1 | v1 |
| Mistral 7B Instruct v0.2 | v1 |
| Mistral 7B Instruct v0.3 | v3 |
| Mixtral 8x7B Instruct v0.1 | v1 |
| Mixtral 8x22B Instruct v0.1 | v3 |
| Mixtral 8x22B Instruct v0.3 | v3 |
| Codestral 22B v0.1 | v3 |
| Codestral Mamba 7B v0.1 | v3 |
| Mathstral 7B v0.1 | v3 |
| Nemo 12B 2407 | v3 - Tekken |
| Large 123B 2407 | v3 |

| Endpoint Model | Tokenizer |
| --- | --- |
| mistral-embed | v1 |
| open-mistral-7b | v3 |
| open-mixtral-8x7b | v1 |
| open-mixtral-8x22b | v3 |
| mistral-small-latest | v2 |
| mistral-large-latest | v3 |
| codestral-22b | v3 |
| open-codestral-mamba | v3 |
| open-mistral-nemo | v3 - Tekken |

but if unspecified, the old fall-through logic gets used instead.
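
(These names would presumably be selected via llama.cpp's existing `--chat-template` flag, e.g. `llama-cli -m mistral-nemo.gguf --chat-template mistral-v3-tekken`; the model path here is just a placeholder.)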

I have added to the fmt_sys and fmt_single tests at the bottom of test-chat-template.cpp, so I will check back tomorrow to see whether they passed.

3. If accepted then somebody with wiki access might want to update this page:

https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

as it doesn't seem possible to edit via a PR?

@github-actions github-actions bot added the testing Everything test related label Nov 29, 2024
@jukofyork (Contributor Author)

LOL, they've added a new (undocumented) v7 system-prompt now for the new mistralai/Mistral-Large-Instruct-2411:

Basic Instruct Template (V7)

<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]

Be careful with subtle missing or trailing white spaces!

I'll try and add this tomorrow too if I get a chance (what happened to v4, v5 and v6??? 😁).
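
For reference, a minimal sketch of what rendering this v7 format could look like (illustrative only, not the code in this PR; the leading <s> BOS is assumed to be added by the tokenizer rather than the template function):

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch of the quoted v7 format: system prompts get
// [SYSTEM_PROMPT]...[/SYSTEM_PROMPT], user turns get [INST]...[/INST],
// and assistant turns get a leading space plus a trailing </s>.
std::string format_mistral_v7(const std::vector<std::pair<std::string, std::string>> & chat) {
    std::stringstream ss;
    for (const auto & [role, content] : chat) {
        if (role == "system") {
            ss << "[SYSTEM_PROMPT] " << content << "[/SYSTEM_PROMPT]";
        } else if (role == "user") {
            ss << "[INST] " << content << "[/INST]";
        } else { // assistant
            ss << " " << content << "</s>";
        }
    }
    return ss.str();
}
```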

@jukofyork (Contributor Author) commented Nov 29, 2024

So I've now:

1. Refactored the if (tmpl == "llama2" || tmpl.find("mistral") == 0 || tmpl_contains("[INST]")) case; a sketch of the resulting auto-detection follows the list:

  • Uses tmpl_contains("[SYSTEM_PROMPT]") to auto-detect the v7 template used by the new mistralai/Mistral-Large-Instruct-2411 model.
  • Uses tmpl_contains("' [INST] ' + system_message") to auto-detect the official v1 template models.
  • Uses tmpl_contains("[AVAILABLE_TOOLS]") to auto-detect the official v3 and v3-tekken template models.
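
A minimal sketch of what this auto-detection amounts to (illustrative only; the actual code lives inside llama_chat_apply_template_internal() and is structured differently, but it checks the same needles):

```cpp
#include <string>

// Illustrative sketch of the variant detection described above.
static std::string detect_mistral_variant(const std::string & tmpl) {
    auto tmpl_contains = [&](const char * needle) {
        return tmpl.find(needle) != std::string::npos;
    };
    if (tmpl_contains("[SYSTEM_PROMPT]")) {
        return "mistral-v7";   // Mistral-Large-Instruct-2411
    }
    if (tmpl_contains("' [INST] ' + system_message")) {
        return "mistral-v1";   // official v1 templates
    }
    if (tmpl_contains("[AVAILABLE_TOOLS]")) {
        return "mistral-v3";   // v3 and v3-tekken (spacing differs)
    }
    return "";                 // unknown: fall through to the old logic
}
```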

2. Fixed a bug in the old fall-through case:

bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "mistral";

which should have been:

bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "llama2";

as before it had the exact opposite logic (i.e. llama2 models got no <<SYS>> tags and mistral models did; see here).

I've otherwise left this code untouched for fear of breaking some old "mistral-esque" models people might still use...

I have added tests at the bottom of test-chat-template.cpp for these cases now though:

assert(fmt_sys("llama2") == "[INST] <<SYS>>\nYou are a helpful assistant\n<</SYS>>\n\n");
assert(fmt_sys("mistral") == "[INST] You are a helpful assistant\n"); // for old pre-v1 templates
.
.
.
assert(fmt_single("llama2") == "[INST] How are you [/INST]");
assert(fmt_single("mistral") == "[INST] How are you [/INST]"); // for old pre-v1 templates

3. Added tests to test-chat-template.cpp:

// mistralai/Mistral-7B-Instruct-v0.2 (NOTE: Old pre-v1 without a system prompt)
"{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
.
.
.
// mistralai/Mistral-7B-Instruct-v0.2 (mistralai 'v1' template with a system prompt)
"{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content'] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n    {%- endif %}\n    {%- if message['role'] == 'user' %}\n        {%- if loop.first and system_message is defined %}\n            {{- ' [INST] ' + system_message + '\\n\\n' + message['content'] + ' [/INST]' }}\n        {%- else %}\n            {{- ' [INST] ' + message['content'] + ' [/INST]' }}\n        {%- endif %}\n    {%- elif message['role'] == 'assistant' %}\n        {{- ' ' + message['content'] + eos_token}}\n    {%- else %}\n        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n    {%- endif %}\n{%- endfor %}\n",
// Mistral-Large-Instruct-2407 (mistralai 'v2'/'v3' template)
"{%- if messages[0][\"role\"] == \"system\" %}\n    {%- set system_message = messages[0][\"content\"] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n{%- if not tools is defined %}\n    {%- set tools = none %}\n{%- endif %}\n{%- set user_messages = loop_messages | selectattr(\"role\", \"equalto\", \"user\") | list %}\n\n{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}\n{%- set ns = namespace() %}\n{%- set ns.index = 0 %}\n{%- for message in loop_messages %}\n    {%- if not (message.role == \"tool\" or message.role == \"tool_results\" or (message.tool_calls is defined and message.tool_calls is not none)) %}\n        {%- if (message[\"role\"] == \"user\") != (ns.index % 2 == 0) %}\n            {{- raise_exception(\"After the optional system message, conversation roles must alternate user/assistant/user/assistant/...\") }}\n        {%- endif %}\n        {%- set ns.index = ns.index + 1 %}\n    {%- endif %}\n{%- endfor %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if message[\"role\"] == \"user\" %}\n        {%- if tools is not none and (message == user_messages[-1]) %}\n            {{- \"[AVAILABLE_TOOLS] [\" }}\n            {%- for tool in tools %}\n                {%- set tool = tool.function %}\n                {{- '{\"type\": \"function\", \"function\": {' }}\n                {%- for key, val in tool.items() if key != \"return\" %}\n                    {%- if val is string %}\n                        {{- '\"' + key + '\": \"' + val + '\"' }}\n                    {%- else %}\n                        {{- '\"' + key + '\": ' + val|tojson }}\n                    {%- endif %}\n                    {%- if not loop.last %}\n                        {{- \", \" }}\n                    {%- endif %}\n                {%- endfor %}\n                {{- \"}}\" }}\n                {%- if not loop.last %}\n                    {{- \", \" }}\n                {%- else %}\n                    {{- \"]\" }}\n                {%- endif %}\n            {%- endfor %}\n            {{- \"[/AVAILABLE_TOOLS]\" }}\n            {%- endif %}\n        {%- if loop.last and system_message is defined %}\n            {{- \"[INST] \" + system_message + \"\\n\\n\" + message[\"content\"] + \"[/INST]\" }}\n        {%- else %}\n            {{- \"[INST] \" + message[\"content\"] + \"[/INST]\" }}\n        {%- endif %}\n    {%- elif message.tool_calls is defined and message.tool_calls is not none %}\n        {{- \"[TOOL_CALLS] [\" }}\n        {%- for tool_call in message.tool_calls %}\n            {%- set out = tool_call.function|tojson %}\n            {{- out[:-1] }}\n            {%- if not tool_call.id is defined or tool_call.id|length != 9 %}\n                {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n            {%- endif %}\n            {{- ', \"id\": \"' + tool_call.id + '\"}' }}\n            {%- if not loop.last %}\n                {{- \", \" }}\n            {%- else %}\n                {{- \"]\" + eos_token }}\n            {%- endif %}\n        {%- endfor %}\n    {%- elif message[\"role\"] == \"assistant\" %}\n        {{- \" \" + message[\"content\"]|trim + eos_token}}\n    {%- elif message[\"role\"] == \"tool_results\" or message[\"role\"] == \"tool\" %}\n        {%- if message.content is defined and message.content.content is defined %}\n            {%- set content = message.content.content %}\n        {%- else %}\n            {%- set content = message.content %}\n        {%- endif %}\n        {{- '[TOOL_RESULTS] {\"content\": ' + content|string + \", \" }}\n        {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}\n            {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n        {%- endif %}\n        {{- '\"call_id\": \"' + message.tool_call_id + '\"}[/TOOL_RESULTS]' }}\n    {%- else %}\n        {{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message!\") }}\n    {%- endif %}\n{%- endfor %}\n",
// Mistral-Nemo-Instruct-2407 (mistralai 'v3-tekken' template)
"{%- if messages[0][\"role\"] == \"system\" %}\n    {%- set system_message = messages[0][\"content\"] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n{%- if not tools is defined %}\n    {%- set tools = none %}\n{%- endif %}\n{%- set user_messages = loop_messages | selectattr(\"role\", \"equalto\", \"user\") | list %}\n\n{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}\n{%- set ns = namespace() %}\n{%- set ns.index = 0 %}\n{%- for message in loop_messages %}\n    {%- if not (message.role == \"tool\" or message.role == \"tool_results\" or (message.tool_calls is defined and message.tool_calls is not none)) %}\n        {%- if (message[\"role\"] == \"user\") != (ns.index % 2 == 0) %}\n            {{- raise_exception(\"After the optional system message, conversation roles must alternate user/assistant/user/assistant/...\") }}\n        {%- endif %}\n        {%- set ns.index = ns.index + 1 %}\n    {%- endif %}\n{%- endfor %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if message[\"role\"] == \"user\" %}\n        {%- if tools is not none and (message == user_messages[-1]) %}\n            {{- \"[AVAILABLE_TOOLS][\" }}\n            {%- for tool in tools %}\n                {%- set tool = tool.function %}\n                {{- '{\"type\": \"function\", \"function\": {' }}\n                {%- for key, val in tool.items() if key != \"return\" %}\n                    {%- if val is string %}\n                        {{- '\"' + key + '\": \"' + val + '\"' }}\n                    {%- else %}\n                        {{- '\"' + key + '\": ' + val|tojson }}\n                    {%- endif %}\n                    {%- if not loop.last %}\n                        {{- \", \" }}\n                    {%- endif %}\n                {%- endfor %}\n                {{- \"}}\" }}\n                {%- if not loop.last %}\n                    {{- \", \" }}\n                {%- else %}\n                    {{- \"]\" }}\n                {%- endif %}\n            {%- endfor %}\n            {{- \"[/AVAILABLE_TOOLS]\" }}\n            {%- endif %}\n        {%- if loop.last and system_message is defined %}\n            {{- \"[INST]\" + system_message + \"\\n\\n\" + message[\"content\"] + \"[/INST]\" }}\n        {%- else %}\n            {{- \"[INST]\" + message[\"content\"] + \"[/INST]\" }}\n        {%- endif %}\n    {%- elif (message.tool_calls is defined and message.tool_calls is not none) %}\n        {{- \"[TOOL_CALLS][\" }}\n        {%- for tool_call in message.tool_calls %}\n            {%- set out = tool_call.function|tojson %}\n            {{- out[:-1] }}\n            {%- if not tool_call.id is defined or tool_call.id|length != 9 %}\n                {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n            {%- endif %}\n            {{- ', \"id\": \"' + tool_call.id + '\"}' }}\n            {%- if not loop.last %}\n                {{- \", \" }}\n            {%- else %}\n                {{- \"]\" + eos_token }}\n            {%- endif %}\n        {%- endfor %}\n    {%- elif message[\"role\"] == \"assistant\" %}\n        {{- message[\"content\"] + eos_token}}\n    {%- elif message[\"role\"] == \"tool_results\" or message[\"role\"] == \"tool\" %}\n        {%- if message.content is defined and message.content.content is defined %}\n            {%- set content = message.content.content %}\n        {%- else %}\n            {%- set content = message.content %}\n        {%- endif %}\n        {{- '[TOOL_RESULTS]{\"content\": ' + content|string + \", \" }}\n        {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}\n            {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n        {%- endif %}\n        {{- '\"call_id\": \"' + message.tool_call_id + '\"}[/TOOL_RESULTS]' }}\n    {%- else %}\n        {{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message!\") }}\n    {%- endif %}\n{%- endfor %}\n",
// mistralai/Mistral-Large-Instruct-2411 (mistralai 'v7' template)
"{{ bos_token }}{% for message in messages %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + '[/INST]' }}{% elif message['role'] == 'system' %}{{ '[SYSTEM_PROMPT] ' + message['content'] + '[/SYSTEM_PROMPT]' }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + eos_token }}{% else %}{{ raise_exception('Only user, system and assistant roles are supported!') }}{% endif %}{% endfor %}",
// mistralai/Mistral-7B-Instruct-v0.2 (NOTE: Old pre-v1 without a system prompt)
"[INST] You are a helpful assistant\nHello [/INST]Hi there</s>[INST] Who are you [/INST]   I am an assistant   </s>[INST] Another question [/INST]",
.
.
.
// mistralai/Mistral-7B-Instruct-v0.2 (mistralai 'v1' template with a system prompt)
" [INST] You are a helpful assistant\n\nHello [/INST] Hi there</s> [INST] Who are you [/INST]    I am an assistant   </s> [INST] Another question [/INST]",
// Mistral-Large-Instruct-2407 (mistralai 'v2'/'v3' template; modified to have system prompt at start)
"[INST] You are a helpful assistant\n\nHello[/INST] Hi there</s>[INST] Who are you[/INST] I am an assistant</s>[INST] Another question[/INST]",
// Mistral-Nemo-Instruct-2407 (mistralai 'v3-tekken' template; modified to have system prompt at start)
"[INST]You are a helpful assistant\n\nHello[/INST]Hi there</s>[INST]Who are you[/INST]   I am an assistant   </s>[INST]Another question[/INST]",
// mistralai/Mistral-Large-Instruct-2411 (mistralai 'v7' template)
"[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST]    I am an assistant   </s>[INST] Another question[/INST]",

using:

from transformers import AutoTokenizer

VARIANTS_TO_TEST = [
    'mistralai/Mistral-7B-Instruct-v0.2',     # v1
    'mistralai/Mistral-Large-Instruct-2407',  # v3
    'mistralai/Mistral-Nemo-Instruct-2407',   # v3-tekken
    'mistralai/Mistral-Large-Instruct-2411',  # v7
]

HISTORY = [
    { 'role': 'system', 'content': 'You are a helpful assistant' },
    { 'role': 'user', 'content': 'Hello' },
    { 'role': 'assistant', 'content': 'Hi there' },
    { 'role': 'user', 'content': 'Who are you' },
    { 'role': 'assistant', 'content': '   I am an assistant   ' },
    { 'role': 'user', 'content': 'Another question' },
]

for variant in VARIANTS_TO_TEST:
    history = [m for m in HISTORY] # copy
    print("\n----- " + variant + " -----")
    tokenizer = AutoTokenizer.from_pretrained(variant)
    output = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
    print(output)
    print("\n[Test String]\n// " + variant)
    print('"' + output.replace("\n", "\\n") + '",')

which produces:

----- mistralai/Mistral-7B-Instruct-v0.2 -----
<s> [INST] You are a helpful assistant

Hello [/INST] Hi there</s> [INST] Who are you [/INST]    I am an assistant   </s> [INST] Another question [/INST]

[Test String]
// mistralai/Mistral-7B-Instruct-v0.2
"<s> [INST] You are a helpful assistant\n\nHello [/INST] Hi there</s> [INST] Who are you [/INST]    I am an assistant   </s> [INST] Another question [/INST]",

----- mistralai/Mistral-Large-Instruct-2407 -----
<s>[INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST] I am an assistant</s>[INST] You are a helpful assistant

Another question[/INST]

[Test String]
// mistralai/Mistral-Large-Instruct-2407
"<s>[INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST] I am an assistant</s>[INST] You are a helpful assistant\n\nAnother question[/INST]",

----- mistralai/Mistral-Nemo-Instruct-2407 -----
<s>[INST]Hello[/INST]Hi there</s>[INST]Who are you[/INST]   I am an assistant   </s>[INST]You are a helpful assistant

Another question[/INST]

[Test String]
// mistralai/Mistral-Nemo-Instruct-2407
"<s>[INST]Hello[/INST]Hi there</s>[INST]Who are you[/INST]   I am an assistant   </s>[INST]You are a helpful assistant\n\nAnother question[/INST]",

----- mistralai/Mistral-Large-Instruct-2411 -----
<s>[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST]    I am an assistant   </s>[INST] Another question[/INST]

[Test String]
// mistralai/Mistral-Large-Instruct-2411
"<s>[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST] Hi there</s>[INST] Who are you[/INST]    I am an assistant   </s>[INST] Another question[/INST]",

The only thing I did to "cheat" the tests slightly is to move the system prompt to the start, as I mentioned above; appending it to the last user message would have disastrous consequences for KV-cache invalidation, and according to MistralAI's own docs:

This is how mistral_common and the templates implement system prompts, but this can easily be customized. Feel free to use system prompts in different places, such as the second from the last or simply as the first user message, as before.

it is fine to move it back to the start.


So in general I have tried to keep the old functionality for llama2, codellama and older "mistral-esque" models, but now allow the auto-detection and/or manual specification of the "mistral-v1", "mistral-v3", "mistral-v3-tekken" and "mistral-v7" templates.

Hopefully this should improve the results we are getting from the newer models.

@jukofyork jukofyork changed the title Add mistral-v1, mistral-v2, mistral-v3 and mistral-v3-tekken chat template types Add mistral-v1, mistral-v2, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types Nov 29, 2024
@jukofyork (Contributor Author) commented Nov 29, 2024

If this does get merged, then including this table in the "Templates supported by llama_chat_apply_template" wiki page:

| MistralAI Model | Chat Template Name |
| --- | --- |
| Mistral 7B Instruct v0.1 | mistral-v1 |
| Mistral 7B Instruct v0.2 | mistral-v1 |
| Mixtral 8x7B Instruct v0.1 | mistral-v1 |
| miqu-1-70b | mistral-v1 |
| Mistral 7B Instruct v0.3 | mistral-v3 |
| Mixtral 8x22B Instruct v0.1 | mistral-v3 |
| Mixtral 8x22B Instruct v0.3 | mistral-v3 |
| Codestral 22B v0.1 | mistral-v3 |
| Codestral Mamba 7B v0.1 | mistral-v3 |
| Mathstral 7B v0.1 | mistral-v3 |
| Large 123B 2407 | mistral-v3 |
| Nemo 12B 2407 | mistral-v3-tekken |
| Large 123B 2411 | mistral-v7 |

(I've added miqu-1-70b and Large 123B 2411 rows)

along with a link to the "Demystifying Mistral's Instruct Tokenization & Chat Templates" document might be a good idea.

@jukofyork jukofyork changed the title Add mistral-v1, mistral-v2, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types Nov 29, 2024
@jukofyork (Contributor Author)

@slaren @ggerganov this is ready for review if possible :)

If you don't like me leaving the old "mistral" stuff in the "llama2" block then I can try to fix this (and possibly rename "mistral-v1"), but I fear that there are so many broken variants of this type of Jinja2 template floating around in different fine-tunes and GGUF files that it's probably best to just leave it be: have the "v1", "v3", "v3-tekken" and "v7" auto-detection work off the current (hopefully final) MistralAI-created Jinja2 templates, and let the user pass "mistral-v1" if they know their particular fine-tune or GGUF really uses the "v1" style prompt.

@jukofyork (Contributor Author)

Actually, looking over the code again, I've seen a bug that got past the tests:

std::string content = (trim_assistant_message? trim(message->content) : message->content);

According to the current templates this should only be applied to the assistant messages (likely to be sure the space gets re-prepended to the first token output by the assistant).
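
Presumably the fix is something along these lines (a sketch; the actual committed change may differ):

```cpp
// sketch: apply trim() only when the message role is "assistant"
std::string role(message->role);
std::string content = (trim_assistant_message && role == "assistant")
    ? trim(message->content)
    : message->content;
```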

I'll try and fix it using the GitHub editor as I'm away from home atm.

Fixed `trim_assistant_message` bug
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Dec 1, 2024
@jukofyork (Contributor Author)

So I've been thinking about the use of llama_chat_apply_template_internal() in general:

The main brittleness comes from the auto-detection via the Jinja2 template stored in the GGUF header, and looking at the repos on Hugging Face it's very common that:

  1. The Jinja2 templates are getting regularly corrected and fiddled about with, but the GGUFs never get updated unless there is a big problem identified with the tokeniser, etc.
  2. Fine-tuners often just copy the official tokenizer_config.json into their repo without considering whether they actually used the official Jinja2 template and/or many of the special tokens.

Is there any way we could pass/access some structure that contains the list of tokens for a model to/in llama_chat_apply_template_internal()?

If it were possible, then we could use it to write much more robust heuristics than the current code allows for:

  • There are probably many small "degradations" due to spaces in the other templates causing out-of-distribution tokenisation, but it's often not all that noticeable unless you need to use the model for coding or with long contexts.
  • Many of the problems caused by fine-tuners and GGUF quantisers including the wrong Jinja2 template could be sidestepped.

I'd be happy to take over the job of making sure all the templates are correct and maintaining the llama_chat_apply_template_internal() code if there is some way to make them more robust like this.
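
As a purely hypothetical sketch of what such a heuristic could look like if the vocabulary were accessible (has_special_token is an assumed helper, not an existing API):

```cpp
#include <functional>
#include <string>

// Hypothetical: pick a template family from the model's special tokens
// instead of trusting the Jinja2 string stored in the GGUF header.
static std::string guess_mistral_template(
        const std::function<bool(const std::string &)> & has_special_token) {
    if (has_special_token("[SYSTEM_PROMPT]")) {
        return "mistral-v7";
    }
    if (has_special_token("[AVAILABLE_TOOLS]")) {
        return "mistral-v3"; // or mistral-v3-tekken, depending on the tokenizer
    }
    if (has_special_token("[INST]")) {
        // [INST] as a real control token implies v3+; in v1 it is plain text
        return "mistral-v3";
    }
    return ""; // unknown: fall back to the Jinja2-based detection
}
```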


Also, would it be worthwhile investigating the possibility of replacing all the hard-coded substitution logic inside llama_chat_apply_template_internal() with a very minimal string template substitution parser?

I think a stack-based version would be very easy to implement (i.e. using postfix notation for any conditionals, string variables for role and content, and a handful of boolean variables for is_first and is_last, etc), but I'm not sure people without a CS background would understand it enough to write their own like this (e.g. Reverse Polish notation likely seems very confusing to most people now).
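
To illustrate the stack-based idea, here is a toy postfix evaluator (a hypothetical design sketch, nothing to do with the existing llama.cpp code): string literals and $variables get pushed, cat concatenates the top two entries, and if pops a condition and two branches:

```cpp
#include <map>
#include <stack>
#include <string>
#include <vector>

// Toy postfix template evaluator: "cat" joins the top two strings,
// "if" pops {cond, then, else}, "$name" looks up a variable, and
// anything else is pushed as a literal.
std::string eval_postfix(const std::vector<std::string> & prog,
                         const std::map<std::string, std::string> & vars) {
    std::stack<std::string> st;
    for (const auto & tok : prog) {
        if (tok == "cat") {
            std::string b = st.top(); st.pop();
            std::string a = st.top(); st.pop();
            st.push(a + b);
        } else if (tok == "if") {
            std::string els  = st.top(); st.pop();
            std::string thn  = st.top(); st.pop();
            std::string cond = st.top(); st.pop(); // "1" or "0"
            st.push(cond == "1" ? thn : els);
        } else if (!tok.empty() && tok[0] == '$') {
            auto it = vars.find(tok.substr(1)); // variable lookup, e.g. $content
            st.push(it != vars.end() ? it->second : "");
        } else {
            st.push(tok); // string literal
        }
    }
    return st.empty() ? "" : st.top();
}

// e.g. {"[INST] ", "$content", "cat", "[/INST]", "cat"} with
// vars = {{"content", "Hello"}} evaluates to "[INST] Hello[/INST]".
```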

A non-stack-based solution would be quite a lot harder though (I've just looked through some "minimal" templating libraries written in C/C++ and the AST parsing is quite a significant increase in complexity).

@slaren (Collaborator) commented Dec 1, 2024

It may be worth taking a look at the mini jinja engine that @ochafik is working on in #9639. If that works well, it would allow us to remove all the pre-defined templates.

@slaren slaren requested a review from ngxson December 1, 2024 17:46
@jukofyork (Contributor Author)

It may be worth taking a look at the mini jinja engine that @ochafik is working on in #9639. If that works well, it would allow us to remove all the pre-defined templates.

Oh, I missed that and agree that looks the best solution long term.

Not sure what the best plan to do with this is then:

  • The bool support_system_message = tmpl_contains("<<SYS>>") || tmpl == "mistral" bug probably wants fixing regardless.
  • I could try to rework it to not add the new mistral-v names and auto-detect only, but I'm not sure that would catch many of the Jinja2 templates in the GGUFs in the wild.

@ngxson (Collaborator) left a comment

Thanks for fixing this. Tbh, the mistral template is quite messy and I almost gave up on it for now.

Btw, we should find a way to reduce the number of string compares (==) and needle-in-haystack tmpl_contains() calls. This is not very important, so I'll make a PR for that later.
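
One possible shape for this (purely a hypothetical sketch, not a commitment to any particular approach): resolve the template name to an enum once, then switch on it, instead of repeated comparisons and substring searches.

```cpp
#include <map>
#include <string>

// Hypothetical: map known template names to an enum in one lookup.
enum class llm_chat_tmpl {
    LLAMA_2,
    MISTRAL_V1,
    MISTRAL_V3,
    MISTRAL_V3_TEKKEN,
    MISTRAL_V7,
    UNKNOWN,
};

static llm_chat_tmpl resolve_chat_tmpl(const std::string & name) {
    static const std::map<std::string, llm_chat_tmpl> k_map = {
        { "llama2",            llm_chat_tmpl::LLAMA_2 },
        { "mistral-v1",        llm_chat_tmpl::MISTRAL_V1 },
        { "mistral-v3",        llm_chat_tmpl::MISTRAL_V3 },
        { "mistral-v3-tekken", llm_chat_tmpl::MISTRAL_V3_TEKKEN },
        { "mistral-v7",        llm_chat_tmpl::MISTRAL_V7 },
    };
    auto it = k_map.find(name);
    return it == k_map.end() ? llm_chat_tmpl::UNKNOWN : it->second;
}
```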

@ngxson ngxson merged commit 917786f into ggerganov:master Dec 1, 2024
50 checks passed
@jukofyork jukofyork deleted the fix-mistralai-chat-templates branch December 1, 2024 22:39
ochafik added a commit to google/minja that referenced this pull request Dec 5, 2024
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Dec 7, 2024
…chat template types (ggerganov#10572)

* Templates: `mistral-v1`, `mistral-v2`, `mistral-v3`, `mistral-v3-tekken`

* Changed system message logic and added tests for all 4

* Invalid `system_message` instead of `content` fixed

* Removed tab-indented lines

* Added template code and test for `mistral-v7`

* Added all tests. Fixed bug with `tmpl == "llama2"` test.

* Replaced tabs with spaces.

* Removed `'mistral-v2'` option as no (open) models ever used it

* Removed all references to 'v2' template from comments

* Update llama.cpp

Fixed `trim_assistant_message` bug
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024