Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types #10572
Conversation
Title changed: "mistral-v1, mistral-v2, mistral-v3 and mistral-v3-tekken chat template types" → "mistral-v1, mistral-v2, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types"
If this does get merged, then including the table I've added in the "Templates supported by llama_chat_apply_template" wiki page, along with a link to the "Demystifying Mistral's Instruct Tokenization & Chat Templates" document, might be a good idea.
Title changed: "mistral-v1, mistral-v2, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types" → "mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types"
@slaren @ggerganov this is ready for review if possible :) If you don't like me leaving the old "mistral" stuff in the "llama2" block then I can try to fix this (and possibly rename "mistral-v1"), but I fear there are so many broken variants of this type of Jinja2 template floating around in different fine-tunes and GGUF files that it's probably best to just leave it be, have the "v1", "v3", "v3-tekken" and "v7" auto-detect based on the current (hopefully final) MistralAI-created Jinja2 templates, and then let the user specify "mistral-v1" if they know their particular fine-tune or GGUF really uses the "v1" style prompt.
Actually, looking over the code again, I've seen a bug that got past the tests: `std::string content = (trim_assistant_message ? trim(message->content) : message->content);` According to the current templates this should only be applied to the assistant messages (likely to be sure the space gets re-prepended to the first token output by the assistant). I'll try and fix it using the GitHub editor as I'm away from home atm.
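For reference, a minimal sketch of the fix described above, using simplified stand-in types (the real llama.cpp message type and formatting loop differ, and `trim()` here is a local helper, not the project's):

```cpp
#include <iostream>
#include <string>
#include <vector>

// Sketch only: the point is that trimming should apply to assistant messages,
// not to every message. These structures are illustrative stand-ins.
struct chat_message { std::string role; std::string content; };

static std::string trim(const std::string & s) {
    size_t b = s.find_first_not_of(" \t\n");
    size_t e = s.find_last_not_of(" \t\n");
    return b == std::string::npos ? "" : s.substr(b, e - b + 1);
}

static std::string format_content(const chat_message & msg, bool trim_assistant_message) {
    // Only assistant messages get trimmed (so the leading space is re-prepended
    // to the first token the assistant outputs); user/system content is left alone.
    if (msg.role == "assistant" && trim_assistant_message) {
        return trim(msg.content);
    }
    return msg.content;
}

int main() {
    std::vector<chat_message> chat = {
        {"user",      " Hello there "},
        {"assistant", " Hi! "},
    };
    for (const auto & msg : chat) {
        std::cout << msg.role << ": '" << format_content(msg, /*trim_assistant_message=*/true) << "'\n";
    }
}
```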
Fixed `trim_assistant_message` bug
…mplate types ggerganov#10572 LCPP PR by Jukyofork.
So I've been thinking more about this. The main brittleness comes from the auto-detection via the stored Jinja2 template in the GGUF header, and looking at the repos on Hugging Face it's very common that:
Is there any way we could pass/access some structure that contains the list of tokens for a model? If it were possible, then we could use it to write much more robust heuristics than the current code allows for.
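To make the idea concrete, here's a rough sketch of the sort of heuristic I have in mind. The `token_set` type and `guess_mistral_template` function are hypothetical, not an existing llama.cpp API, and the specific tokens checked would need to be verified against the real models:

```cpp
#include <iostream>
#include <string>
#include <unordered_set>

// Hypothetical: a set of the model's special/added token strings, extracted
// from the GGUF metadata. This is NOT an existing llama.cpp structure.
using token_set = std::unordered_set<std::string>;

// Sketch of template auto-detection driven by the tokenizer contents rather
// than by string-matching the stored Jinja2 template.
static std::string guess_mistral_template(const token_set & special_tokens) {
    // Illustrative checks only -- the exact tokens that distinguish each
    // tokenizer version would need checking against the actual repos.
    if (special_tokens.count("[SYSTEM_PROMPT]")) {
        return "mistral-v7";
    }
    if (special_tokens.count("[INST]")) {
        return "mistral-v3";
    }
    return "";  // unknown: fall through to the existing template-string detection
}

int main() {
    token_set toks = {"<s>", "</s>", "[INST]", "[/INST]"};
    std::cout << guess_mistral_template(toks) << "\n";  // prints "mistral-v3"
}
```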
I'd be happy to take over the job of making sure all the templates are correct and maintaining them. Also, would it be worthwhile investigating the possibility of replacing all the hard-coded substitution logic with something more data-driven? I think a stack-based version would be very easy to implement (ie: using postfix notation for any conditionals, string variables, and so on). A non-stack-based solution would be quite a lot harder though (I've just looked through some "minimal" templating libraries written in C/C++ and the AST parsing is quite a significant increase in complexity).
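Purely as a sketch of what I mean by "stack-based" (none of these names or ops exist in llama.cpp, and this is not a proposal for a concrete format): the template would be stored as a flat list of postfix operations, and applying it is just a loop over a stack.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

// Hypothetical postfix "ops" for a tiny template language -- illustrative only.
enum class op_type { push_literal, push_role, push_content, concat, if_role_eq };

struct op {
    op_type     type;
    std::string arg;  // literal text, or the role name to compare against
};

struct msg { std::string role; std::string content; };

// Evaluate a postfix program against a single message.
static std::string eval_template(const std::vector<op> & prog, const msg & m) {
    std::stack<std::string> st;
    for (const auto & o : prog) {
        switch (o.type) {
            case op_type::push_literal: st.push(o.arg);     break;
            case op_type::push_role:    st.push(m.role);    break;
            case op_type::push_content: st.push(m.content); break;
            case op_type::concat: {
                assert(st.size() >= 2);
                std::string b = st.top(); st.pop();
                std::string a = st.top(); st.pop();
                st.push(a + b);
                break;
            }
            case op_type::if_role_eq: {
                // Keep the top of the stack only if the role matches, else drop it.
                assert(!st.empty());
                std::string v = st.top(); st.pop();
                st.push(m.role == o.arg ? v : "");
                break;
            }
        }
    }
    return st.empty() ? "" : st.top();
}

int main() {
    // "[INST] " + content + "[/INST]", but only for user messages.
    std::vector<op> user_turn = {
        {op_type::push_literal, "[INST] "},
        {op_type::push_content, ""},
        {op_type::concat, ""},
        {op_type::push_literal, "[/INST]"},
        {op_type::concat, ""},
        {op_type::if_role_eq, "user"},
    };
    msg m = {"user", "Hello"};
    assert(eval_template(user_turn, m) == "[INST] Hello[/INST]");
}
```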
Oh, I missed that and agree that looks like the best solution long term. Not sure what the best plan for this is then:
Thanks for fixing this. Tbh, the mistral template is quite messy and I almost gave up on it for now.
Btw, we should find a way to reduce the number of string compares (`==`) and needle-in-a-haystack `tmpl_contains` searches. This is not very important, so I'll make a PR for that later.
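One way that could look (purely a sketch; the enum and mapping here are made up for illustration, not how llama.cpp is currently structured) is to resolve the template name to an enum once and switch on the enum everywhere else:

```cpp
#include <cstdio>
#include <string>
#include <unordered_map>

// Sketch: map the template name to an enum once, then switch on the enum
// instead of repeating `tmpl == "..."` string compares throughout the code.
enum class chat_template {
    unknown,
    chatml,
    llama2,
    mistral_v1,
    mistral_v3,
    mistral_v3_tekken,
    mistral_v7,
};

static chat_template resolve_template(const std::string & name) {
    static const std::unordered_map<std::string, chat_template> lookup = {
        {"chatml",            chat_template::chatml},
        {"llama2",            chat_template::llama2},
        {"mistral-v1",        chat_template::mistral_v1},
        {"mistral-v3",        chat_template::mistral_v3},
        {"mistral-v3-tekken", chat_template::mistral_v3_tekken},
        {"mistral-v7",        chat_template::mistral_v7},
    };
    auto it = lookup.find(name);
    return it == lookup.end() ? chat_template::unknown : it->second;
}

int main() {
    const chat_template t = resolve_template("mistral-v3");
    std::printf("%s\n", t == chat_template::mistral_v3 ? "resolved mistral-v3" : "unknown");
}
```

The `tmpl_contains`-style heuristics would presumably still be needed to detect a template from a raw Jinja2 string, but they would only have to run once per model load rather than on every call.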
…chat template types (ggerganov#10572)
* Templates: `mistral-v1`, `mistral-v2`, `mistral-v3`, `mistral-v3-tekken`
* Changed system message logic and added tests for all 4
* Invalid `system_message` instead of `content` fixed
* Removed tab-indented lines
* Added template code and test for `mistral-v7`
* Added all tests. Fixed bug with `tmpl == "llama2"` test.
* Replaced tabs with spaces.
* Removed `'mistral-v2'` option as no (open) models ever used it
* Removed all references to 'v2' template from comments
* Update llama.cpp: Fixed `trim_assistant_message` bug
This PR adds the `mistral-v1`, `mistral-v2`, `mistral-v3` and `mistral-v3-tekken` chat template types, based on the "Demystifying Mistral's Instruct Tokenization & Chat Templates" documentation provided by MistralAI themselves:

https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md
https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/templates.md
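For anyone skimming, the main visible difference between the versions is the whitespace around the tags. Roughly, as I read the cookbook pages above (double-check the exact spacing there before relying on this; v2 appears to render the same as v3 for plain chat, and v7 is the version added later in this PR):

```
v1:        <s> [INST] user message [/INST] assistant message</s>
v3:        <s>[INST] user message[/INST] assistant message</s>
v3-tekken: <s>[INST]user message[/INST]assistant message</s>
v7:        <s>[SYSTEM_PROMPT] system prompt[/SYSTEM_PROMPT][INST] user message[/INST] assistant message</s>
```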
1. I have deliberately ignored the "prepend it to the last user message" part for the v2/v3 system prompts, as this will constantly invalidate the KV-cache, and from their own wording I'm not convinced the model has even been trained like this or expects it there. Because the system prompt is attached to whichever user message is currently last, its position in the token sequence shifts every turn, so the previously cached prefix can no longer be reused (see the illustration below).
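A rough illustration of the caching problem (the exact spacing is not the point, just where the system prompt ends up):

```
Turn 1 prompt:  <s>[INST] SYSTEM + user_1[/INST]
Turn 2 prompt:  <s>[INST] user_1[/INST] assistant_1</s>[INST] SYSTEM + user_2[/INST]
```

Because the system prompt has moved from the first user message to the second, the two prompts already differ inside the first `[INST]` block, so almost none of the turn-1 KV-cache can be reused for turn 2.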
2. I can't really add any specific `templates`/`expected_output` tests. I've left the old logic intact as a fall-through, as it looks way too brittle to even try to touch and could end up affecting lots of existing models that use slight variations of the `[INST]` tags. I did see one bug in the old code though: no Mistral model has ever used `<<SYS>>` tags AFAIK (they are purely for `llama2` and `codellama` - see here). There is also a lot of confusion from MistralAI's own repos having had wrong templates posted at one time or another, `mistral-community` keeping up the old/wrong templates, and so on... So there isn't really much point in trying to do the "2. Use the following python script to generate a test conversation" part, as this is an attempt to finally "Demystify" the prompts from their own documentation, rather than go by whatever broken Jinja2 templates are floating around. So instead I have just added the `mistral-v1`, `mistral-v2`, `mistral-v3` and `mistral-v3-tekken` options for the user to specify manually (see the usage sketch below), but if unspecified, the old fall-through logic gets used instead.
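As a usage sketch, selecting one of the new types manually would look something like the following. This assumes the `llama_chat_apply_template` signature at the time of writing (it may differ in later versions), and the same template name should also work with the `--chat-template` flag on llama-cli / llama-server:

```cpp
#include <string>
#include <vector>
#include "llama.h"

// Sketch: force the "mistral-v3" template instead of relying on the
// auto-detected template stored in the GGUF metadata.
std::string apply_mistral_v3(const std::vector<llama_chat_message> & msgs) {
    std::vector<char> buf(4096);
    int32_t n = llama_chat_apply_template(
        /*model   =*/ nullptr,       // should be fine when a template name is given
                                     // explicitly, but check the header you build against
        /*tmpl    =*/ "mistral-v3",  // one of the new template types added by this PR
        /*chat    =*/ msgs.data(),
        /*n_msg   =*/ msgs.size(),
        /*add_ass =*/ true,          // append the assistant prefix for generation
        /*buf     =*/ buf.data(),
        /*length  =*/ (int32_t) buf.size());
    if (n < 0) {
        return "";  // template not supported; a real caller should also resize
                    // and retry if n exceeds the buffer size
    }
    return std::string(buf.data(), (size_t) n);
}
```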
I have added to the `fmt_sys` and `fmt_single` tests at the bottom of `test-chat-template.cpp`, so I will check back tomorrow to see whether they pass.

3. If accepted, then somebody with wiki access might want to update this page:

https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

as it doesn't seem possible to edit it via a PR?