
Configurable Chat Formats #711

Merged 13 commits into main on Sep 29, 2023

Conversation

@abetlen (Owner) commented Sep 14, 2023

Adds the ability to specify different chat formats. Closes #492

  • Support messages -> prompt formatting
  • Handle system messages
  • Handle custom stop sequences
  • Handle grammars for function calling formats (functionary)
  • Support common models (Llama2, Vicuna, Alpaca, etc)

API

import llama_cpp

# using a pre-defined chat format name
model = llama_cpp.Llama("vicuna.gguf", chat_format="vicuna")

# using a function that implements the LlamaChatFormat interface
model = llama_cpp.Llama("vicuna.gguf", chat_format=llama_cpp.LlamaChatFormatVicuna())
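For completeness, a minimal usage sketch of chat completion once a format is set; the message contents and max_tokens value below are illustrative and not part of this PR:

import llama_cpp

model = llama_cpp.Llama("vicuna.gguf", chat_format="vicuna")

response = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=64,  # illustrative sampling limit
)
print(response["choices"][0]["message"]["content"])

The configured chat format determines how these messages are flattened into a single prompt and which stop sequences terminate the assistant turn.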

References

@earonesty (Contributor) left a comment

this is an excellent idea, and very much needed for using the python module

@@ -1093,7 +1104,9 @@ def _create_completion(
         # all remaining tokens cannot be decoded to a UTF-8 character
         break
     token_end_position += len(bs)
-    if token_end_position > (remaining_length - first_stop_position):
+    if token_end_position > (
+        remaining_length - first_stop_position
+    ):

it's probably a good idea to reduce the number of formatting changes in a PR

@abetlen marked this pull request as ready for review on September 29, 2023 at 23:48
@abetlen merged commit 3bca770 into main on Sep 29, 2023
@teleprint-me (Contributor) commented Oct 2, 2023

Problem Statement

When using the Llama library for conversational models, particularly with the llama-2 template, I encountered issues related to conversation formatting and context management. The issue manifested in two major ways:

  1. Formatting Issue: System messages and assistant messages were not following the expected structure in the chat logs. This made it difficult to trace conversations and introduced errors.

  2. Repeated Greetings: When using llama-cpp-python, the model would repeatedly greet me, even when the conversation had moved well beyond the initial greetings. This indicates a failure in contextual memory and leads to an undesirable user experience.

Desired Behavior

  1. The assistant's output should be appropriately formatted, following the system messages, to create a coherent and human-readable interaction log.
  2. The assistant should not repeat greetings or other statements that are contextually inappropriate.

Investigation and Solution

Helper Functions

from typing import Any, Dict, List, Optional, Tuple

def _get_system_message(messages: List[llama_types.ChatCompletionRequestMessage]) -> str:
    # Return the content of the first system message, or an empty string if none is present.
    for message in messages:
        if message["role"] == "system":
            return message["content"] or ""
    return ""

def _map_roles(messages: List[llama_types.ChatCompletionRequestMessage], role_map: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    # Map OpenAI-style roles ("user", "assistant") to the template's role tags, skipping unmapped roles.
    output: List[Tuple[str, Optional[str]]] = []
    for message in messages:
        role = message["role"]
        if role in role_map:
            output.append((role_map[role], message["content"]))
    return output

Main Function for Formatting

@register_chat_format("llama-2")
def format_llama2(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _system_template = "<<SYS>>\n{system_message}\n<</SYS>>\n\n"
    _roles = dict(user="[INST]", assistant="[/INST]")
    _sep = "\n\n"

    system_message = _get_system_message(messages)
    system_message = _system_template.format(system_message=system_message)

    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))

    # _format_llama2 assembles the final prompt string (see the sketch below)
    _prompt = _format_llama2(system_message, _messages, _sep)
    return ChatFormatterResponse(prompt=_prompt)
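The function above calls a _format_llama2 helper that is not shown in this comment. A minimal sketch of what such a helper could look like, under the assumption that it simply starts with the <<SYS>> block and joins the mapped role/content pairs with the separator (my reading, not code from the PR):

def _format_llama2(
    system_message: str,
    messages: List[Tuple[str, Optional[str]]],
    sep: str,
) -> str:
    # Assumed behavior (not from the PR): begin with the formatted <<SYS>> block,
    # then append each "role content" pair joined by the separator. A trailing
    # (role, None) entry leaves the prompt open for the assistant to continue.
    prompt = system_message
    for role, content in messages:
        if content is not None:
            prompt += f"{role} {content}{sep}"
        else:
            prompt += f"{role}"
    return prompt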

Key Changes

  1. Modified _system_template to correctly encapsulate system messages.
  2. Reworked the implementation to avoid repeated greetings by maintaining conversation state, eliminating redundancies in the conversation.

Result

  1. The assistant's output is now well-formatted and aligns correctly with the user's input.
  2. Contextual inaccuracies like repeated greetings are avoided, providing a smoother conversation flow.

@delock (Contributor) commented Oct 6, 2023

I'm not sure whether this is the proper place to ask, but I also encountered repeated greetings in a chat session. Is this issue already fixed by this merged PR? I updated to the latest main branch, but the problem still seems to be there. The following is a conversation with llama 7b:

Teacher:  Hello! *smiling* It's so great to meet you! Are you interested in learning English? I'm here to help and make it fun for you! Let's talk about something you like, okay?

You: I like playing with robot cars

Teacher:  Great! *excited* So, have you ever played with robot cars before? *smiling* They are so much fun! Can you tell me more about them? What do you like to do with them? *encouraging*

You: I like to program them to go around all kinds of things.

Teacher:  Hello there! *smiling* It's wonderful to meet you too! Are you interested in learning English? *excited* I'm here to help and make it fun for you! *nodding* Let's talk about something you like, okay? *encouraging* Oh, wow! You like playing with robot cars! *impressed* That's so cool! Can you tell me more about them? *listening* What do you like to do with them? *curious* Do you like making them race or play games with them? *excited* I like programming them too! *nodding* It's so much fun!


@teleprint-me (Contributor) commented Oct 6, 2023

@delock

I'm still waiting on a response. This issue is a bit more complicated than it appears.

It seems to be isolated to llama-cpp-python as I'm able to have a stable conversational flow using llama.cpp on its own while utilizing a solid template structure.

While removing the alleged special start token appears to fix the issue, delving a bit deeper reveals it to be more problematic.

The model will start attempting to "fill in the gaps" by predicting the special tokens as a result.

My intuition is that it stems from how the setup and modification of the template structure are compartmentalized, which makes the issue tricky to isolate.

It should also be noted that there seems to be a performance issue with this approach, since repeatedly modifying the conversational structure slows it down.

@delock (Contributor) commented Oct 6, 2023

@teleprint-me Thanks for answering. This issue has been quite frustrating while I've been trying to build a local chatbot.

I see this is a reply on a closed PR. Could you open an issue for this strange behavior? That might help it get tracked better.

