
Configurable Chat Formats #711

Merged 13 commits into main on Sep 29, 2023

Conversation

@abetlen (Owner) commented Sep 14, 2023

Adds the ability to specify different chat formats. Closes #492

  • Support messages -> prompt formatting
  • Handle system messages
  • Handle custom stop sequences
  • Handle grammars for function calling formats (functionary)
  • Support common models (Llama2, Vicuna, Alpaca, etc)

API

import llama_cpp

# using a pre-defined chat format name
model = llama_cpp.Llama("vicuna.gguf", chat_format="vicuna")

# using a function that implements the LlamaChatFormat interface
model = llama_cpp.Llama("vicuna.gguf", chat_format=llama_cpp.LlamaChatFormatVicuna())
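For completeness, a minimal usage sketch of chat completion once a format is set; the message contents and max_tokens value below are illustrative and not part of this PR:

import llama_cpp

model = llama_cpp.Llama("vicuna.gguf", chat_format="vicuna")

response = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=64,  # illustrative sampling limit
)
print(response["choices"][0]["message"]["content"])

The configured chat format determines how these messages are flattened into a single prompt and which stop sequences terminate the assistant turn.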

References

@earonesty (Contributor) left a comment

this is an excellent idea, and very much needed for using the python module

@@ -1093,7 +1104,9 @@ def _create_completion(
         # all remaining tokens cannot be decoded to a UTF-8 character
         break
     token_end_position += len(bs)
-    if token_end_position > (remaining_length - first_stop_position):
+    if token_end_position > (
+        remaining_length - first_stop_position
+    ):

it's probably a good idea to reduce the number of formatting changes in a PR

@abetlen marked this pull request as ready for review on September 29, 2023 at 23:48
@abetlen merged commit 3bca770 into main on Sep 29, 2023
@teleprint-me (Contributor) commented Oct 2, 2023

Problem Statement

When using the Llama library for conversational models, particularly with the llama-2 template, I encountered issues related to conversation formatting and context management. The issue manifested in two major ways:

  1. Formatting Issue: System messages and assistant messages were not following the expected structure in the chat logs. This made it difficult to trace conversations and introduced errors.

  2. Repeated Greetings: When using llama-cpp-python, the model would repeatedly greet me, even when the conversation had moved well beyond the initial greetings. This indicates a failure in contextual memory and leads to an undesirable user experience.

Desired Behavior

  1. The assistant's output should be appropriately formatted, following the system messages, to create a coherent and human-readable interaction log.
  2. The assistant should not repeat greetings or other statements that are contextually inappropriate.

Investigation and Solution

Helper Functions

from typing import Any, Dict, List, Optional, Tuple

def _get_system_message(messages: List[llama_types.ChatCompletionRequestMessage]) -> str:
    # Return the content of the first system message, or an empty string if none is present.
    for message in messages:
        if message["role"] == "system":
            return message["content"] or ""
    return ""

def _map_roles(messages: List[llama_types.ChatCompletionRequestMessage], role_map: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    # Map OpenAI-style roles ("user", "assistant") to the template's role tags, skipping unmapped roles.
    output: List[Tuple[str, Optional[str]]] = []
    for message in messages:
        role = message["role"]
        if role in role_map:
            output.append((role_map[role], message["content"]))
    return output

Main Function for Formatting

@register_chat_format("llama-2")
def format_llama2(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _system_template = "<<SYS>>\n{system_message}\n<</SYS>>\n\n"
    _roles = dict(user="[INST]", assistant="[/INST]")
    _sep = "\n\n"

    system_message = _get_system_message(messages)
    system_message = _system_template.format(system_message=system_message)

    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))

    # _format_llama2 assembles the final prompt string (see the sketch below)
    _prompt = _format_llama2(system_message, _messages, _sep)
    return ChatFormatterResponse(prompt=_prompt)
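The function above calls a _format_llama2 helper that is not shown in this comment. A minimal sketch of what such a helper could look like, under the assumption that it simply starts with the <<SYS>> block and joins the mapped role/content pairs with the separator (my reading, not code from the PR):

def _format_llama2(
    system_message: str,
    messages: List[Tuple[str, Optional[str]]],
    sep: str,
) -> str:
    # Assumed behavior (not from the PR): begin with the formatted <<SYS>> block,
    # then append each "role content" pair joined by the separator. A trailing
    # (role, None) entry leaves the prompt open for the assistant to continue.
    prompt = system_message
    for role, content in messages:
        if content is not None:
            prompt += f"{role} {content}{sep}"
        else:
            prompt += f"{role}"
    return prompt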

Key Changes

  1. Modified _system_template to correctly encapsulate system messages.
  2. Reworked the implementation to avoid repeated greetings by maintaining conversation state, eliminating redundancies in the conversation.

Result

  1. The assistant's output is now well-formatted and aligns correctly with the user's input.
  2. Contextual inaccuracies like repeated greetings are avoided, providing a smoother conversation flow.

@delock (Contributor) commented Oct 6, 2023

I'm not sure whether this is the proper place to ask, but I also encountered repeated greetings in a chat session. Is this issue already fixed by this merged PR? I updated to the latest main branch, but the problem still seems to be there. The following is a conversation with llama 7b:

Teacher:  Hello! *smiling* It's so great to meet you! Are you interested in learning English? I'm here to help and make it fun for you! Let's talk about something you like, okay?

You: I like playing with robot cars

Teacher:  Great! *excited* So, have you ever played with robot cars before? *smiling* They are so much fun! Can you tell me more about them? What do you like to do with them? *encouraging*

You: I like to program them to go around all kinds of things.

Teacher:  Hello there! *smiling* It's wonderful to meet you too! Are you interested in learning English? *excited* I'm here to help and make it fun for you! *nodding* Let's talk about something you like, okay? *encouraging* Oh, wow! You like playing with robot cars! *impressed* That's so cool! Can you tell me more about them? *listening* What do you like to do with them? *curious* Do you like making them race or play games with them? *excited* I like programming them too! *nodding* It's so much fun!


@teleprint-me (Contributor) commented Oct 6, 2023

@delock

I'm still waiting on a response. This issue is a bit more complicated than it appears.

It seems to be isolated to llama-cpp-python as I'm able to have a stable conversational flow using llama.cpp on its own while utilizing a solid template structure.

While removing the alleged special start token appears to fix the issue, delving a bit deeper reveals it to be more problematic.

The model will start attempting to "fill in the gaps" by predicting the special tokens as a result.

My intuition is that it stems from how the setup and modification of the template structure are compartmentalized, which makes the issue tricky to isolate.

It should also be noted that there seems to be a performance issue with this approach, since repeatedly modifying the conversational structure slows it down.

@delock (Contributor) commented Oct 6, 2023

@teleprint-me Thanks for answering. This issue has been quite frustrating while I've been trying to build a local chatbot.

I see this is a reply on a closed PR. Could you open an issue for this strange behavior? That might help it get tracked better.

