Configurable Chat Formats #711
Conversation
This is an excellent idea, and very much needed in order to use the Python module for chat.
llama_cpp/llama.py (outdated)

```diff
@@ -1093,7 +1104,9 @@ def _create_completion(
                     # all remaining tokens cannot be decoded to a UTF-8 character
                     break
                 token_end_position += len(bs)
-                if token_end_position > (remaining_length - first_stop_position):
+                if token_end_position > (
```
it's probably a good idea to reduce the number of formatting changes in a PR
Problem Statement

When using the Llama library for conversational models, particularly with the llama-2 family of chat models, the prompt is not assembled according to the template the model was trained on, which leads to malformed conversations (for example, repeated greetings).

Desired Behavior

The chat format should be configurable: each format is a small formatting function that can be registered under a name and selected when loading a model.
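A rough sketch of that desired shape, using the registration mechanism this PR introduces; the "my-chat-format" name and the template below are made up purely for illustration:

```python
# Hypothetical example: register a custom chat format by name so it can be
# selected when loading a model. register_chat_format and ChatFormatterResponse
# are the mechanisms added by this PR (see the code below).
@register_chat_format("my-chat-format")
def format_my_chat(messages, **kwargs):
    prompt = ""
    for message in messages:
        prompt += f"{message['role']}: {message['content']}\n"
    prompt += "assistant: "  # leave an open turn for the model to complete
    return ChatFormatterResponse(prompt=prompt)
```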
Investigation and Solution

Helper Functions

```python
from typing import Any, Dict, List, Optional, Tuple

# llama_types, register_chat_format, and ChatFormatterResponse come from the
# surrounding chat-format module in llama-cpp-python.

def _get_system_message(messages: List[llama_types.ChatCompletionRequestMessage]) -> str:
    """Return the content of the first system message, or an empty string."""
    for message in messages:
        if message["role"] == "system":
            return message["content"] or ""
    return ""


def _map_roles(
    messages: List[llama_types.ChatCompletionRequestMessage],
    role_map: Dict[str, str],
) -> List[Tuple[str, Optional[str]]]:
    """Map message roles to the model's prompt tags, dropping roles not in the map."""
    output: List[Tuple[str, Optional[str]]] = []
    for message in messages:
        role = message["role"]
        if role in role_map:
            output.append((role_map[role], message["content"]))
    return output
```

Main Function for Formatting

```python
@register_chat_format("llama-2")
def format_llama2(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _system_template = "<<SYS>>\n{system_message}\n<</SYS>>\n\n"
    _roles = dict(user="[INST]", assistant="[/INST]")
    _sep = "\n\n"
    system_message = _get_system_message(messages)
    system_message = _system_template.format(system_message=system_message)
    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))
    _prompt = _format_llama2(system_message, _messages, _sep)
    return ChatFormatterResponse(prompt=_prompt)
```

Key Changes
Result
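The `_format_llama2` helper called above is not shown in this comment. Below is a minimal sketch of how it might join the mapped messages, together with the prompt it would produce; the message contents are illustrative and the real helper may differ in details:

```python
from typing import List, Optional, Tuple

def _format_llama2(system_message: str, messages: List[Tuple[str, Optional[str]]], sep: str) -> str:
    # Sketch only: concatenate (role_tag, content) pairs after the system block.
    prompt = system_message
    for role, content in messages:
        if content is not None:
            prompt += f"{role} {content}{sep}"
        else:
            # A trailing role tag with no content marks where generation begins.
            prompt += role
    return prompt

messages = [("[INST]", "Hello!"), ("[/INST]", "Hi! How can I help?"), ("[/INST]", None)]
print(_format_llama2("<<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n", messages, "\n\n"))
# <<SYS>>
# You are a helpful assistant.
# <</SYS>>
#
# [INST] Hello!
#
# [/INST] Hi! How can I help?
#
# [/INST]
```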
I'm not sure whether this is the proper place to ask, but I also encountered repeated greetings in a chat session. Is this issue already fixed by this merged PR? I updated to the latest main branch, but it seems the problem is still there. Following is a conversation with llama 7b:
I'm still waiting on a response. This issue is a bit more complicated than it appears. It seems to be isolated to llama-cpp-python, as I'm able to maintain a stable conversational flow using llama.cpp on its own with a solid template structure. Removing the alleged special start token appears to fix the issue, but digging deeper reveals it to be more problematic: the model then starts trying to "fill in the gaps" by predicting the special tokens itself. My intuition is that the cause lies in the compartmentalized setup and modification of the template structure, which makes it tricky to isolate. It should also be noted that this approach has a performance cost, since the conversational structure is repeatedly modified, which slows things down.
@teleprint-me Thanks for answering. This issue is quite frustrating as I try to build a local chatbot. I see this is a reply to a closed PR. Can you open an issue for this strange behavior? That might help it get tracked better.
Adds the ability to specify different chat formats. Closes #492
Handle grammars for function calling formats (functionary)

API
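From the user's side, the new option is used roughly as follows; the model path is a placeholder, and the exact argument names should be checked against the released API:

```python
from llama_cpp import Llama

# chat_format selects one of the registered formatters by name.
# The model path below is a placeholder.
llm = Llama(model_path="./models/llama-2-7b-chat.gguf", chat_format="llama-2")

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response["choices"][0]["message"]["content"])
```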
References