
fix: Update Llama2 Formatting Template #781

Closed
wants to merge 1 commit into from

Conversation

teleprint-me
Contributor

I found a bug in the llama-2 formatting template.

  • Modify _system_template to better align system messages
  • Remove [INST] tag from system message format

This change aims to fix repetitive output (for example, the repeated greeting) in conversational contexts by refining the message template used for Llama2. A rough before/after sketch of the template change follows below.

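For illustration, a rough before/after of the change (the actual constant in llama_chat_format.py may be named and formatted slightly differently):

# Hypothetical sketch only; the real constant name and string may differ.
_system_template_before = "[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"  # [INST] wraps the system block
_system_template_after = "<<SYS>>\n{system_message}\n<</SYS>>\n\n"          # [INST] reserved for user turns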
@teleprint-me
Contributor Author

I posted details about this issue in reference to PR #711.

@abetlen
Owner

abetlen commented Oct 2, 2023

Hey @teleprint-me, do you have a reference for this being the correct prompt format? I originally based this on FastChat's system message, which references a Huggingface blog post that shows the current format:

<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]
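For reference, assembling that documented format from a message list looks roughly like this (an illustrative sketch, not this library's implementation):

def format_llama2_hf_style(system_prompt, turns):
    # Illustrative sketch of the Huggingface-documented format above.
    # `turns` is a list of (user_msg, model_answer) pairs; the final answer may
    # be None when the model is expected to produce the next reply.
    prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    for i, (user_msg, answer) in enumerate(turns):
        if i > 0:
            prompt += f"<s>[INST] {user_msg} [/INST]"
        else:
            prompt += f"{user_msg} [/INST]"
        if answer is not None:
            prompt += f" {answer} </s>"
    return prompt

# e.g. format_llama2_hf_style("You are helpful.", [("Hi", "Hello!"), ("How are you?", None)])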

@teleprint-me
Contributor Author

teleprint-me commented Oct 2, 2023

@abetlen I don't have any references other than my own repeated manual testing. It took a while, and I'll admit I became frustrated before I figured it out.

I worked it out by hand using llama.cpp, starting from a template that ChatGPT helped me extract from the official Llama source code.

The prompt file template we came up with is as follows:

<<SYS>>
My name is Valerie and I am a helpful assistant.
<</SYS>>

[INST] Hello Valerie, my name is Austin. What's your name?
[/INST] Hello Austin, my name is Valerie. Nice to meet you!
[INST] What can you do?
[/INST] I can assist you with various tasks, from answering questions to helping you with your projects.
[INST] How can you assist me in my programming projects?
[/INST]

The results were the following:

12:28:41 | /mnt/valerie/llama.cpp
(.venv) git:(master | Δ) λ ./main -f prompts/llama2.txt -m models/llama-2/llama-2-7b-chat/llama-2-7b-chat.GGUF.q4_0.bin -i --log-disable 2> /dev/null     
 <<SYS>>
My name is Valerie and I am a helpful assistant.
<</SYS>>

[INST] Hello Valerie, my name is Austin. What's your name?
[/INST] Hello Austin, my name is Valerie. Nice to meet you!
[INST] What can you do?
[/INST] I can assist you with various tasks, from answering questions to helping you with your projects.
[INST] How can you assist me in my programming projects?
[/INST] I'm happy to help with your programming projects! Here are some ways I can assist:

1. Code review and debugging: I can analyze your code and identify any errors or areas for improvement.
2. Writing code: I can generate code snippets or complete programs based on your requirements.
3. Providing programming concepts explanations: I can explain complex programming concepts in a clear and concise manner, making it easier for you to understand them.
4. Offering suggestions and ideas: If you're stuck on a particular problem or project, I can provide suggestions and ideas that might help you move forward.
5. Language translation: I can translate code from one programming language to another, helping you to work with different languages.
6. Data analysis and visualization: I can assist with data manipulation, cleaning, and visualization, using various libraries and tools.
7. Web scraping and data extraction: I can help you extract data from websites or web pages using various techniques.
8. Chatbot development: I can assist with developing chatbots using various platforms and frameworks.
9. Content generation: I can generate content, such as articles, blog posts, or even entire books, based on your requirements.
10. Other tasks: Depending on your needs, I can perform other tasks such as scheduling appointments, sending emails, or making reservations. Just let me know how I can help!

I tested it multiple times and was able to reproduce the results consistently.

By contrast, the currently employed template wraps the system message in the [INST] tag:

[INST] <<SYS>>
My name is Valerie and I am a helpful assistant.
<</SYS>>

Using that template produces erroneous output, causing the model to "get confused" and "glitch out".

00:22:12 | /mnt/valerie/pygptprompt
(.venv) git:(main | Δ) λ python -m pygptprompt.chat -x -e -s test tests/config.dev.json
2023-10-02 00:22:15,318 - json.py:110 - DEBUG - JSON successfully loaded from tests/config.dev.json
system
My name is Valerie. I am a helpful programming assistant.

user
Hello Valerie, my name is Austin. What's your name?

assistant
Hello Austin, my name is Valerie. Nice to meet you!

user
What can you do?

assistant
I can assist you with various tasks, from answering questions to helping you with your projects.

user
> How can you assist me in my programming projects?

assistant
 Hello Austin! It's nice to meet you too! As a helpful programming assistant, I can help you with a variety of tasks related to programming. Here are some examples of things I can do:

1. Answer questions: If you have any questions about programming concepts or specific languages, I can try to provide explanations and examples to help you understand.
2. Provide code examples: If you're working on a project and need some example code to get started, I can provide that for you.
3. Offer troubleshooting assistance: If you're having trouble with your code or are encountering errors, I can try to help you identify the problem and suggest solutions.
4. Help with debugging: If you're having trouble tracking down a bug in your code, I can offer suggestions on how to debug and fix the issue.
5. Assist with project ideas: If you're looking for ideas for programming projects, I can provide some suggestions based on your interests and skill level.

So, what kind of programming projects are you working on right now? Do you have any specific questions or areas where you could use some help?

Context is consuming 346 tokens.
Transcript is consuming 777 tokens.

user
>

As the transcript above shows, the currently employed template produces undesirable output.

Examples of undesirable output are:

  • Repetitive outputs, e.g. the model repeats elements from earlier in the conversation (greetings, previous statements).
  • Run-on completions, e.g. the model writes out entire conversations instead of stopping and letting the user/agent respond (a stop-sequence sketch follows below).
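Independent of the template fix itself, the run-on behaviour can usually be contained with stop sequences; something along these lines (the model path and stop strings here are only illustrative):

from llama_cpp import Llama

# Illustrative only: halt generation at the next turn marker so the model
# doesn't write the user's side of the conversation as well.
llm = Llama(model_path="models/llama-2/llama-2-7b-chat/llama-2-7b-chat.GGUF.q4_0.bin")
prompt = (
    "<<SYS>>\nMy name is Valerie and I am a helpful assistant.\n<</SYS>>\n\n"
    "[INST] Hello Valerie, my name is Austin. What's your name?\n[/INST]"
)
output = llm(prompt, max_tokens=256, stop=["[INST]", "<<SYS>>"])
print(output["choices"][0]["text"])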

You can repeat these experiments yourself and I'm sure you'll reproduce similar results.

If you're looking for something more concrete and official, I would recommend reviewing Facebook's source code yourself.

@teleprint-me
Contributor Author

As an aside, something interesting to note is that Llama is perfectly capable of understanding and interpreting custom special tags.

12:58:36 | /mnt/valerie/llama.cpp
(.venv) git:(master | Δ) λ ./main -f prompts/llama2.func.txt -m models/llama-2/llama-2-7b-chat/llama-2-7b-chat.GGUF.q4_0.bin -i --log-disable 2> /dev/null
 <<SYS>>
My name is Valerie and I am a helpful assistant.
<</SYS>>

[INST] Hello Valerie, my name is Austin. What's your name?[/INST]
[ASST] Hello Austin, my name is Valerie. Nice to meet you![/ASST]
[INST] What can you do?[/INST]
[ASST] I can assist you with various tasks, including providing structured output for certain queries.[/ASST]
[INST] How can you assist me in my programming projects?[/INST]
[ASST] I can help you with a wide range of programming-related tasks. For example, I can generate code, help with debugging, and offer project structuring advice.[/ASST]
[INST] What's the weather like?[/INST]
[FUNC]
{
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": { 
              "type": "string", 
              "enum": ["metric", "uscs"],
              "description": "The unit system to use for temperature measurements"
            }
          },
          "required": ["location"]
        }
}
[/FUNC]
# llama starts after [ASST]█
[ASST]  Hello Austin, the current weather in San Francisco, CA is mostly sunny with a high of 78 degrees Fahrenheit and a low of 56 degrees Fahrenheit. Would you like to know the temperature in another location?

I know llama.cpp has a CFG (grammar) implementation and that you've exposed it here as well; I haven't had the time or resources to dig into it yet, though I want to.
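If it works the way I expect, something along these lines would constrain the [FUNC] reply to valid JSON (the paths and the grammar keyword are assumptions on my part; json.gbnf ships with llama.cpp):

from llama_cpp import Llama, LlamaGrammar

# Illustrative sketch: use the grammar support to force well-formed JSON output.
grammar = LlamaGrammar.from_file("vendor/llama.cpp/grammars/json.gbnf")
llm = Llama(model_path="models/llama-2/llama-2-7b-chat/llama-2-7b-chat.GGUF.q4_0.bin")
output = llm(
    "[INST] What's the weather like in San Francisco, CA?\n[/INST]\n[FUNC]\n",
    grammar=grammar,
    max_tokens=256,
)
print(output["choices"][0]["text"])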

My rationale for bringing it up is that it would explain why the previous template(s) worked just fine.

Here's another example of something I was experimenting with:

13:01:50 | /mnt/valerie/llama.cpp
(.venv) git:(master | Δ) λ ./main -f prompts/llama2.func.txt -m models/llama-2/llama-2-7b-chat/llama-2-7b-chat.GGUF.q4_0.bin -i --log-disable 2> /dev/null
 <<SYS>>
My name is Valerie and I am a helpful assistant.
<</SYS>>

[INST] Hello Valerie, my name is Austin. What's your name?[/INST]
[ASST] Hello Austin, my name is Valerie. Nice to meet you![/ASST]
[INST] What can you do?[/INST]
[ASST] I can assist you with various tasks, including providing structured output for certain queries.[/ASST]
[INST] How can you assist me in my programming projects?[/INST]
[ASST] I can help you with a wide range of programming-related tasks. For example, I can generate code, help with debugging, and offer project structuring advice.[/ASST]
[INST] What's the weather like?[/INST]
[FUNC]
{
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": { 
              "type": "string", 
              "enum": ["metric", "uscs"],
              "description": "The unit system to use for temperature measurements"
            }
          },
          "required": ["location"]
        }
}
[/FUNC]
# llama starts after [FUNC]█
[FUNC] 
{
    "name": "current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      },
      "unit": { 
        "type": "string", 
        "enum": ["metric", "uscs"],
        "description": "The unit system to use for temperature measurements"
      }
    },
    "response": {
      "status": "success",
      "data": {
        "tempt": {
          "temperature": {
            "value": Math.floor(Math.random() * 10) + 10,
            "units": function () {
              if (this.unit === "metric") {
                return "°C";
              } else if (this.unit === "uscs") {
                return "°F";
              } else {
                throw new Error("Invalid unit system");
              }
            }
          },
          "description": "The current temperature in the selected location"
        }
      }
    }
}

Again, with minimal instruction, Llama interprets it just fine, even after quantization and the precision loss from k-bit tensor conversions.

I think the smaller models are much more intelligent than we give them credit for. I believe this is significant, especially for end consumer devices.

@teleprint-me
Contributor Author

Regardless, this is why I was advocating for a more flexible approach rather than the one currently implemented.

Using the factory pattern with fixed templates is a bit inflexible and requires unnecessary maintenance as a result.

I believe it would be better to let users define their own templates rather than hard-coding any of them.

@bioshazard
Contributor

bioshazard commented Oct 3, 2023

Have you looked at Huggingface Chat Templates? Maybe a universal chat templating solution rather than a manual implementation? Mistral, for example, comes with a template config as part of the model deliverable.

https://huggingface.co/docs/transformers/main/chat_templating

For example:

>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

>>> chat = [
...   {"role": "user", "content": "Hello, how are you?"},
...   {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
...   {"role": "user", "content": "I'd like to show off how chat templating works!"},
... ]

>>> tokenizer.use_default_system_prompt = False
>>> tokenizer.apply_chat_template(chat, tokenize=False)
"<s>[INST] Hello, how are you? [/INST] I'm doing great. How can I help you today? </s><s>[INST] I'd like to show off how chat templating works! [/INST]"

@teleprint-me
Contributor Author

I did take a look at it. For llama2 it's the same template that's already being used here.

I think a good middle ground would be figuring out how to sanely register a new template with the current factory pattern.

It's set in the constructor at the moment. If it could take a string or an instance, preferably an instance, along the lines of the example snippet in PR #711, then we could pass one in by inheriting from the base class and wrapping the new function with the decorator that's already defined. That would most likely introduce a form of dependency injection, which I'm not sure is desirable.
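Roughly what I have in mind, assuming the existing decorator works the way I remember (the decorator and response-type names may differ in the current source):

from llama_cpp import llama_chat_format

@llama_chat_format.register_chat_format("llama-2-no-inst-system")
def format_llama2_no_inst_system(messages, **kwargs):
    # Sketch only: render the system block without [INST] and wrap each turn.
    system = next((m["content"] for m in messages if m["role"] == "system"), "")
    prompt = f"<<SYS>>\n{system}\n<</SYS>>\n\n"
    for m in messages:
        if m["role"] == "user":
            prompt += f"[INST] {m['content']}\n"
        elif m["role"] == "assistant":
            prompt += f"[/INST] {m['content']}\n"
    return llama_chat_format.ChatFormatterResponse(prompt=prompt + "[/INST]")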

My main hang-up is that I'd rather iterate over the completions list myself in a function I define. That would feel more intuitive design-wise and give users fine-grained control if they want or need it. The caveat is that it potentially creates its own problem of repeated code, hence the helper functions already implemented.

Like, I get the design choices... I just feel like it can be improved and hopefully become more flexible as a result. I'm still working out the details though.

I'd prefer to avoid going in circles. Whenever I get like this, it's usually a sign that I'm not interpreting something correctly and am still working my way towards understanding.

That's why I went for a minor change. I'd have to experiment more deeply with it in a separate branch while minimizing the overall effects.

@bioshazard
Contributor

Thanks for addressing my question. I will let the HF topic go in this context and leave y'all to wrap up this generous contribution. I am going to open a fresh issue to pitch my thoughts about delivering support for both the HF autotokenizer and an arbitrary Jinja file path.
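The Jinja file path idea would be roughly this (a sketch; the file name and template variables are made up):

from pathlib import Path
from jinja2 import Template

# Illustrative: load an arbitrary Jinja template from disk and render it over
# the messages list, similar to what apply_chat_template does internally.
chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
]
template = Template(Path("templates/llama-2.jinja").read_text())
print(template.render(messages=chat))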

@teleprint-me
Contributor Author

PR #808 fixes the issue.

@earonesty
Contributor

It does not fully fix the issue. It only fixes a single back-and-forth; longer user/assistant conversations followed by a completion still have problems.

@teleprint-me teleprint-me deleted the fix.llama2.template branch October 26, 2023 20:56
antoine-lizee pushed a commit to antoine-lizee/llama-cpp-python that referenced this pull request Oct 30, 2023