[TextGeneration] Fix llama tokenizer #1635

Merged
merged 4 commits into from
Mar 14, 2024

Conversation

dsikka
Contributor

@dsikka dsikka commented Mar 14, 2024

Tested code:

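import deepsparse

# Sparse, quantized Llama 2 7B checkpoint referenced by its Hugging Face stub.
MODEL_ID = "hf:nm-testing/llama2-7B-sparse70-retrained-ultrachat200k-pruned70-smoothquant-ds"
# Alternative SparseZoo stub (Mistral); uncomment to test a different tokenizer.
# MODEL_ID = "zoo:mistral-7b-ultrachat200k_mistral_pretrain-pruned40_quantized"

pipe = deepsparse.Pipeline.create(
    task="text-generation",
    model_path=MODEL_ID,
    sequence_length=512,
    prompt_sequence_length=16,
)

message = "Once upon a time"

# Wrap the prompt in the model's chat template and keep it as a string.
conversation = []
conversation.append({"role": "user", "content": message})
formatted_conversation = pipe.tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)

generation_config = {
    "max_new_tokens": 100,
}

# streaming=True yields one output per generated token, which is the path this PR fixes.
inference = pipe(
    sequences=formatted_conversation,
    generation_config=generation_config,
    streaming=True,
)

# Print each streamed chunk as it arrives, without adding newlines.
for token in inference:
    print(token.generations[0].text, end="")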

Output:


There was a time when the world was a different place. A time when people were more accepting of each other and didn't judge based on race, religion, or gender. A time when kindness and compassion were the norm, and hate and prejudice were unheard of.

But then something changed. The world became more divided, and people started to see each other through a

@dsikka dsikka requested a review from mgoin March 14, 2024 21:39
Member

@mgoin mgoin left a comment

Thanks for answering my questions, nice implementation

@mgoin mgoin merged commit 9bac61e into main Mar 14, 2024
13 checks passed
@mgoin mgoin deleted the fix_llama_tokenizer branch March 14, 2024 21:51
dhuangnm pushed a commit that referenced this pull request Mar 14, 2024
* add llama tokenizer fix

* fix generated string

* only run for streaming

* add TODO

---------

Co-authored-by: Dipika Sikka <[email protected]>
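
The commit messages above do not say what was wrong with the streamed text. One well-known pitfall with Llama-style SentencePiece tokenizers is that decoding a single token drops its leading space, so text printed token by token runs together. This is an assumption about what this PR addresses, not something the conversation confirms. The sketch below illustrates that failure mode and one possible workaround with a toy vocabulary. Every name in it (`TOY_VOCAB`, `decode`, `naive_stream`, `boundary_aware_stream`) is hypothetical and is not DeepSparse or Hugging Face code.

```python
# Toy stand-in for a SentencePiece-style vocabulary; "\u2581" marks a word boundary.
# All names here are hypothetical and do not come from DeepSparse.
WORD_BOUNDARY = "\u2581"

TOY_VOCAB = {
    0: WORD_BOUNDARY + "Once",
    1: WORD_BOUNDARY + "upon",
    2: WORD_BOUNDARY + "a",
    3: WORD_BOUNDARY + "time",
    4: "s",
    5: ".",
}


def decode(token_ids):
    """Mimic a decoder that maps boundary markers to spaces and strips the leading one."""
    text = "".join(TOY_VOCAB[i] for i in token_ids).replace(WORD_BOUNDARY, " ")
    return text[1:] if text.startswith(" ") else text


def naive_stream(token_ids):
    """Decode one token at a time; every word-initial space is lost."""
    return [decode([i]) for i in token_ids]


def boundary_aware_stream(token_ids):
    """Decode one token at a time, restoring the space for word-initial pieces."""
    chunks = []
    for position, i in enumerate(token_ids):
        chunk = decode([i])
        # Re-add the space only when the raw piece starts a new word and
        # it is not the very first chunk of the generation.
        if position > 0 and TOY_VOCAB[i].startswith(WORD_BOUNDARY):
            chunk = " " + chunk
        chunks.append(chunk)
    return chunks


ids = [0, 1, 2, 3, 4, 5]
print("".join(naive_stream(ids)))           # Onceuponatimes.
print("".join(boundary_aware_stream(ids)))  # Once upon a times.
print(decode(ids))                          # Once upon a times.
```

Non-streaming output decodes the whole sequence at once and so keeps its spaces, which would be consistent with the "only run for streaming" commit above.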
Contributor

@dbogunowicz dbogunowicz left a comment

A test would be nice to have, but I guess the priority is to land this asap.

dhuangnm added a commit that referenced this pull request Mar 18, 2024
* [TextGeneration] Fix llama tokenizer (#1635)

* add llama tokenizer fix

* fix generated string

* only run for streaming

* add TODO

---------

Co-authored-by: Dipika Sikka <[email protected]>

* Retire `flaky` in favour of `pytest-rerunfailures` (#1628)

* pick up another fix and bump up version to 1.7.1

---------

Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: dbogunowicz <[email protected]>
Co-authored-by: dhuang <[email protected]>
4 participants