Garbled characters with beam search #215
Comments
We have fixed it in this PR.
@a32543254 It does get fixed in a single generate call. But for continuous batching in ModelServer, the issue still exists. Here is the log after running test_model_server.py:

=======REFERENCE RESULTS FOR COMPARISON=========
Some of the most popular animals that people tend to mention as their favorites include: …
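For reference, a baseline like the one in that log can be produced with vanilla transformers along these lines. This is only a sketch, not the actual contents of test_model_server.py; the prompt and generation settings are assumptions carried over from the repro later in this thread.

```
# Sketch: fp32 reference output with vanilla transformers for comparison
# against Neural Speed. Prompt/beam settings assumed from the repro below.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "What's your favorite animal?"
tokens = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(tokens, num_beams=2, do_sample=False, max_new_tokens=128)
print("=======REFERENCE RESULTS FOR COMPARISON=========")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```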
Hi @jiafuzha, sorry for the late response. So, I think this issue is more like a …
@zhentaoyu thanks for the detailed response. I have some new things to share with you. Here is the output text and its tokens:

"What's your favorite animal? 🐰🐶🐱🐷 My favorite animal is the penguin! 🐧 I think they're so cute and funny, and they're great"

Tokens:
[1, 1724, 29915, 29879, 596, 25448, 13019, 29973, 29871, 243, 162, 147, 179, 243, 162, 147, 185, 243, 162, 147, 180, 243, 162, 147, 186, 243, 162, 147, 183, 243, 162, 147, 184, 243, 162, 147, 185, 243, 162, 147, 180, 243, 162, 147, 186, 243, 162, 147, 183, 243, 162, 147, 184, 243, 162, 147, 185, 243]
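Those repeated 4-id groups look like Llama's SentencePiece byte-fallback tokens. A quick check, assuming the standard Llama-2 vocab layout where byte tokens `<0x00>`…`<0xFF>` occupy ids 3–258:

```
# Decode one repeated 4-id group from the token dump above as raw UTF-8.
# In the Llama-2 tokenizer, byte-fallback token id = byte value + 3.
group = [243, 162, 147, 179]
raw = bytes(i - 3 for i in group)   # -> b'\xf0\x9f\x90\xb0'
print(raw.decode("utf-8"))          # -> 🐰 (a single 4-byte UTF-8 emoji)
```

Each emoji spans four byte-level tokens, so anything that drops or reorders part of a group (for example, a beam that ends mid-sequence under a small max_new_tokens) decodes to U+FFFD "�" replacement characters, which matches the garbled output reported in this issue.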
By the way, another case of garbled characters occurs with the prompt "what's your favorite food?". Vanilla transformers gives: "My favorite food is pizza. I love the combination of the crispy crust, tangy tomato sauce, and melted mozzarella cheese. It's the perfect comfort food."
The NS result is from `model.init(model_name, use_quant=True, weight_dtype="int4", compute_dtype="int8")`.
I see. You can use …
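The suggestion above is cut off; judging from the reply that follows, it appears to be to re-run without quantization. A minimal sketch, assuming `Model.init` accepts `use_quant=False` for fp32 inference (an assumption, not confirmed in this thread):

```
# Sketch: re-run the same beam-search repro without weight-only quantization,
# to test whether int4 quantization is what causes the garbling.
from neural_speed import Model
from transformers import AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model = Model()
model.init(model_name, use_quant=False)  # assumed flag: fp32, no quantization

tokens = tokenizer("What's your favorite animal?", return_tensors="pt").input_ids
outputs = model.generate(tokens, num_beams=2, do_sample=False, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```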
Yes, with fp32 I can get the correct result from NS. I also tried the code below from https://huggingface.co/docs/transformers/main/en/quantization. It looks like it is also weight-only quant, and it gives me the correct result.

```
from transformers import AutoModelForCausalLM, AutoTokenizer, QuantoConfig

model_id = "facebook/opt-125m"
```
…
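The quoted snippet is truncated; the example on that docs page continues roughly as below. The `weights` dtype follows the docs example and is an assumption here (the next comment notes it differs from the NS repro's int4):

```
from transformers import AutoModelForCausalLM, AutoTokenizer, QuantoConfig

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Weight-only quantization via quanto, per the HF quantization docs.
quantization_config = QuantoConfig(weights="int8")
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quantization_config
)
```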
Hi @jiafuzha, that is a different model_id and weight dtype. @a32543254 Does NS have any difference in RTN quant compared to ITREX? I found the pipeline …
Sorry, I copied the wrong code. I was actually using …
I got …
@zhentaoyu @a32543254 Any more comments?
Hi @jiafuzha, our …
Any update on this?
Hi @jiafuzha, sorry for the late response. We have been tied up with other things recently. We will dig into it and let you know if we have any findings. Thanks a lot.
No worries, looking forward to your fix.
Original issue description:

```
from neural_speed import Model
from transformers import AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# int4 weight-only quantization with int8 compute
model = Model()
model.init(model_name, use_quant=True, weight_dtype="int4", compute_dtype="int8")

tokens = tokenizer("What's your favorite animal?", return_tensors='pt').input_ids
outputs = model.generate(tokens, num_beams=2, do_sample=False, max_new_tokens=10)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```
With the above code, I get the following garbled output:
"What's your favorite animal? ���������"
If I generate without beam search (`outputs = model.generate(tokens)`), I get the expected result:
"What's your favorite animal?
everybody has a favorite animal, and it's a"