Garbled characters with beam search #215
We have fixed it in this PR.
@a32543254 It does get fixed in a single `generate` call. But for continuous batching in `ModelServer`, the issue still exists. Here is the log after running `test_model_server.py`:

```
=======REFERENCE RESULTS FOR COMPARISON=========
Some of the most popular animals that people tend to mention as their favorites include:
```
Hi, @jiafuzha, sorry for the late response.
So, I think this issue is more like a […]
@zhentaoyu thanks for the detailed response. I just got some new things to share with you.
"What's your favorite animal? 🐰🐶🐱🐷 My favorite animal is the penguin! 🐧 I think they're so cute and funny, and they're great"

Tokens:

```
[1, 1724, 29915, 29879, 596, 25448, 13019, 29973, 29871, 243, 162, 147, 179, 243, 162, 147, 185, 243, 162, 147, 180, 243, 162, 147, 186, 243, 162, 147, 183, 243, 162, 147, 184, 243, 162, 147, 185, 243, 162, 147, 180, 243, 162, 147, 186, 243, 162, 147, 183, 243, 162, 147, 184, 243, 162, 147, 185, 243]
```
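A possible reading of the token list above (my own interpretation, not confirmed by the maintainers): the repeated `243, 162, 147, ...` tokens look like Llama-2's SentencePiece byte-fallback tokens, where byte `0xNN` maps to token id `0xNN + 3` (ids 0–2 are `<unk>`, `<s>`, `</s>`). Under that assumption, each run of four such tokens encodes one 4-byte UTF-8 emoji:

```python
# Sketch: decoding Llama-2 byte-fallback tokens by hand.
# Assumption: byte 0xNN has token id 0xNN + 3 in the Llama-2 SentencePiece vocab.
byte_tokens = [243, 162, 147, 179]       # first four byte tokens from the list above
raw = bytes(t - 3 for t in byte_tokens)  # -> b'\xf0\x9f\x90\xb0'
print(raw.decode("utf-8"))               # -> 🐰 (U+1F430)
```

If this mapping holds, a correct output requires every group of four byte tokens to survive decoding intact; interleaving or dropping one of them mid-character would explain the garbling.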
By the way, another case of garbled characters occurs with the prompt "what's your favorite food?". Vanilla Transformers gives: "My favorite food is pizza. I love the combination of the crispy crust, tangy tomato sauce, and melted mozzarella cheese. It's the perfect comfort food."
The NS result is from `model.init(model_name, use_quant=True, weight_dtype="int4", compute_dtype="int8")`.
I see. You can use […]
Yes, with fp32 I can get the correct result from NS. I also tried the code below from https://huggingface.co/docs/transformers/main/en/quantization. It looks like it is also weight-only quant, and it gives me the correct result.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, QuantoConfig

model_id = "facebook/opt-125m"
```
Hi, @jiafuzha, it's a different `model_id` and weight dtype. @a32543254 Does NS have some difference in RTN quant when compared to ITREX? I found the pipeline […]
Sorry, I copied the wrong code. I was actually using […]

I got […]
@zhentaoyu @a32543254 Any more comments?
Hi, @jiafuzha, our […]
Any update on this?
Hi, @jiafuzha, sorry for the late response. We have been tied up with other things recently. We will dig into it and let you know if we have any findings. Thanks a lot.
No worries, looking forward to your fix.
```python
from transformers import AutoTokenizer
from neural_speed import Model

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = Model()
model.init(model_name, use_quant=True, weight_dtype="int4", compute_dtype="int8")
tokens = tokenizer("What's your favorite animal?", return_tensors='pt').input_ids
outputs = model.generate(tokens, num_beams=2, do_sample=False, max_new_tokens=10)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```
With the above code, I got the garbled characters below:
"What's your favorite animal? ���������"
If I generate without beam search, I can get the expected result:

```python
outputs = model.generate(tokens)
```

```
What's your favorite animal?
everybody has a favorite animal, and it's a
```
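The `�` characters in the beam-search output are U+FFFD replacement characters, which a lenient UTF-8 decoder emits when a multi-byte sequence is cut off or broken mid-character. This is consistent with beam search splitting a group of byte-fallback tokens across beams (my hypothesis, not the maintainers' diagnosis). A minimal sketch of the effect:

```python
# Sketch: a 4-byte UTF-8 emoji truncated after 3 bytes decodes to U+FFFD
# when using errors="replace", which is how lenient decoders handle it.
complete = bytes([0xF0, 0x9F, 0x90, 0xB0])          # full sequence for 🐰
truncated = complete[:3]                            # cut mid-character
print(complete.decode("utf-8"))                     # -> 🐰
print(truncated.decode("utf-8", errors="replace"))  # -> contains �
```

So a single lost or misordered byte token is enough to turn a whole emoji into replacement characters in the decoded text.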