Phi-3-mini-4k-instruct-onnx model generates nonsensical results when prompt is longer than half of the context window length #552
Comments
I have the same issue. I get similar nonsensical output when using either the 4k model or the 128k model via ONNX Runtime with a long user prompt and the search option max_length set to 3000. When using a shorter user prompt, the output is as expected. Using the same user prompt on https://ai.azure.com/explore/models/Phi-3-mini-128k-

Tried with versions 0.2.0 and 0.3.0-rc2 of the library Microsoft.ML.OnnxRuntimeGenAI.DirectML.

P.S.: Thank you for this nice library; it is awesome to be able to run an SLM locally this easily.
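For reference, a minimal sketch of the setup this comment describes, shown with the Python API for consistency with the repro script further below rather than the C# package; it assumes the 0.3.x-era interface, and the model path and prompt are placeholders:

```python
import onnxruntime_genai as og

# Placeholder path; the 4k and 128k models show the same behavior.
model = og.Model("Phi-3-mini-128k-instruct-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=3000)  # the search option from this report
# Nonsensical output appears with a long user prompt; short prompts work fine.
params.input_ids = tokenizer.encode("<|user|>\n...long user prompt...<|end|>\n<|assistant|>\n")
generator = og.Generator(model, params)
```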
A quick follow-up on this: I would really appreciate any help or insights!
I am seeing a similar issue when running on CPU.
Hi @jackylu0124, @bkaruman, @AMehlem, I have reproduced your issue on CPU. We will investigate.
Hi @natke, thank you for the update, I appreciate it.
I get the same issue if the conversation history goes above 2k tokens (approximately), using the .NET NuGet package Microsoft.ML.OnnxRuntimeGenAI version 0.3.0 with the Phi-3-mini-4k-instruct-onnx model. As a workaround I truncate the conversation history (to MaxTokenLength, which is 4096 - 2500) and the issue goes away. I don't truncate individual tokens but rather drop the whole conversation turns that fall outside the limit. EDIT: I can reproduce this with
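A minimal sketch of this truncation workaround, written in Python for consistency with the repro script below; the helper name, turn structure, and the 2500-token budget are illustrative, not part of the library:

```python
# Hypothetical helper: drop whole conversation turns (not individual
# tokens) from the front until the encoded history fits the budget.
def truncate_history(tokenizer, turns, token_budget=2500):
    turns = list(turns)
    # Always keep at least the most recent turn.
    while len(turns) > 1 and len(tokenizer.encode("".join(turns))) > token_budget:
        turns.pop(0)
    return "".join(turns)
```

Keeping the encoded prompt under roughly half the 4k context window is what avoids the degraded output described in this thread.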
This should be resolved with PR #802.
Hello @baijumeswani, thank you for the update. Does this mean that we have to regenerate our ONNX models? I have also used the model builder for the Phi-3.5 model.
Hello @MaxAkbar, I tested the latest onnxruntime-genai package with the Phi-3.5 model, and it works fine when the input prompt is greater than 2k tokens. Can you please try that?
Thank you @apsonawane. I also created an ONNX model and will run some tests this weekend.
I've been experiencing a similar issue when fine-tuning Phi-3 and Phi-3.5 models as well: the model produces a lot of gibberish tokens at the end, even after fine-tuning. It looks like this has been solved with the latest ONNX release, but fine-tuning these ONNX models by converting them to torch is really tricky. Is there any solution for that?
Original issue:

I am running the Phi-3-mini-4k-instruct-onnx model on desktop CPU, and one behavior I have noticed is that, after the back-and-forth conversation grows longer than half of the context window length (in other words, longer than 2048 tokens), the model starts generating nonsensical and unreadable results. This is strange because I would expect the model to keep generating readable results at least until close to the end of the context window. I would really appreciate any insights into this behavior, as well as any way to fix it. Thanks a lot in advance!

To reproduce this issue, you can run the example phi3-qa.py script (https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3-qa.py) with a command like `python phi3-qa.py -m .\Phi-3-mini-4k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4`, with the if-block setting `search_options['max_length'] = 2048` commented out to allow longer input, and replace the `prompt` in the line `input_tokens = tokenizer.encode(prompt)` (examples/python/phi3-qa.py, line 38 at commit 6be8835) with a long multi-turn conversation string.
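A condensed sketch of the modified script, following the 0.3.x-era Python API that phi3-qa.py used at the time (the API has changed in later releases); the model path and conversation text are placeholders:

```python
import onnxruntime_genai as og

# Placeholder path; point this at the downloaded CPU int4 model.
model = og.Model("Phi-3-mini-4k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

search_options = {}
# search_options['max_length'] = 2048  # the cap from the script, left commented out

# Placeholder conversation rendered in the Phi-3 chat template; the actual
# repro uses a back-and-forth history longer than 2048 tokens.
prompt = "<|user|>\n" + "Tell me more about that. " * 300 + "<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(**search_options)
params.input_ids = tokenizer.encode(prompt)
generator = og.Generator(model, params)

# Stream tokens; with the long prompt the output degrades into gibberish.
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```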
For greater readability, the conversation string used in the repro corresponds to the following dialog:
and then the model will generate the following nonsensical/unreadable output: