server: Completion of pre-tokenized prompt is broken #4476
Comments
It took me some time to realize what was wrong with my server. I originally added the prompt array support for testing prompts with specifically selected tokens, which has been quite useful, as such prompts are not subject to the constraints of any tokenizer. In order to support both usages, how about allowing 2-level nested prompts? For example,
This would be compatible with the multi-prompt change already introduced, and allow for array prompts.
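One possible reading of the 2-level nesting idea, sketched in Python rather than in the server's own code. The function name `split_prompts` and the exact rules are assumptions made for illustration: a string or a flat array of integers is treated as a single prompt, and an array whose elements are strings or integer arrays is treated as several prompts.

```python
from typing import List, Union

# A single prompt is either text or a pre-tokenized list of token ids.
Prompt = Union[str, List[int]]


def _is_token_list(value) -> bool:
    """True for a list made only of ints (bool is excluded on purpose)."""
    return isinstance(value, list) and all(
        isinstance(t, int) and not isinstance(t, bool) for t in value
    )


def split_prompts(prompt) -> List[Prompt]:
    """Hypothetical dispatch for a 2-level nested "prompt" field.

    - "text"            -> one text prompt
    - [1, 2, 3]         -> one pre-tokenized prompt
    - ["a", [1, 2]]     -> two prompts (nesting marks multi-prompt)
    """
    if isinstance(prompt, str):
        return [prompt]
    if _is_token_list(prompt):
        # Flat array of token ids: keep it as one prompt, do not split it.
        return [prompt]
    if isinstance(prompt, list) and all(
        isinstance(p, str) or _is_token_list(p) for p in prompt
    ):
        return list(prompt)
    raise ValueError("unsupported prompt shape")
```

Under these assumed rules a flat token array never reaches the multi-prompt path, so only an explicitly nested array would be split into separate tasks.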
I'm baffled. I can't get the current multi-prompt to work (#4583). I'll wait for that to be fixed before introducing new behaviors. For now, I'm using #4232 (comment) to get back the previous behavior.
I would like it better if the multi-prompt field were called "prompts". It could then have sub-arrays as in your example, @jxy. The format of the "prompt" field could then be made to match the current documentation, i.e. a single prompt.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
According to the documentation:
llama.cpp/examples/server/README.md, line 117 in cafcd4f
Current Behavior
When supplying the prompt as an array of token identifiers, the server instead calls split_multiprompt_task and the request hangs.
Steps to Reproduce
content
prompt
Failure Logs
slot 0 is processing [task id: 2]
slot unavailable
print_timings: prompt eval time = 0.00 ms / 0 tokens ( -nan ms per token, -nan tokens per second)
print_timings: eval time = -94366367288.92 ms / 0 runs ( -inf ms per token, -0.00 tokens per second)
print_timings: total time = -94366367288.92 ms
slot unavailable
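A minimal sketch of how a pre-tokenized request of this kind might be issued, assuming a server listening on `http://localhost:8080` and using its `/tokenize` and `/completion` endpoints. The base URL, the helper names, the `n_predict` value, and the timeout are assumptions chosen for illustration, not the reporter's exact steps.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8080"  # assumption: local server on its default port


def post_json(path, payload, timeout=30.0):
    """POST a JSON body and decode the JSON reply.

    The timeout turns a hanging request into an exception instead of
    blocking forever.
    """
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def completion_payload(tokens, n_predict=16):
    """Build a completion request whose prompt is a flat array of token ids."""
    return {"prompt": list(tokens), "n_predict": n_predict}


def reproduce(text):
    """Tokenize `text`, then request a completion from the resulting tokens."""
    tokens = post_json("/tokenize", {"content": text})["tokens"]
    return post_json("/completion", completion_payload(tokens))
```

Calling `reproduce("Hello")` against an affected build would be expected to time out rather than return a completion, which matches the hang and the zero-token timings in the log.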