Add support for logprobs in OpenAI chat API #852
Conversation
cc @merrymercy @hnyls2002 @Ying1123 @zhyncs for review. I think it is ready to merge.
I've just tested the code from this branch and encountered an error.
@isukharev Computing logprobs takes more memory; try reducing it.
@hnyls2002 Thanks! After reducing it, the error is gone, but the prefix cache hit rate has dropped.
@isukharev When you want the logprob of the prompt, the prefix radix cache is turned off. This is because we only cache the KV cache, not the logits. Do you need logprobs for prompts, or only for generation? If you only need logprobs for generation, we can implement an additional flag just for your use case. Then you should see a similar cache hit rate.
@Ying1123 We only need this for generation; we are using the LLM as a CrossEncoder.
@yichuan520030910320 @hnyls2002 It should be easy to do. Can you implement this feature?
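To make the CrossEncoder use case above concrete, here is a minimal sketch of scoring a query-document pair by the logprob of a one-token Yes/No verdict, using only generation logprobs through an OpenAI-compatible chat endpoint. The server URL, model name, and prompt wording are illustrative assumptions, not part of this PR:

```python
# Sketch: LLM-as-CrossEncoder scoring via generation logprobs.
# Assumptions (not from this PR): server URL, model name, prompt template.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

def relevance_score(query: str, document: str) -> float:
    resp = client.chat.completions.create(
        model="default",  # placeholder model name
        messages=[{
            "role": "user",
            "content": (
                f"Query: {query}\nDocument: {document}\n"
                "Is the document relevant to the query? Answer Yes or No."
            ),
        }],
        max_tokens=1,
        temperature=0,
        logprobs=True,  # the feature this PR adds to the chat API
    )
    first = resp.choices[0].logprobs.content[0]
    # A higher (less negative) logprob of "Yes" means higher relevance.
    return first.logprob if first.token.strip() == "Yes" else float("-inf")
```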
Thank you for your contribution; we really appreciate it. The following instructions will help improve your pull request and make it easier to receive feedback. If there are any items you don't understand, don't worry. Just submit the pull request and ask the maintainers for help.
Motivation
Fix #839
Modification
Add support for logprobs in the OpenAI chat API, matching the real OpenAI API output format.
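As a usage sketch of the new parameters through the OpenAI Python client — `base_url` and `model` are placeholders for an OpenAI-compatible server, not values from this PR:

```python
# Sketch: requesting logprobs through the OpenAI-compatible chat endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="default",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=8,
    logprobs=True,    # enable per-token logprobs in the response
    top_logprobs=5,   # also return the 5 most likely alternatives per position
)

# In the real OpenAI output format, each generated token carries its own
# logprob plus a list of top alternatives.
for item in resp.choices[0].logprobs.content:
    alts = {alt.token: round(alt.logprob, 3) for alt in item.top_logprobs}
    print(item.token, round(item.logprob, 3), alts)
```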
Checklist
`pre-commit run --all-files` or other linting tools are used to fix potential lint issues.