[server] Update server routes to be compliant with MLServer #1237
Conversation
LGTM, clean
LGTM - as mentioned before, we need to sync with QA before landing.
This seems like it would cause breaking changes for any application built against the old endpoint structure. A README or document showing how to migrate current usage examples would be helpful, so that other teams and users have a summary of the changes.
For instance, how should we update this DigitalOcean getting-started guide? https://marketplace.digitalocean.com/apps/deepsparse-inference-runtime
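To make the migration concern concrete, here is a minimal sketch of how a client's request URL might change. The exact paths are assumptions: the old flat `/predict` endpoint and the new MLServer-style `/v2/models/{name}/infer` path are inferred from MLServer's KServe-V2-style routing, not quoted from this PR.

```python
# Hypothetical client-side view of the route change.
# BASE, the old "/predict" path, and the new "/v2/models/{name}/infer"
# path are illustrative assumptions, not values confirmed by this PR.

BASE = "http://localhost:5543"

def old_infer_url() -> str:
    # Pre-change: a single flat prediction endpoint (assumed).
    return f"{BASE}/predict"

def new_infer_url(model_name: str) -> str:
    # Post-change: per-model, MLServer-compatible path (assumed).
    return f"{BASE}/v2/models/{model_name}/infer"

print(old_infer_url())
print(new_infer_url("question_answering"))
```

A migration doc could pair each old route with its new equivalent in a table like this, which would also make updating the DigitalOcean guide straightforward.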
* refactor server for different integrations; additional functionality for chat completion streaming and non-streaming
* further refactor server
* add support such that openai can host multiple models
* update all tests
* fix output for n > 1
* add inline comment explaining ProxyPipeline
* [server] Update OpenAI Model Support (#1300)
  * update server
  * allow users to send requests with new models
  * use v1; move around baseroutes
  * add openai path
  * PR comments
  * clean-up output classes to be dataclasses, add docstrings, clean up generation kwargs
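Since the commit log mentions an OpenAI-compatible path with streaming and non-streaming chat completions, a request against it would presumably follow the public OpenAI chat-completions shape. Below is a hedged sketch that only builds such a request body; the model name is a placeholder and nothing here is taken from the PR's actual code.

```python
import json

# Illustrative only: the request shape follows the public OpenAI
# chat-completions format; the model name below is a placeholder,
# not a value from this PR.

def build_chat_request(model: str, prompt: str, stream: bool = False) -> str:
    """Serialize an OpenAI-style chat-completions request body."""
    payload = {
        "model": model,  # one of the models hosted by the server
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # streaming vs. non-streaming completion
    }
    return json.dumps(payload)

body = build_chat_request("placeholder-model", "Hello!", stream=True)
print(body)
```

Keeping `stream` as an explicit flag mirrors the commit log's distinction between streaming and non-streaming completion support.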
Summary:
Testing:
Sample Config:
This will now produce the following endpoints: