
Feature request: models endpoint support in dllama-api #146

Open
jkeegan opened this issue Dec 24, 2024 · 0 comments

jkeegan (Contributor) commented Dec 24, 2024

I’ve played around with this a bit, but I wanted to write the idea up here in case someone can implement it sooner. The idea is neither novel nor particularly interesting, but I think it’s needed: dllama-api should support the models endpoint (GET /v1/models) and allow inference against any model from a pre-specified list.
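For reference, this is roughly the shape of the OpenAI GET /v1/models response that clients like open-webui expect; dllama-api would presumably need to return the same shape with its own model ids (the id below is just a placeholder, not a real distributed-llama model name):

```python
# Shape of an OpenAI-compatible GET /v1/models response.
# The id and timestamp are placeholders.
models_response = {
    "object": "list",
    "data": [
        {
            "id": "llama3_1_8b_instruct_q40",
            "object": "model",
            "created": 1735000000,
            "owned_by": "dllama",
        }
    ],
}
```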

Granted, we’re resource constrained on local machines, so the first completion against a model might take dozens of seconds while the model is loaded into memory, and you wouldn’t want multiple tools pulling from different models interleaved with each other, since you’d thrash between models in memory.

But the ability to start dllama-api in a “here is the list of valid models” mode (perhaps derived from a models directory, or specified via command-line arguments or a config file), so that the models endpoint could be queried correctly and a completion against any listed model would then succeed, would let us leave one dllama-api server running all the time and use whichever model we want, rather than shutting dllama-api down and starting a new instance each time. A rough sketch of the directory idea follows.
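Purely to illustrate the directory-derived variant (the real server is C++, and the directory layout, the .m extension, and the helper name here are my assumptions, not the actual distributed-llama conventions):

```python
from pathlib import Path

# Hypothetical sketch: advertise one model per converted model file found in
# a models directory; the file stem doubles as the id that the models
# endpoint reports.
MODELS_DIR = Path("models")

def list_models() -> list[dict]:
    return [
        {"id": p.stem, "object": "model", "owned_by": "dllama"}
        for p in sorted(MODELS_DIR.glob("*.m"))
    ]
```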

Right now I can run LMStudio’s server for this with models that fit in memory, but we should be able to do the same in the distributed setting as well.

A good test that this is complete would be getting open-webui to work with dllama-api. I’ve tried adding dummy model responses, but I can’t get open-webui to talk to distributed-llama successfully.
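Short of full open-webui integration, something like the following should succeed end to end once this is implemented. The port is a placeholder, and I’m assuming the /v1/chat/completions route dllama-api already exposes; the first call is effectively what open-webui does when it connects to an OpenAI-compatible server:

```python
import requests

# Placeholder base URL; point this at a running dllama-api instance.
BASE = "http://localhost:9990/v1"

# Step 1: list the models the server advertises.
models = requests.get(f"{BASE}/models").json()["data"]
assert models, "models endpoint returned an empty list"

# Step 2: a completion against an advertised model should then succeed
# (the first call may take a while if the model has to be loaded).
resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": models[0]["id"],
        "messages": [{"role": "user", "content": "Say hello."}],
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```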
