I’ve played around with this a bit but wanted to write the idea up here in case someone can implement it sooner. The idea is neither novel nor particularly interesting, but I think it’s needed: dllama-api should support the models endpoint (`/v1/models`) and allow inference against multiple models from a pre-specified list.
Yes, we’re resource constrained on local machines, so the first completion against a model might take tens of seconds while the weights load, and you wouldn’t want multiple tools pulling from different models in an interleaved way, because you’d thrash between models in memory.
But if dllama-api could be started in a “here is the list of valid models” mode (perhaps derived from a models directory, or specified via command-line arguments or a config file), then the models endpoint could be queried correctly and a completion against any listed model could succeed. That would let us leave a single dllama-api server running all the time and use whichever model we want, rather than shutting dllama-api down and starting a new instance each time. A rough sketch of the client-side flow is below.
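For illustration only, here is a minimal sketch of what an OpenAI-compatible client would do against such a server. The base URL, port, and model id are made-up placeholders, and the `/v1/models` route is the proposed addition, not something dllama-api exposes today:

```python
# Hypothetical client flow against a dllama-api server that advertises
# multiple models. Host, port, and model names are assumptions.
import requests

BASE_URL = "http://localhost:9990/v1"  # hypothetical dllama-api address

# 1. A client such as open-webui first lists the models the server advertises
#    via the proposed /v1/models endpoint.
models = requests.get(f"{BASE_URL}/models").json()
print([m["id"] for m in models["data"]])  # e.g. ["llama3_1_8b_instruct_q40"]

# 2. It then requests a completion against one entry from that list; the
#    server would load (or swap in) the corresponding weights before serving.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "llama3_1_8b_instruct_q40",  # must match an id from /v1/models
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```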
Right now I can use LM Studio’s server for this with models that fit in memory, but we should be able to do the same thing distributed as well.
A good test of this being complete would be getting open-webui to work with dllama-api. I’ve tried adding dummy model responses, but I can’t get open-webui to talk to distributed-llama successfully.