Implement --single_prompt mode to use dir-assistant as part of the workflow #20
Conversation
Awesome work. I would like to expand on this a bit before merging it, however. The main problem with this methodology is that the index startup and model load need to happen for every run. I'd like to instead have a client/server approach where you run a persistent server process once and send prompts to it. I'll leave this PR open, as this is good work you did. Would you be interested in working on the features I mentioned above? The way I'd do the server is a Flask HTTP server, multiprocessing for a separate Python instance for dir-assistant's main thread, and input/output queues between the processes.
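A rough sketch of that layout, with hypothetical names (none of this exists in dir-assistant yet): a Flask app receives prompts over HTTP and hands them to a long-running worker process via multiprocessing queues, so the index and model are loaded only once.

```python
# Sketch only: a Flask front-end plus a long-lived worker process holding the
# loaded index/model, connected by multiprocessing queues. Names like
# load_dir_assistant() are placeholders, not real dir-assistant APIs.
from multiprocessing import Process, Queue
from flask import Flask, request, jsonify

def assistant_worker(in_q: Queue, out_q: Queue) -> None:
    # Pay the startup cost once: load embeddings index and the model here.
    # assistant = load_dir_assistant()   # placeholder for the real startup code
    while True:
        prompt = in_q.get()
        # response = assistant.run_single_prompt(prompt)  # placeholder for the real call
        response = f"(response to: {prompt})"
        out_q.put(response)

def create_app(in_q: Queue, out_q: Queue) -> Flask:
    app = Flask(__name__)

    @app.post("/prompt")
    def prompt_endpoint():
        # Forward the prompt to the worker and block until its reply arrives.
        # Concurrent requests would need a lock or per-request queues.
        in_q.put(request.json["prompt"])
        return jsonify({"response": out_q.get()})

    return app

if __name__ == "__main__":
    in_q, out_q = Queue(), Queue()
    Process(target=assistant_worker, args=(in_q, out_q), daemon=True).start()
    create_app(in_q, out_q).run(port=5000)
```

A client (or dir-assistant itself in a client mode) would then POST prompts to `/prompt` instead of paying the index and model startup cost on every invocation.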
@curvedinf I can explain my use case and why I opted out of having a server inside dir-assistant. I use LM Studio or Ollama; they manage my LLM models. I have multiple computers with OpenAI-compatible servers. The most important part is that I don't want to expose dir-assistant to the network; instead I log into my machine directly. I don't want dir-assistant to gather context information across 20 of my projects, as that would take more than 900,000 tokens and 30 minutes to respond. The current implementation already does what it has to do: I send a prompt and a list of folders and I receive a response back. Here are execution-time logs:
(server and client-side timing logs)
dir-assistant's start-up time is not an issue; LLM response time for long contexts is.
Here is an additional log for the same request above, showing only the server-side requests, so you can see how much time is spent on the server:
- First request: CGRAG
- Request finished
- Second request: embeddings (3 seconds)
- Finished
From the first to the last request: 51 seconds. In my case I already have the embeddings cached for all of my projects, and LLM generation time is around 99% of the total time for contexts of 32-128k tokens and above.
I can see the need for a non-IP-based solution (security and other reasons). Let me hash it over a bit. It may be best to have both this and the client/server version. Regarding startup not being a concern: I have run larger models on huge filesets and startup can get pretty long, so it is a concern in those instances. Also, APIs have low time to first token because they are parallelized, and you can opt to use smaller API models. In those cases, generation time can be short even on large contexts, which means you can end up spending most of your time in startup in certain situations. BTW, in your hypothetical server case I was suggesting that dir-assistant both provide an API and consume one.
Just a small note regarding long startup time: I only have a long startup time if the embedding hasn't been created for a specific file before. That's not my case, as the projects don't change that often, so I have to update maybe 10 files per day. The current PR provides a solution for that.
I'm pulling this into the 'single-prompt' branch to test and polish. If you have additional changes, make a PR to that branch.
@curvedinf thank you very much! I appreciate that! I've tried to fix as many bugs as I could for my workflow. If you see any bugs, please let me know and I'll try to fix them in my free time. I'll share a snippet with you below in case you find it useful.

I'll be testing this this week.
Implement `--single_prompt` mode to use dir-assistant as part of the workflow.

```
PRINT_CGRAG=0 OPENAI_API_BASE=http://192.168.0.130:1234/v1 LM_STUDIO_API_KEY="ollama" dir-assistant start --single-prompt "what is dir-assistant?"
```
Please test on your system as well. I plan to use dir-assistant as part of a workflow, i.e. as a tool rather than as a standalone app.
This feature allows getting output from dir-assistant while hiding debug information that should not be visible.
Since I had to add dynamic configuration options, a workflow can now specify properties like `PRINT_CGRAG=0` and others such as context length. Some questions about small repos don't require the full context, but if a question spans multiple repos, the context size can be changed dynamically.
I haven't tested it yet, but it should also be possible to change the LLM by passing the `LITELLM_MODEL` environment variable.
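For illustration, here is a minimal sketch of driving single-prompt mode from a Python workflow script. The environment variable names (`PRINT_CGRAG`, `OPENAI_API_BASE`, `LM_STUDIO_API_KEY`, `LITELLM_MODEL`) are the ones mentioned in this thread; everything else is an assumption and should be checked against the merged branch.

```python
# Hypothetical workflow integration: run dir-assistant in --single-prompt mode
# as a subprocess and capture only its answer. Variable names come from this
# thread; verify them against the actual release before relying on this.
import os
import subprocess

def ask_dir_assistant(prompt: str, project_dir: str, model: str | None = None) -> str:
    env = os.environ.copy()
    env["PRINT_CGRAG"] = "0"                                  # hide CGRAG debug output
    env["OPENAI_API_BASE"] = "http://192.168.0.130:1234/v1"   # local OpenAI-compatible server
    env["LM_STUDIO_API_KEY"] = "ollama"                       # placeholder key for LM Studio/Ollama
    if model:
        env["LITELLM_MODEL"] = model                          # optionally switch the LLM per call
    result = subprocess.run(
        ["dir-assistant", "start", "--single-prompt", prompt],
        cwd=project_dir,       # run against the target project's directory
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(ask_dir_assistant("what is dir-assistant?", "/path/to/project"))
```

A larger workflow could call this per repository, adjusting the context-length setting or `LITELLM_MODEL` depending on how many projects a question touches.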