Instructions for running it with local models are lacking. #943
Comments
As a quick update for the community: we are actively working on this issue and experimenting with several local models to see how well they work with gpt-engineer. Based on those experiments, we will then update the documentation with the relevant info.
I just got the docker container working transparently, using a dummy Cloudflare-hosted external address. On the server, a combination of ollama serve simulates OpenAI, and in the .env file I add OPENAI_API_BASE=https://ai.mydomain.com. I experimentally pointed ollama at mistral and codellama - memory seems to produce code, but no files are written so far. The Cloudflare tunnel is only there because my Docker setup does not recognize host.docker.internal and I didn't want to add a network to the docker compose. You could instead modify the docker compose to point at the host's localhost:8000 if your setup supports that.
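For anyone trying to reproduce this, a rough sketch of the setup described above; the domain, port and model names are just the ones mentioned in this comment, and the proxy in front of ollama is whatever exposes an OpenAI-style endpoint in your own setup:

# on the server: pull and serve a model with ollama
ollama pull mistral
ollama serve

# .env for gpt-engineer: send OpenAI API calls to your own endpoint
OPENAI_API_BASE=https://ai.mydomain.com
OPENAI_API_KEY=dummy   # local/proxied servers usually ignore this, but the client wants it set

# alternative to the Cloudflare tunnel, if your Docker supports host.docker.internal:
# run the container with --add-host=host.docker.internal:host-gateway
# and set OPENAI_API_BASE=http://host.docker.internal:8000 instead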
Hey. As discussed, here's a proposal for the local LLM support. Please provide feedback before I dive in.
Requirements:
Support for
@ATheorell @captivus I'm fine with @zigabrencic's proposal. Are you guys okay with it too, or would you like to change something?
I know too little about this to have a strong opinion. What does @AntonOsika say? I want to add that this is a priority issue for me, and it is clear that we need at least one example of setting this up in docs that we maintain ourselves, so that we can refer users to a text we know is accurate whenever this question comes up, which happens frequently.
This looks good to me. Thanks for picking this up @zigabrencic!
I am rather late to notice progress here, but I have a few caveats to add which may or may not be pertinent. GPTE is working brilliantly for me so far - I'm just diving into improving the workflow and hope to send some suggestions upstream. It was annoying that gpte didn't work "out of the box", but the solution is out there already. 1.) I personally think the langchain code base is buggy and keeps shifting; I wouldn't rely on it. I had many nightmares with certain "impossible" situations due to introduced breaking changes. See more here: https://docs.litellm.ai/docs/ Best regards, James
Hey @definitiontv Thanks for all the inputs. 1.) I experienced something similar when trying to add open LLMs via langchain so far and am worried on the same front, since they (langchain) try to be an everything library. 2.) By "older" methods do you mean PyTorch and TensorFlow, or also tools like llama.cpp? 3.) Good point. 4.) & 5.) This sounds like what we need. Could you maybe provide us with your working code/setup, so we don't re-invent the wheel? If you have a working docker-compose, please share it so we can build on top of your solution. Extra from me: 6.) How do you find the inference speed with the stack of ollama and litellm you described? Fast/slow? Cheers, Ziga
Yes, but to be honest I am finding that many (most?) open-source LLM implementations are annoyingly opinionated in their assumptions about user setup. It's new; consensus hasn't settled in.
I was just going to quickly copy my setup, then realised there were multiple gotchas that I fixed locally (happy to send it to you directly - how? - but I don't want to steer people wrong on the thread).
I run on a free-tier ARM CPU remote machine, using only models needing 5-6 GB (mostly mistral 7b), and I get nice, fast inference.
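For concreteness, here is a hedged sketch of the ollama + litellm stack being described; the model name and port are assumptions on my side, not a tested recipe from this thread:

# serve a ~7B model locally
ollama pull mistral
ollama serve

# expose it as an OpenAI-compatible API via the litellm proxy
litellm --model ollama/mistral --port 8000

# point gpt-engineer at the proxy instead of api.openai.com
export OPENAI_API_BASE=http://localhost:8000
export OPENAI_API_KEY=dummy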
The question raised by @zigabrencic as point 6 is a particularly important one. We will want to compare inference performance of proposed solutions prior to implementing, given issues observed in other testing Ziga and I have been working on.
Sorry - I have been rushing. This was a bit more complex than I thought, so I had to wrap it in a pull request in order to share. AHAHH, as I type this I see my error!!! The commands are the wrong way round. See #1015; some notes are added to the bottom of the existing docker/README. I will expand the notes etc. once a few options are pinned down.
Hey. Thanks for submitting this. I checked the PR and have only one question: how is the performance and underlying hardware access in docker? For further chats I propose that you reach out on http://discord.com/users/820749115197227138 so we can speed this up a little ;)
Well, if you are relying on a GPU then you just need to make sure your docker instance has access to it, which it should unless you are running a very weird setup. CPU and memory should of course be near native - that's the point of docker.
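For readers unfamiliar with this, the standard way to hand GPUs to a container (assuming the NVIDIA container toolkit is installed; the image name is a placeholder):

# expose all host GPUs to the container
docker run --gpus all <image>
# or restrict to a single device
docker run --gpus '"device=0"' <image>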
Hey. 1.) Thanks for the GPU/CPU point. I must admit I'm not that familiar with docker internals. 2.) Discord: strange, it works for me and others. How about this one: https://discord.com/channels/1119885301872070706/1120698764445880350 - that's the link to the community. And find me there under
Policy and info
Description
Instructions:
Running the Example
Once the API is set up, you can find the host and the exposed TCP port by checking your Runpod dashboard.
Then, you can use the port and host to run the following example using WizardCoder-Python-34B hosted on Runpod:
OPENAI_API_BASE=http://<host>:<port>/v1 python -m gpt_engineer.cli.main benchmark/pomodoro_timer --steps benchmark TheBloke_WizardCoder-Python-34B-V1.0-GPTQ
What is this example? What does it do? What's gpt_engineer.cli.main?
How do I run the main command "gpte projects/my-new-project" after I have a local LLM running on localhost:8000?
Suggestion
Please provide more step-by-step instructions.
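As a rough illustration of what such instructions might boil down to, assuming a local OpenAI-compatible server is already listening on localhost:8000 and that the model name is passed positionally as in the Runpod example above (a sketch, not official docs):

# tell the OpenAI client inside gpt-engineer where to send requests
export OPENAI_API_BASE=http://localhost:8000/v1
export OPENAI_API_KEY=dummy   # usually ignored by local servers, but must be set

# run gpt-engineer against your project folder, naming the locally served model
gpte projects/my-new-project TheBloke_WizardCoder-Python-34B-V1.0-GPTQ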