How can I switch to a local LLM engine? #43
The model name will be the model that vLLM serves. For example, in llms.py:

```python
my_model = OpenAIModel(
    name="facebook/opt-125m",
    max_total_tokens=4096,
    max_tokens_per_request=4096,
    token_margin=10,
    tokenizer=OpenAITokenizer("gpt-3.5-turbo"),
    api_url="http://localhost:3000/v1/chat/completions",
)
```

As for the tokenizer: since tiktoken (OpenAITokenizer) is used, token counting will not be accurate compared to vLLM, which uses the Llama tokenizer. However, if the vLLM server side handles the token-limit-exceeded error well, you will be able to use it without any problems.
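If exact counts matter, one option is to count tokens locally with the Llama tokenizer itself. A minimal sketch, assuming the transformers package is installed; the model id (an ungated mirror of llama-2-7b-chat) and the helper are illustrative, not the project's actual OpenAITokenizer interface:

```python
# Count tokens with the actual Llama tokenizer so local counts match what
# the vLLM server sees. The model id and helper name are illustrative.
from transformers import AutoTokenizer

llama_tok = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")

def count_llama_tokens(text: str) -> int:
    # encode() returns the token ids; their count is the token usage
    return len(llama_tok.encode(text))

print(count_llama_tokens("who are you?"))
```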
It works! But I met one problem when uploading a txt file for similarity search. I use a script to start the vLLM engine remotely, then enrich llms.py (the max tokens can't be 4096); see the sketch after this paragraph.
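Something like the following, mirroring the maintainer's example above. The 2048 budget is a guess based on the report that 4096 did not work, and the host/port must point at the actual remote vLLM server:

```python
# Sketch of the llms.py entry for the local vLLM endpoint. The 2048 budget
# is an assumption (4096 reportedly did not work); adjust it to the context
# length the vLLM server actually serves.
my_model = OpenAIModel(
    name="meta-llama/Llama-2-7b-chat-hf",  # must match the model vLLM serves
    max_total_tokens=2048,
    max_tokens_per_request=2048,
    token_margin=10,
    tokenizer=OpenAITokenizer("gpt-3.5-turbo"),  # approximate count via tiktoken
    api_url="http://localhost:3000/v1/chat/completions",
)
```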
I can ask "who are you?" of the remote Llama engine, and it responds as Llama. But when I upload a txt file, it just hangs. The upload works if I use gpt-3.5 or gpt-4.
Oh, I realized the error is related to openai.com access. I can't reach openai.com from my local machine; let me retry it on an AWS instance.
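This fits the symptom: chat goes to the local vLLM server, but the upload path presumably still calls openai.com (likely for the embeddings behind similarity search — an inference, not confirmed in the thread). A minimal reachability check:

```python
# Check whether api.openai.com is reachable from the machine running the
# app, since the file-upload path appears to depend on it even when chat
# is served by the local vLLM engine.
import requests

try:
    requests.head("https://api.openai.com/v1/models", timeout=5)
    print("api.openai.com is reachable")
except requests.RequestException as exc:
    print(f"api.openai.com is NOT reachable: {exc}")
```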
In the Chat UI there is a long list of LLM models. The default one is GPT-3.5 Turbo, which I guess is OpenAI's.
I configured the OpenAI API key in .env, so it should be in use, as the answers come back very fast.
When I try to switch it to Llama 7B, it reports an error.
I set up another LLM engine, vLLM, based on the llama-2-7b-chat model and exposed it on port 3000; it is compatible with the OpenAI API.
How can I configure the app to use this new engine?
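Before wiring the engine into the app, a quick sanity check is to hit the vLLM endpoint directly with an OpenAI-style chat request. A minimal sketch; the host and model name are placeholders for the actual deployment:

```python
# Verify that the vLLM server on port 3000 speaks the OpenAI
# chat-completions protocol before configuring the app to use it.
import requests

resp = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "messages": [{"role": "user", "content": "who are you?"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```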