How can I switch to local LLM engine #43

Open
oppokui opened this issue Sep 19, 2023 · 3 comments
oppokui commented Sep 19, 2023

In the Chat UI there is a long list of LLM models. The default is GPT-3.5 Turbo, which I guess is OpenAI.
I configured the OpenAI API key in .env, so it should be the one in use, since the answers come back very fast.

When I try to switch to Llama 7B, it reports:

An error occurred while generating text: Model llama-7b-GGML is currently booting.

I set up another LLM engine, vllm, based on the llama-2-7b-chat model and exposed it on port 3000; it is compatible with the OpenAI API.
How can I configure the app to use this new engine?

c0sogi (Owner) commented Sep 19, 2023

The model name facebook/opt-125m below is just for example purposes.

In ./app/models/llms.py, find the LLMModels class.
Then try adding this to the class members and reboot.

     my_model = OpenAIModel(
         name="facebook/opt-125m",
         max_total_tokens=4096,
         max_tokens_per_request=4096,
         token_margin=10,
         tokenizer=OpenAITokenizer("gpt-3.5-turbo"),
         api_url="http://localhost:3000/v1/chat/completions",
     )

As for the tokenizer: since tiktoken (OpenAITokenizer) is used, token counting will not be as accurate as on the vllm side, which uses the Llama tokenizer. However, if the vllm server handles token-limit-exceeded errors gracefully, you should be able to use it without any problems.
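To make the token_margin field concrete: a minimal sketch of the budgeting this kind of config implies, assuming the app computes the space left for a completion as the total context minus the prompt's counted tokens minus the margin (the function name remaining_tokens is hypothetical, chosen to mirror the debug log below).

```python
# Hypothetical sketch of token budgeting with a safety margin.
# remaining_tokens is an illustrative helper, not the app's actual code.
def remaining_tokens(max_total_tokens: int, used_tokens: int, token_margin: int) -> int:
    """Tokens left for the model's reply after reserving a safety margin."""
    return max(0, max_total_tokens - used_tokens - token_margin)

# For instance, with a 2048-token context and a margin of 8,
# a 512-token prompt leaves 1528 tokens for the completion.
print(remaining_tokens(2048, 512, 8))  # → 1528
```

Because tiktoken counts tokens differently from the Llama tokenizer, this computed remainder can overestimate the real capacity, which is why the server-side limit handling matters.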

oppokui (Author) commented Sep 19, 2023

It works! But I hit one problem when uploading a txt file for similarity search.

I use this script to start the vllm engine remotely:

pip install git+https://github.com/vllm-project/vllm.git
pip install fschat
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-2-7b-chat-hf \
  --tensor-parallel-size 4 \
  --trust-remote-code \
  --gpu-memory-utilization 0.5 \
  --dtype half \
  --host 0.0.0.0 \
  --port 3000

Then I extended llms.py (the max tokens can't be 4096 here):

    llama_2_7b_vllm = OpenAIModel(
        name="meta-llama/Llama-2-7b-chat-hf",
        max_total_tokens=2048,
        max_tokens_per_request=2048,
        token_margin=8,
        tokenizer=OpenAITokenizer("gpt-4"),
        api_url="http://ec2-18-211-48-230.compute-1.amazonaws.com:3000/v1/chat/completions",
        api_key=OPENAI_API_KEY,
    )
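Since vllm exposes an OpenAI-compatible API, the endpoint can also be smoke-tested outside the chat app. A minimal sketch, assuming the standard OpenAI chat completions request shape; the localhost URL is a placeholder for your own vllm host, and build_chat_payload / ask are hypothetical helper names.

```python
# Sketch: call a vLLM OpenAI-compatible endpoint directly (stdlib only).
# API_URL is a placeholder; point it at your own vllm server.
import json
from urllib import request

API_URL = "http://localhost:3000/v1/chat/completions"

def build_chat_payload(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Build a request body in the OpenAI chat completions format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

def ask(payload: dict) -> str:
    """POST the payload and return the assistant's reply text."""
    req = request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

payload = build_chat_payload("meta-llama/Llama-2-7b-chat-hf", "who are you?")
```

If ask(payload) returns a Llama-style self-description, the server side is healthy and any remaining issue is in the app's configuration.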

I can ask "who are you?" to the remote llama engine, and it responds as Llama.

[Screenshot from 2023-09-19 17-13-20]

But when I upload a txt file, it hangs. It works if I use gpt-3.5 or gpt-4.
The api container prints logs like:

api_1          | [2023-09-19 09:00:28,729] ApiLogger:CRITICAL - 🦙 Llama.cpp server is running
api_1          | INFO:     ('172.16.0.1', 59542) - "WebSocket /ws/chat/daad0289-fc57-4e88-ada1-82052b94-d334-485d-a975-d386a605efd8" [accepted]
api_1          | INFO:     connection open
api_1          | - DEBUG: Calling command: retry with 0 args and ['buffer'] kwargs
api_1          | - DEBUG: remaining_tokens: 1528
api_1          | - DEBUG: Sending messages: 
api_1          | [
api_1          |   {
api_1          |     "role": "user",
api_1          |     "content": "who are you?"
api_1          |   }
api_1          | ]
api_1          | - DEBUG: Sending functions: None
api_1          | - DEBUG: Sending function_call: None
api_1          | Loading tokenizer:  gpt-4
api_1          | INFO:     172.16.0.1:48092 - "GET /assets/assets/lotties/file-upload.json HTTP/1.1" 200 OK
api_1          | Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/engines/text-embedding-ada-002/embeddings (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fde88725850>, 'Connection to api.openai.com timed out. (connect timeout=600)')).

oppokui (Author) commented Sep 19, 2023

Oh, I realized the error is related to access to openai.com. I can't reach it from my local machine; let me retry on the AWS instance.
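The timeout in the log above comes from the embeddings step: even when chat goes to a local model, document upload still calls OpenAI's text-embedding-ada-002 endpoint, so api.openai.com must be reachable. A quick connectivity check can confirm this before retrying (can_reach is a hypothetical helper, not part of the project):

```python
# Sketch: verify the embeddings dependency (api.openai.com:443) is reachable
# before uploading documents. can_reach is an illustrative helper.
import socket

def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(can_reach("api.openai.com"))
```

A False here reproduces the langchain embed_with_retry timeout, independent of the chat model configuration.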
