bionic-gpt/doc-llama

Docker Containers for a Local OpenAI-Compatible API

OpenAI API-compatible Docker images. We select models and package them in high-performance Docker containers.

| Image Name | Parameters | Type | Context Size | Image Size | Quantized | RAM Requirements |
|---|---|---|---|---|---|---|
| llama-2-7b-chat:1.1.3 | 7b | Llama 2 | 2K | 5GB | Yes | 16GB |
| llama-3.2-3b:1.1.3 | 3b | Llama 3.2 | 2K | 5GB | Yes | 16GB |
| embeddings:1.1.3 | 7b | bge:small-en-v1.5-q8_0 | 2K | 5GB | Yes | 16GB |

Features

  • The API implements batching: requests from multiple users are served in parallel.
  • The Docker image bundles the model, so there is no separate download step.
  • Images are optimized for different architectures [coming soon]
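
The batching feature can be exercised by firing several requests at once. Below is a minimal Python sketch, assuming the container is running locally with port 11434 mapped to 3000 (as in the docker run command in this README); the `chat` and `chat_many` helper names are illustrative, not part of the image:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Port 3000 comes from the `docker run -p 3000:11434` mapping.
API_URL = "http://localhost:3000/v1/chat/completions"

def chat(prompt, post=None):
    """Send one chat completion request; `post` is injectable for testing."""
    body = {
        "model": "llama2",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,
    }
    if post is None:
        def post(url, payload):
            # Plain stdlib HTTP POST with a JSON body.
            req = urllib.request.Request(
                url,
                data=json.dumps(payload).encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
    reply = post(API_URL, body)
    # Standard OpenAI chat completions response shape.
    return reply["choices"][0]["message"]["content"]

def chat_many(prompts, post=None, workers=4):
    """Issue prompts concurrently; the server batches them on its side."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: chat(p, post), prompts))
```

Because the requests are sent in parallel, total latency for N prompts should be well under N times the single-request latency when batching is working.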

Start the API

docker run -it --rm -p 3000:11434 ghcr.io/bionic-gpt/llama-3.2-3b:1.1.3

Try it out

curl http://localhost:3000/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "llama2",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.1
   }'
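
The response follows the standard OpenAI chat completions shape, with the generated text nested under `choices[0].message.content`. A small helper to extract it; the sample response below is illustrative, not captured from the container:

```python
def assistant_reply(response: dict) -> str:
    # OpenAI chat completions nest the text under choices[0].message.content.
    return response["choices"][0]["message"]["content"]

# Illustrative response shape only (not real output from the image).
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "I'm doing well, thanks!"},
            "finish_reason": "stop",
        }
    ]
}

print(assistant_reply(sample))  # -> I'm doing well, thanks!
```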
