This repository provides a local deployment setup for a Large Language Model (LLM) using Docker and Docker Compose. The setup runs Ollama together with Open WebUI. NOTE: All information and artifacts related to this deployment stay local to the host machine.
To run locally:
- Run `make start` at the root of this project. This sets up Open WebUI for chatting, deployed with the `llama3` model.
- To set up the stack with a different LLM, say `gemma2`, issue the command `make start LOCAL_LLM=gemma2` (a rough shell equivalent of what this does is sketched below).
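Under the hood, `make start` presumably brings the stack up with Docker Compose and selects the model via the `LOCAL_LLM` variable. Below is a minimal sketch of a rough shell equivalent, assuming the compose file reads `LOCAL_LLM` from the environment and names the Ollama service `ollama`; the repository's actual Makefile may wire this differently:

```sh
# Rough equivalent of `make start LOCAL_LLM=gemma2` -- illustrative only.
LOCAL_LLM=gemma2 docker compose up -d

# If the model is not pulled automatically, pull it inside the Ollama
# container (assumes the compose service is named `ollama`).
docker compose exec ollama ollama pull gemma2
```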
NOTE: If you get an error while pulling the Open WebUI images, you might need to authenticate with the GitHub Container Registry. Follow these instructions:
- To get an access token, go to GitHub Settings > Developer settings > Personal access tokens and generate a token with the `read:packages` scope.
- From your terminal, run:
```sh
echo YOUR_PERSONAL_ACCESS_TOKEN | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin
```
- You should now be authenticated to pull images from the GitHub Container Registry.
- To access the chat UI, go to `http://localhost:3000` (if the page does not load, see the health check sketched after this list).
- Sign up with a local account and log in (login information is stored locally on the host machine).
- Select a model to use for chat.
- Chat away :)
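If the UI does not come up, a quick health check is to confirm the containers are running and that Ollama is answering on its default port (11434):

```sh
# List the containers started by this compose stack.
docker compose ps

# Ollama's version endpoint returns a small JSON payload when the
# service is healthy.
curl http://localhost:11434/api/version
```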
Ollama exposes several HTTP endpoints that can be used to interact with the models directly. Some examples are below:
Generate Endpoint
```sh
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
```
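Chat Endpoint
`/api/generate` above is Ollama's single-turn completion endpoint; for multi-turn conversations, Ollama also exposes `/api/chat`, which takes a list of messages instead of a single prompt:
```sh
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
```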
Pull Model
One can use multiple models in Open WebUI. To pull a model, run:
```sh
curl http://localhost:11434/api/pull -d '{"name": "llama3"}'
```
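For example, to make the `gemma2` model mentioned above available in the UI:
```sh
curl http://localhost:11434/api/pull -d '{"name": "gemma2"}'
```
The response streams JSON status objects while the download progresses.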
Check Models Available Locally
```sh
curl http://localhost:11434/api/tags
```
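To list just the model names from the JSON response (assuming `jq` is installed):
```sh
curl -s http://localhost:11434/api/tags | jq '.models[].name'
```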