Inspired by Drew Breunig's Faucet Finder I decided I wanted to deploy an API somewhere that could calculate CLIP embeddings for me.
I ended up deploying a Datasette instance to Fly.io using my llm-clip and datasette-llm-embed plugins.
This was a little tricky because CLIP has much larger memory requirements than most of the applications that I deploy using Datasette. Here's how I got it to work.
I ended up needing two files. The first was a Dockerfile
, and the second was a fly.toml
file to configure the deployment.
Here's the Dockerfile
:
FROM python:3.11.0-slim-bullseye
COPY . /app
WORKDIR /app
RUN pip install -U torch torchvision --extra-index-url https://download.pytorch.org/whl/cpu
RUN pip install -U datasette datasette-scale-to-zero datasette-block-robots datasette-llm-embed llm-clip
# Download the CLIP model
RUN llm embed -m clip -c 'hello world'
# Create metadata.json file
RUN echo '{"plugins": {"datasette-scale-to-zero": {"duration": "10m"}}}' > /app/metadata.json
# Expose and start the service
ENV PORT 8080
EXPOSE 8080
CMD datasette serve --host 0.0.0.0 --cors --port $PORT -m /app/metadata.json
The trickiest detail here was installing torch
and torchvision
. It turns out the Fly builder application I was using had access to CUDA libraries, which caused it to install the ultra-heavy version of PyTorch - multiple GBs of data.
After much experimentation I found that this incantation causes PyTorch to be installed with CPU-only dependencies, which is a fraction of the size:
pip install -U torch torchvision \
--extra-index-url https://download.pytorch.org/whl/cpu
The rest of the Dockerfile
is pretty standard - it installs some Datasette plugins plus llm-clip
.
One trick I'm using is to run llm embed -m clip -c 'hello world'
as part of the build process. This causes the CLIP model to be downloaded from Hugging Face and cached as part of the Docker image build process, which means it's baked into the container and available immediately when the app starts up.
My fly.toml
file looks like this:
app = "clip-datasette-on-fly"
[[services]]
internal_port = 8080
protocol = "tcp"
[services.concurrency]
hard_limit = 25
soft_limit = 20
[[services.ports]]
handlers = ["http"]
port = 80
[[services.ports]]
handlers = ["tls", "http"]
port = 443
[[services.tcp_checks]]
interval = 10000
timeout = 2000
I ran the following commands in the same directory as the fly.toml
and Dockerfile
files to create the application and then deploy it:
flyctl apps create clip-datasette-on-fly
flyctl deploy
The app name needs to be unique across all of Fly, so you will need to change this if you want to deploy your own copy.
The default deployment didn't have enough memory to run CLIP without crashing. I ran these commands to fix that:
flyctl scale vm performance-1x
flyctl scale memory 8192
If this machine ran all the time it would cost me $61.02/month - but I've installed the datasette-scale-to-zero and configured it to stop the server after 10m of no activity, which should save a lot of money.
I also installed datasette-block-robots to prevent search engines from crawling the site and waking it up more than strictly necessary.
My Search for Faucets with CLIP, API edition Observable notebook makes calls to my newly deployed API to calculate CLIP embeddings for text, then passes the result to Drew's Faucet Finder to find matching faucets.
The relevant JavaScript looks like this, where text
is the text to be embedded:
hex_embedding = {
try {
return (
await (
await fetch(
`https://clip-datasette-on-fly.fly.dev/_memory.json?sql=select+hex(llm_embed(%27clip%27%2C+%3Aq))+as+x&q=${encodeURIComponent(
text
)}&_shape=array`
)
).json()
)[0].x;
} catch {
return "";
}
}
Here's the HTML version of that SQL query.