Commit
Add OpenAI-compatible API (sgl-project#1810)
Co-authored-by: Chayenne <[email protected]>
2 people authored and zolinthecow committed Oct 29, 2024
1 parent 5a75683 commit fff0abd
Showing 7 changed files with 800 additions and 56 deletions.
35 changes: 10 additions & 25 deletions .github/workflows/deploy-docs.yml
@@ -10,12 +10,17 @@ on:
workflow_dispatch:

jobs:
execute-notebooks:
execute-and-deploy:
runs-on: 1-gpu-runner
if: github.repository == 'sgl-project/sglang'
defaults:
run:
working-directory: docs
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
path: .

- name: Set up Python
uses: actions/setup-python@v4
@@ -25,15 +30,16 @@ jobs:
- name: Install dependencies
run: |
bash scripts/ci_install_dependency.sh
pip install -r docs/requirements.txt
pip install -r requirements.txt
apt-get update
apt-get install -y pandoc
- name: Setup Jupyter Kernel
run: |
python -m ipykernel install --user --name python3 --display-name "Python 3"
- name: Execute notebooks
run: |
cd docs
for nb in *.ipynb; do
if [ -f "$nb" ]; then
echo "Executing $nb"
@@ -43,36 +49,15 @@ jobs:
fi
done
build-and-deploy:
needs: execute-notebooks
if: github.repository == 'sgl-project/sglang'
runs-on: 1-gpu-runner
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'

- name: Install dependencies
run: |
bash scripts/ci_install_dependency.sh
pip install -r docs/requirements.txt
apt-get update
apt-get install -y pandoc
- name: Build documentation
run: |
cd docs
make html
- name: Push to sgl-project.github.io
env:
GITHUB_TOKEN: ${{ secrets.PAT_TOKEN }}
run: |
cd docs/_build/html
cd _build/html
git clone https://[email protected]/sgl-project/sgl-project.github.io.git ../sgl-project.github.io
cp -r * ../sgl-project.github.io
cd ../sgl-project.github.io
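The notebook-execution loop in the merged `execute-and-deploy` job is truncated in the hunk above. For reference only (not part of this commit), a rough Python equivalent of that step is sketched below; it assumes `nbformat` and `nbconvert` are installed and that a `python3` kernel has been registered, as the "Setup Jupyter Kernel" step does.

```python
import glob

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

# Execute every notebook under docs/ in place, mirroring the shell loop
# in the workflow: read, run all cells, write the executed notebook back.
for path in sorted(glob.glob("docs/*.ipynb")):
    print(f"Executing {path}")
    nb = nbformat.read(path, as_version=4)
    ep = ExecutePreprocessor(timeout=600, kernel_name="python3")
    ep.preprocess(nb, {"metadata": {"path": "docs"}})
    nbformat.write(nb, path)
```

Folding the old `execute-notebooks` and `build-and-deploy` jobs into one means the executed notebooks stay in the same workspace that `make html` builds from, rather than being checked out and re-prepared in a second job.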
18 changes: 15 additions & 3 deletions .github/workflows/execute-notebook.yml
@@ -1,12 +1,24 @@
name: Execute Notebooks

on:
pull_request:
push:
branches:
- main
branches: [ main ]
paths:
- "python/sglang/**"
- "docs/**"
pull_request:
branches: [ main ]
paths:
- "python/sglang/**"
- "docs/**"
workflow_dispatch:


concurrency:
group: execute-notebook-${{ github.ref }}
cancel-in-progress: true


jobs:
run-all-notebooks:
runs-on: 1-gpu-runner
3 changes: 3 additions & 0 deletions .pre-commit-config.yaml
@@ -10,6 +10,9 @@ repos:
rev: 24.10.0
hooks:
- id: black
additional_dependencies: ['.[jupyter]']
types: [python, jupyter]
types_or: [python, jupyter]

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
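With the `'.[jupyter]'` additional dependency and `types_or: [python, jupyter]`, the black hook now formats notebook cells as well as `.py` files. A minimal sketch of running the same check locally, assuming `black[jupyter]` is installed and the notebooks live under `docs/`:

```python
import glob
import subprocess

# black formats .ipynb files directly when the "jupyter" extra is installed
# (pip install "black[jupyter]"); --check reports files that would be reformatted.
notebooks = sorted(glob.glob("docs/*.ipynb"))
result = subprocess.run(["black", "--check", *notebooks])
print("formatting OK" if result.returncode == 0 else "some notebooks need reformatting")
```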
104 changes: 83 additions & 21 deletions docs/embedding_model.ipynb
@@ -4,19 +4,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Embedding Model"
"# Embedding Model\n",
"\n",
"SGLang supports embedding models in the same way as completion models. Here are some example models:\n",
"\n",
"- [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct)\n",
"- [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Launch A Server"
"## Launch A Server\n",
"\n",
"The following code is equivalent to running this in the shell:\n",
"```bash\n",
"python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct \\\n",
" --port 30010 --host 0.0.0.0 --is-embedding --log-level error\n",
"```\n",
"\n",
"Remember to add `--is-embedding` to the command."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@@ -28,14 +41,14 @@
}
],
"source": [
"# Equivalent to running this in the shell:\n",
"# python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct --port 30010 --host 0.0.0.0 --is-embedding --log-level error\n",
"from sglang.utils import execute_shell_command, wait_for_server, terminate_process\n",
"\n",
"embedding_process = execute_shell_command(\"\"\"\n",
"embedding_process = execute_shell_command(\n",
" \"\"\"\n",
"python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct \\\n",
" --port 30010 --host 0.0.0.0 --is-embedding --log-level error\n",
"\"\"\")\n",
"\"\"\"\n",
")\n",
"\n",
"wait_for_server(\"http://localhost:30010\")\n",
"\n",
@@ -51,25 +64,32 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0083160400390625, 0.0006804466247558594, -0.00809478759765625, -0.0006995201110839844, 0.0143890380859375, -0.0090179443359375, 0.01238250732421875, 0.00209808349609375, 0.0062103271484375, -0.003047943115234375]\n"
"Text embedding (first 10): [0.0083160400390625, 0.0006804466247558594, -0.00809478759765625, -0.0006995201110839844, 0.0143890380859375, -0.0090179443359375, 0.01238250732421875, 0.00209808349609375, 0.0062103271484375, -0.003047943115234375]\n"
]
}
],
"source": [
"# Get the first 10 elements of the embedding\n",
"import subprocess, json\n",
"\n",
"text = \"Once upon a time\"\n",
"\n",
"! curl -s http://localhost:30010/v1/embeddings \\\n",
"curl_text = f\"\"\"curl -s http://localhost:30010/v1/embeddings \\\n",
" -H \"Content-Type: application/json\" \\\n",
" -H \"Authorization: Bearer None\" \\\n",
" -d '{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": \"Once upon a time\"}' \\\n",
" | python3 -c \"import sys, json; print(json.load(sys.stdin)['data'][0]['embedding'][:10])\""
" -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": \"{text}\"}}'\"\"\"\n",
"\n",
"text_embedding = json.loads(subprocess.check_output(curl_text, shell=True))[\"data\"][0][\n",
" \"embedding\"\n",
"]\n",
"\n",
"print(f\"Text embedding (first 10): {text_embedding[:10]}\")"
]
},
{
@@ -81,37 +101,79 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.00603485107421875, -0.0190582275390625, -0.01273345947265625, 0.01552581787109375, 0.0066680908203125, -0.0135955810546875, 0.01131439208984375, 0.0013713836669921875, -0.0089874267578125, 0.021759033203125]\n"
"Text embedding (first 10): [0.00829315185546875, 0.0007004737854003906, -0.00809478759765625, -0.0006799697875976562, 0.01438140869140625, -0.00897979736328125, 0.0123748779296875, 0.0020923614501953125, 0.006195068359375, -0.0030498504638671875]\n"
]
}
],
"source": [
"import openai\n",
"\n",
"client = openai.Client(\n",
" base_url=\"http://127.0.0.1:30010/v1\", api_key=\"None\"\n",
")\n",
"client = openai.Client(base_url=\"http://127.0.0.1:30010/v1\", api_key=\"None\")\n",
"\n",
"# Text embedding example\n",
"response = client.embeddings.create(\n",
" model=\"Alibaba-NLP/gte-Qwen2-7B-instruct\",\n",
" input=\"How are you today\",\n",
" input=text,\n",
")\n",
"\n",
"embedding = response.data[0].embedding[:10]\n",
"print(embedding)"
"print(f\"Text embedding (first 10): {embedding}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Input IDs\n",
"\n",
"SGLang also supports `input_ids` as input to get the embedding."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input IDs embedding (first 10): [0.00829315185546875, 0.0007004737854003906, -0.00809478759765625, -0.0006799697875976562, 0.01438140869140625, -0.00897979736328125, 0.0123748779296875, 0.0020923614501953125, 0.006195068359375, -0.0030498504638671875]\n"
]
}
],
"source": [
"import json\n",
"import os\n",
"from transformers import AutoTokenizer\n",
"\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"Alibaba-NLP/gte-Qwen2-7B-instruct\")\n",
"input_ids = tokenizer.encode(text)\n",
"\n",
"curl_ids = f\"\"\"curl -s http://localhost:30010/v1/embeddings \\\n",
" -H \"Content-Type: application/json\" \\\n",
" -H \"Authorization: Bearer None\" \\\n",
" -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": {json.dumps(input_ids)}}}'\"\"\"\n",
"\n",
"input_ids_embedding = json.loads(subprocess.check_output(curl_ids, shell=True))[\"data\"][\n",
" 0\n",
"][\"embedding\"]\n",
"\n",
"print(f\"Input IDs embedding (first 10): {input_ids_embedding[:10]}\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
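The executed outputs above show that the raw-text request and the token-ID request return near-identical vectors from the OpenAI-compatible `/v1/embeddings` endpoint. Not part of this commit, but a small sketch that checks this directly, assuming the server launched in the notebook is still running on port 30010 and that `requests` and `transformers` are available:

```python
import math

import requests
from transformers import AutoTokenizer

URL = "http://localhost:30010/v1/embeddings"
MODEL = "Alibaba-NLP/gte-Qwen2-7B-instruct"


def get_embedding(payload):
    # `input` may be a string or a list of token IDs, as in the notebook cells above.
    resp = requests.post(
        URL,
        headers={"Authorization": "Bearer None"},
        json={"model": MODEL, "input": payload},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]


text = "Once upon a time"
input_ids = AutoTokenizer.from_pretrained(MODEL).encode(text)

e_text = get_embedding(text)
e_ids = get_embedding(input_ids)

# Cosine similarity should be ~1.0 when both inputs resolve to the same tokens.
dot = sum(a * b for a, b in zip(e_text, e_ids))
norm = math.sqrt(sum(a * a for a in e_text)) * math.sqrt(sum(b * b for b in e_ids))
print(f"cosine similarity: {dot / norm:.6f}")
```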
3 changes: 3 additions & 0 deletions docs/index.rst
@@ -16,12 +16,14 @@ The core features include:
:caption: Getting Started

install.md
send_request.ipynb


.. toctree::
:maxdepth: 1
:caption: Backend Tutorial

openai_api.ipynb
backend.md


@@ -43,3 +45,4 @@ The core features include:
choices_methods.md
benchmark_and_profiling.md
troubleshooting.md
embedding_model.ipynb