This directory contains the Docker configuration for running an OpenAI-compatible Deep Researcher API endpoint. The setup supports multiple operation modes through configuration, plus optional GPU acceleration. For most users, however, I still suggest the CPU version: the image is smaller, and the GPU is only used to accelerate PDF OCR.

Prerequisites:
- Docker and Docker Compose installed
- For local web parsing mode: Chrome/Chromium with remote debugging
- For local models mode: Ollama installed
- For GPU acceleration:
  - NVIDIA users: NVIDIA drivers and nvidia-container-runtime installed
  - AMD users: ROCm installed (version 6.3.2 or later)
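As a quick sanity check before building, a small script like the following (purely illustrative) reports which of these host tools are on your PATH; only the tools for your chosen mode need to be present:

```python
# Illustrative prerequisite check: not part of the project itself.
import shutil

for tool in ("docker", "ollama", "google-chrome", "nvidia-smi", "rocm-smi"):
    path = shutil.which(tool)
    print(f"{tool:14} -> {path or 'not found'}")
```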
Configure your preferred mode in `research.config`:

Cloud mode (OpenRouter models + Jina web parsing):

```ini
[Settings]
use_jina = true       # Use Jina for fast web parsing
use_ollama = false    # Use OpenRouter models
with_planning = true  # Enable planning agent
default_model = anthropic/claude-3.5-haiku
reason_model = deepseek/deepseek-r1-distill-qwen-32b
```
Hybrid mode (Jina web parsing + local Ollama models):

```ini
[Settings]
use_jina = true      # Use Jina for web parsing
use_ollama = true    # Use local models
with_planning = true
default_model = mistral-small
reason_model = deepseek-r1:14b
```
Fully local mode (local web parsing + local models):

```ini
[Settings]
use_jina = false     # Use local web parsing
use_ollama = true    # Use local models
with_planning = true
default_model = mistral-small
reason_model = deepseek-r1:14b
```
```ini
[LocalAI]
default_model_ctx = -1  # Use -1 to load default context length settings from modelfile
reason_model_ctx = -1
```
- `ollama_base_url`: Ollama API endpoint (default: http://localhost:11434)
- `default_model_ctx`: Context length for the search and writing models (-1 for the modelfile default)
- `reason_model_ctx`: Context length for the reasoning and planning models (-1 for the modelfile default)
API keys and endpoints:

- `openai_compat_api_key`: Authentication key for the API endpoint
- `jina_api_key`: Required if use_jina = true
- `openai_url`: OpenRouter or other OpenAI-compatible endpoint
- `jina_base_url`: Jina parsing service URL
- `searxng_url`: Local SearXNG instance URL
Core settings:

- `use_jina`: Enable Jina API for fast web parsing
- `use_ollama`: Use local Ollama models instead of OpenRouter
- `with_planning`: Enable research planning agent
- `default_model`: Model for search and writing
- `reason_model`: Model for planning and reasoning
Web parsing and performance:

- `concurrent_limit`: Maximum concurrent operations (default: 3)
- `cool_down`: Delay in seconds between requests to the same domain (default: 10.0)
- `chrome_port`: Chrome debugging port (default: 9222)
- `chrome_host_ip`: Chrome host IP address (default: http://localhost; if you are not running Chrome with `--remote-debugging-address=0.0.0.0`, you need to change this to your local IP)
- `use_embed_browser`: Use embedded Playwright browser instead of external Chrome (default: false)
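To make the semantics of `concurrent_limit` and `cool_down` concrete, here is a minimal sketch (not the project's actual scheduler) of a bounded, per-domain-throttled fetch loop:

```python
# Illustrative only: at most CONCURRENT_LIMIT fetches run at once, and
# consecutive hits to the same domain are spaced COOL_DOWN seconds apart.
import asyncio
import time
from urllib.parse import urlparse

CONCURRENT_LIMIT = 3  # mirrors concurrent_limit
COOL_DOWN = 10.0      # mirrors cool_down

async def fetch(url, semaphore, last_hit):
    domain = urlparse(url).netloc
    async with semaphore:  # cap concurrent operations
        wait = COOL_DOWN - (time.monotonic() - last_hit.get(domain, float("-inf")))
        if wait > 0:
            await asyncio.sleep(wait)  # respect the per-domain cool-down
        last_hit[domain] = time.monotonic()
        print("fetching", url)  # a real implementation would issue the request here

async def main():
    semaphore = asyncio.Semaphore(CONCURRENT_LIMIT)
    last_hit = {}  # domain -> timestamp of the most recent request
    urls = ["https://example.com/a", "https://example.com/b"]
    await asyncio.gather(*(fetch(u, semaphore, last_hit) for u in urls))

asyncio.run(main())
```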
Rate limiting:

- `request_per_minute`: Rate limit for the default model (-1 to disable)
- `operation_wait_time`: Wait time in seconds between research iterations (0 to disable)
- `fallback_model`: Model to use when rate limited (e.g., google/gemini-2.0-flash-001). It should have:
  - a large context length (100k+ recommended for the pure online method, 32k+ for local)
  - a high tokens-per-minute limit
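A hedged sketch of the idea behind `request_per_minute` and `fallback_model` (the service's real logic may differ): count requests over a sliding one-minute window and switch to the fallback model once the limit is reached.

```python
# Illustrative sliding-window limiter with model fallback.
import time
from collections import deque

REQUEST_PER_MINUTE = 20  # request_per_minute (-1 would disable the check)
DEFAULT_MODEL = "anthropic/claude-3.5-haiku"
FALLBACK_MODEL = "google/gemini-2.0-flash-001"  # fallback_model

window = deque()  # timestamps of recent requests

def pick_model():
    now = time.monotonic()
    while window and now - window[0] > 60.0:  # drop entries older than a minute
        window.popleft()
    if REQUEST_PER_MINUTE != -1 and len(window) >= REQUEST_PER_MINUTE:
        return FALLBACK_MODEL  # rate limited: route to the fallback
    window.append(now)
    return DEFAULT_MODEL
```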
PDF and content processing:

- `temp_pdf_dir`: Directory for temporary PDF storage
- `browse_lite`: Fast parsing mode without ML models (0/1)
- `pdf_max_pages`: Maximum PDF pages to process (default: 30)
- `pdf_max_filesize`: Maximum PDF file size in bytes
- `timeout_pdf`: PDF processing timeout in seconds
- `max_html_length`: Maximum HTML content length to process
- `max_eval_time`: JavaScript evaluation timeout
- `verbose_web_parse_detail`: Enable detailed parsing logs
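The file itself is standard INI, so it can be inspected with Python's stdlib `configparser`. A minimal sketch (the service's actual loader may differ):

```python
# Read research.config as plain INI; inline "#" comments must be
# declared explicitly for configparser to strip them.
import configparser

config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read("research.config")

print(config["Settings"].getboolean("use_jina"))
print(config["Settings"].get("default_model"))
print(config["LocalAI"].getint("reason_model_ctx"))  # -1 means modelfile default
```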
Setup:

- Configure Operation Mode:
  - Edit `research.config` based on your preferred mode
  - Set API keys and URLs as needed
- For Local Models (if use_ollama = true):

  ```bash
  ollama pull mistral-small    # search & writing
  ollama pull deepseek-r1:14b  # reasoning & planning
  ```
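  To confirm the pulls succeeded, you can query Ollama's standard `/api/tags` route:

  ```python
  # List locally available Ollama models and check the two we need.
  import requests

  tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
  names = [m["name"] for m in tags.get("models", [])]
  for wanted in ("mistral-small", "deepseek-r1:14b"):
      print(wanted, "OK" if any(n.startswith(wanted) for n in names) else "MISSING")
  ```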
- For Local Web Parsing (if use_jina = false):

  ```bash
  # Option 1: External Chrome (use_embed_browser = false)
  google-chrome --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0

  # Optional: start Chrome with your credentials for academic access
  google-chrome --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 --user-data-dir=/path/to/profile

  # Option 2: Embedded browser (use_embed_browser = true)
  # No need to start Chrome manually; set use_embed_browser = true in research.config
  # and the container will automatically manage a headless browser.

  # Optional: enhanced parsing capabilities
  ollama pull reader-lm:0.5b
  ```
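  For the external-Chrome option, you can verify the debugging port is reachable via the DevTools protocol's `/json/version` route:

  ```python
  # Probe Chrome's remote-debugging endpoint.
  import requests

  info = requests.get("http://localhost:9222/json/version", timeout=5).json()
  print(info.get("Browser"), info.get("webSocketDebuggerUrl"))
  ```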
- Start Services:

  For CPU-only operation:

  ```bash
  docker compose up --build
  ```

  For NVIDIA GPU acceleration:

  ```bash
  docker compose -f docker-compose.cuda.yml up --build
  ```

  - Uses CUDA 12.8.0 with cuDNN
  - Requires NVIDIA Container Runtime
  - Automatically enables all available NVIDIA GPUs

  For AMD GPU acceleration:

  ```bash
  docker compose -f docker-compose.rocm.yml up --build
  ```

  - Uses ROCm 6.3.2
  - PyTorch 2.4.0 with ROCm support
  - Requires a ROCm installation and a compatible AMD GPU
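  To confirm the GPU is actually visible to the OCR stack, run a quick check inside the container (e.g., via `docker compose exec` against your service name); both the CUDA and ROCm PyTorch builds expose the device through `torch.cuda`:

  ```python
  # Verify PyTorch can see the GPU; OCR falls back to CPU otherwise.
  import torch

  if torch.cuda.is_available():
      print("GPU:", torch.cuda.get_device_name(0))
  else:
      print("No GPU visible; PDF OCR will run on the CPU")
  ```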
The service provides an OpenAI-compatible endpoint at http://localhost:8000/v1. Example usage with Python:
```python
import requests
import json

# Setup API configuration
base_url = "http://localhost:8000/v1"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer sk-xxx"  # Default API key works for local endpoint
}

# Prepare request data
data = {
    "model": "deep_researcher",
    "messages": [
        {"role": "system", "content": "Write in a formal tone."},  # Only writing instructions are supported for now
        {"role": "user", "content": "Latest developments in quantum computing"}
    ],
    "stream": True,  # Enable live updates
    "max_iterations": 10,  # Research depth (>1)
    "max_search_items": 4,  # Results per search (>1, used when use_jina = false)
    "default_model": "anthropic/claude-3.5-haiku",  # Optional: override default model
    "reason_model": "deepseek/deepseek-r1-distill-qwen-32b"  # Optional: override reasoning model
}

# Make the API request
response = requests.post(
    f"{base_url}/chat/completions",
    headers=headers,
    json=data,
    stream=True
)

# Stream the response
for line in response.iter_lines():
    if not line:
        continue
    if line.startswith(b"data: "):
        try:
            chunk = json.loads(line[6:])  # Skip the "data: " prefix
            if chunk.get("choices") and chunk["choices"][0].get("delta", {}).get("content"):
                print(chunk["choices"][0]["delta"]["content"], end="")
        except json.JSONDecodeError:  # e.g., the final "data: [DONE]" sentinel
            continue
```
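Because the endpoint speaks the OpenAI protocol, the official `openai` Python client (v1+) also works; the researcher-specific fields ride along in `extra_body`. A minimal sketch, assuming the same local endpoint and key as above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-xxx")

stream = client.chat.completions.create(
    model="deep_researcher",
    messages=[{"role": "user", "content": "Latest developments in quantum computing"}],
    stream=True,
    # Non-standard parameters are forwarded in the request body.
    extra_body={"max_iterations": 10, "max_search_items": 4},
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```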
The same request via cURL (the extra fields have the same meanings as in the Python example; they are listed without comments here because JSON does not allow them):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "deep_researcher",
    "messages": [{"role": "user", "content": "Latest developments in quantum computing"}],
    "stream": true,
    "max_iterations": 10,
    "max_search_items": 4,
    "default_model": "anthropic/claude-3.5-haiku",
    "reason_model": "deepseek/deepseek-r1-distill-qwen-32b"
  }'
```
Volumes:

- `temp_pdf`: Temporary PDF storage
- `searxng-data`: Persistent SearXNG configuration
Troubleshooting:

- Chrome/Browser Issues:
  - For external Chrome (use_embed_browser = false):
    - Verify Chrome is running with remote debugging
    - Check port 9222 accessibility
    - Ensure no firewall blocks the connection
  - For embedded browser (use_embed_browser = true):
    - No manual Chrome setup needed
    - Check container logs for Playwright browser installation status
- SearXNG Issues:
  - Verify port 4000 availability
  - Check container logs for startup problems
- API Response Issues:
  - Verify API keys in research.config
  - Check model availability if using Ollama
  - Review operation mode settings
- GPU Issues:
  - NVIDIA:
    - Verify nvidia-smi shows your GPU
    - Check nvidia-container-runtime installation
    - Ensure NVIDIA drivers are up to date
  - AMD:
    - Verify ROCm installation with rocm-smi
    - Check GPU compatibility with ROCm
    - Ensure proper device permissions (/dev/kfd, /dev/dri)
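When debugging, a quick way to probe the moving parts from the host is a script like this sketch (ports as configured above; the SearXNG check assumes its JSON output format is enabled, and the `/v1/models` route is assumed to follow the OpenAI convention):

```python
# Hedged end-to-end probe: Chrome debugging port, SearXNG, and the API.
import requests

def probe(name, url, **kwargs):
    try:
        r = requests.get(url, timeout=5, **kwargs)
        print(f"{name:8} {r.status_code}")
    except requests.RequestException as exc:
        print(f"{name:8} FAILED: {exc}")

probe("chrome", "http://localhost:9222/json/version")
probe("searxng", "http://localhost:4000/search", params={"q": "test", "format": "json"})
probe("api", "http://localhost:8000/v1/models",
      headers={"Authorization": "Bearer sk-xxx"})
```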