Problem

OpenAI o1 models require max_completion_tokens instead of max_tokens.

When the llm command is used with an OpenAI o1 series model such as o1-preview, the -o max_tokens N option triggers an error from OpenAI saying that max_completion_tokens should be used instead.

When -o max_completion_tokens N is used, llm raises its own error instead of passing the option through to the OpenAI API.
Model Provider Documentation

The OpenAI docs explain that max_tokens is deprecated and is already not compatible with the o1 models: https://platform.openai.com/docs/api-reference/chat/create

max_tokens integer or null
Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
This value is now deprecated in favor of max_completion_tokens, and is not compatible with o1 series models.

max_completion_tokens integer or null
Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
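For illustration, a direct Chat Completions call using the newer parameter might look like this (a sketch with the openai Python SDK; it assumes a recent openai release that supports max_completion_tokens and an API key with o1-preview access):

```python
# Sketch only: the documented replacement parameter, sent via the openai Python SDK.
# Assumes openai>=1.x (recent enough to accept max_completion_tokens) and that
# OPENAI_API_KEY grants access to o1-preview.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Say hello"}],
    max_completion_tokens=256,  # caps visible output tokens plus reasoning tokens
)
print(response.choices[0].message.content)
```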
Examples
These commands demonstrate the problem:
$ LLM_OPENAI_SHOW_RESPONSES=1 llm --no-stream -m o1-preview -o max_completion_tokens 32000 "The response to this is a creative novel"
Error: max_completion_tokens
Extra inputs are not permitted
$ LLM_OPENAI_SHOW_RESPONSES=1 llm --no-stream -m o1-preview -o max_tokens 32000 "The response to this is a creative novel"
Request: POST https://api.openai.com/v1/chat/completions
Headers:
host: api.openai.com
connection: keep-alive
accept: application/json
content-type: application/json
user-agent: OpenAI/Python 1.60.1
x-stainless-lang: python
x-stainless-package-version: 1.60.1
x-stainless-os: Linux
x-stainless-arch: x64
x-stainless-runtime: CPython
x-stainless-runtime-version: 3.10.12
authorization: [...]
x-stainless-async: false
x-stainless-retry-count: 0
content-length: 138
Body:
{
"messages": [
{
"role": "user",
"content": "The response to this is a creative novel"
}
],
"model": "o1-preview",
"max_tokens": 32000,
"stream": false
}
Response: status_code=400
Headers:
date: Tue, 28 Jan 2025 05:46:32 GMT
content-type: application/json
content-length: 245
connection: keep-alive
access-control-expose-headers: X-Request-ID
openai-organization: [...]
openai-processing-ms: 20
openai-version: 2020-10-01
x-ratelimit-limit-requests: 10000
x-ratelimit-limit-tokens: 30000000
x-ratelimit-remaining-requests: 9999
x-ratelimit-remaining-tokens: 29995904
x-ratelimit-reset-requests: 6ms
x-ratelimit-reset-tokens: 8ms
x-request-id: req_[...]
strict-transport-security: max-age=31536000; includeSubDomains; preload
cf-cache-status: DYNAMIC
set-cookie: __cf_bm=...
x-content-type-options: nosniff
server: cloudflare
cf-ray: [...]
alt-svc: h3=":443"; ma=86400
Body:
{
"error": {
"message": "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.",
"type": "invalid_request_error"
,
"param": "max_tokens",
"code": "unsupported_parameter"
}
}
Error: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}}
$ llm --version
llm, version 0.20
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
$ uname -a
Linux [...] 6.8.0-52-generic #53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
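For reference, the rejection comes from the API itself rather than from llm; the same 400 can be reproduced with a direct call (a sketch, same assumptions as the snippet above):

```python
# Sketch only: sending the deprecated max_tokens parameter to an o1 model
# reproduces the unsupported_parameter error that llm surfaces above.
import openai
from openai import OpenAI

client = OpenAI()
try:
    client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": "The response to this is a creative novel"}],
        max_tokens=32000,
    )
except openai.BadRequestError as exc:
    print(exc)  # Unsupported parameter: 'max_tokens' ... Use 'max_completion_tokens' instead.
```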
llm Documentation

This documentation can also be updated along with fixing the code, as it lists max_tokens instead of max_completion_tokens for the o1 models: https://github.com/simonw/llm/blob/main/docs/contributing.md

Urgh. The way I see it there are three options here:

1. Remove the max_tokens option in favor of max_completion_tokens. This is consistent with OpenAI's API but inconsistent with other models; users of LLM would have to think about which option to use even though they do the same thing.
2. Stick with -o max_tokens 100 as the LLM option and send max_completion_tokens to the API. This is better for trying the same prompt against multiple models and saves users of LLM from having to think about OpenAI's non-standard naming, but it is inconsistent with the OpenAI API.
3. Support both. A bit ugly, but it papers over both problems (sketched below).

I think I like option 3: it makes OpenAI's weird issue visible in the LLM docs but feels the most convenient for users.

This is why designing abstraction layers across multiple models is hard!
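To make option 3 concrete, here is a rough sketch (not the actual llm code; the function name and signature are made up for illustration) of how both options could be accepted and mapped onto whichever parameter the target model expects:

```python
# Hypothetical sketch, not the llm codebase: accept both options and translate
# max_tokens into max_completion_tokens for models that reject the old name.
def build_token_limit_kwargs(model_id, max_tokens=None, max_completion_tokens=None):
    kwargs = {}
    # o1 series models only accept the newer parameter.
    requires_new_param = model_id.startswith("o1")
    if max_completion_tokens is not None:
        kwargs["max_completion_tokens"] = max_completion_tokens
    elif max_tokens is not None:
        if requires_new_param:
            kwargs["max_completion_tokens"] = max_tokens
        else:
            kwargs["max_tokens"] = max_tokens
    return kwargs


# Example: either spelling ends up as max_completion_tokens for o1-preview.
assert build_token_limit_kwargs("o1-preview", max_tokens=32000) == {
    "max_completion_tokens": 32000
}
assert build_token_limit_kwargs("gpt-4o-mini", max_tokens=100) == {"max_tokens": 100}
```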