OpenAI o1 models require max_completion_tokens instead of max_tokens #724

archer-eric opened this issue Jan 28, 2025 · 3 comments

Problem

OpenAI o1 models require max_completion_tokens instead of max_tokens.

When the llm command is used with an OpenAI o1 series model such as o1-preview, the -o max_tokens N option returns an error from OpenAI saying that max_completion_tokens should be used instead.

When -o max_completion_tokens N is used, llm rejects the option itself ("Extra inputs are not permitted") instead of passing it through to the OpenAI API.

Model Provider Documentation

The OpenAI docs explain that max_tokens is deprecated and is not compatible with the o1 models:
https://platform.openai.com/docs/api-reference/chat/create

max_tokens (Deprecated): integer or null, Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

This value is now deprecated in favor of max_completion_tokens, and is not compatible with o1 series models.

max_completion_tokens: integer or null, Optional
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
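
For reference, here is a minimal sketch of what a direct call with the correct parameter looks like, using the openai Python SDK (the same client library llm drives, version 1.60.1 per the request headers below); the model name and token limit are copied from the failing example rather than taken from llm itself:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "The response to this is a creative novel"}],
    # o1-series models reject max_tokens; max_completion_tokens bounds both
    # visible output tokens and hidden reasoning tokens
    max_completion_tokens=32000,
)
print(response.choices[0].message.content)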

Examples

These commands demonstrate the problem:

$ LLM_OPENAI_SHOW_RESPONSES=1 llm --no-stream -m o1-preview -o max_completion_tokens 32000 "The response to this is a creative novel"
Error: max_completion_tokens
  Extra inputs are not permitted
$ LLM_OPENAI_SHOW_RESPONSES=1 llm --no-stream -m o1-preview -o max_tokens 32000 "The response to this is a creative novel"
Request: POST https://api.openai.com/v1/chat/completions
  Headers:
    host: api.openai.com
    connection: keep-alive
    accept: application/json
    content-type: application/json
    user-agent: OpenAI/Python 1.60.1
    x-stainless-lang: python
    x-stainless-package-version: 1.60.1
    x-stainless-os: Linux
    x-stainless-arch: x64
    x-stainless-runtime: CPython
    x-stainless-runtime-version: 3.10.12
    authorization: [...]
    x-stainless-async: false
    x-stainless-retry-count: 0
    content-length: 138
  Body:
    {
      "messages": [
        {
          "role": "user",
          "content": "The response to this is a creative novel"
        }
      ],
      "model": "o1-preview",
      "max_tokens": 32000,
      "stream": false
    }
Response: status_code=400
  Headers:
    date: Tue, 28 Jan 2025 05:46:32 GMT
    content-type: application/json
    content-length: 245
    connection: keep-alive
    access-control-expose-headers: X-Request-ID
    openai-organization: [...]
    openai-processing-ms: 20
    openai-version: 2020-10-01
    x-ratelimit-limit-requests: 10000
    x-ratelimit-limit-tokens: 30000000
    x-ratelimit-remaining-requests: 9999
    x-ratelimit-remaining-tokens: 29995904
    x-ratelimit-reset-requests: 6ms
    x-ratelimit-reset-tokens: 8ms
    x-request-id: req_[...]
    strict-transport-security: max-age=31536000; includeSubDomains; preload
    cf-cache-status: DYNAMIC
    set-cookie: __cf_bm=...
    x-content-type-options: nosniff
    server: cloudflare
    cf-ray: [...]
    alt-svc: h3=":443"; ma=86400
  Body:
{
  "error": {
    "message": "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.",
    "type": "invalid_request_error"
,
    "param": "max_tokens",
    "code": "unsupported_parameter"
  }
}
Error: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}}
$ llm --version
llm, version 0.20

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy

$ uname -a
Linux [...] 6.8.0-52-generic #53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

llm Documentation

This documentation should also be updated along with the code fix, since it lists max_tokens instead of max_completion_tokens for the o1 models:
https://github.com/simonw/llm/blob/main/docs/contributing.md

OpenAI Chat: o1
  Options:
    temperature: float
    max_tokens: int
[...]
OpenAI Chat: o1-2024-12-17
  Options:
    temperature: float
    max_tokens: int
[...]
OpenAI Chat: o1-preview
  Options:
    temperature: float
    max_tokens: int
[...]
OpenAI Chat: o1-mini
  Options:
    temperature: float
    max_tokens: int
[...]
archer-eric (Author) commented:

Extra credit: llm could also support -o max_tokens N as an alias for -o max_completion_tokens N when using an OpenAI model.

archer-eric (Author) commented:

Updated title to include o3 models, which seem to have the same requirements.

simonw (Owner) commented Feb 2, 2025:

Urgh. The way I see it there are three options here:

  1. Remove the max_tokens option in favor of max_completion_tokens. This is consistent with OpenAI's API but inconsistent with other models. Users of LLM will have to think about which option to use even though they do the same thing.
  2. Stick with -o max_tokens 100 as the LLM option, send max_completion_tokens to the API. This is better for trying out the same prompt against multiple models and saves users of LLM from having to think about OpenAI's non-standard naming, but is inconsistent with the OpenAI API.
  3. Support both. A bit ugly but does paper over both problems.

I think I like option 3: it makes OpenAI's weird issue visible in the LLM docs but feels the most convenient for users.

This is why designing abstraction layers across multiple models is hard!
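
For what option 3 might look like, here is a hypothetical sketch (illustrative names, not llm's actual code) assuming the plugin's Pydantic-based options model: accept both spellings, and send whichever was supplied to the API under the name the o1/o3 models require.

from typing import Optional

from pydantic import BaseModel


class ReasoningOptions(BaseModel):
    # Accept both spellings so prompts written for other models keep working
    max_tokens: Optional[int] = None
    max_completion_tokens: Optional[int] = None


def build_completion_kwargs(options: ReasoningOptions) -> dict:
    kwargs = {}
    # Whichever option was supplied ends up in the request body as
    # max_completion_tokens, the only name the o1/o3 models accept
    limit = options.max_completion_tokens or options.max_tokens
    if limit is not None:
        kwargs["max_completion_tokens"] = limit
    return kwargs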
