Token limit #12

Open

szym1998 opened this issue Nov 10, 2023 · 2 comments

Comments

@szym1998

Issue Description

Problem: When using the new OpenAI library in my asynchronous application, I've encountered an issue related to rate limiting.

Description: It appears that when I run my asynchronous application, even just once, the rate limiter starts to restrict requests and prevents them from going through. This issue arises when I set the token limit to 90,000; when I increase the token limit to 900,000, the requests go through without issue. It's important to note that my system message, user input, and response typically comprise only around 2,700 tokens in total.

Steps to Reproduce:

1. Install the OpenAI library (version 1.1, I believe; the latest release at the time of writing).
2. Set the token limit to 90,000.
3. Run your asynchronous application.
4. Observe the rate limiter restricting requests.

Expected Behavior:

Requests should not be rate-limited when the token limit is set to 90,000, given that the total token count is well below this limit.

Actual Behavior:

The rate limiter restricts requests even though the token limit is set to 90,000 and the actual token usage is far below it.

@szym1998 (Author)

```python
import json

# `client` (an async OpenAI client) and `rate_limiter`, as well as
# `combined_system_message` and `data_content`, are defined elsewhere.
async def create_poem():
    chat_params = {
        "model": "gpt-3.5-turbo-1106",
        "temperature": 0.01,
        "response_format": {"type": "json_object"},
        "max_tokens": 2048,
        "messages": [
            {"role": "system", "content": combined_system_message},
            {"role": "user", "content": data_content}
        ]
    }
    try:
        async with rate_limiter.limit(**chat_params):
            completion = await client.chat.completions.create(**chat_params)

            # Extract the content from the first choice of the completion
            content = completion.choices[0].message.content
            # Load the content as JSON
            content = json.loads(content)
        return content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
```
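
For reference, here is a minimal sketch of the setup assumed by the snippet above; the `ChatRateLimiter` name and constructor arguments follow the usual README example and are assumptions, not copied from my application:

```python
from openai import AsyncOpenAI
from openlimit import ChatRateLimiter  # assumed import path

# Async client; reads OPENAI_API_KEY from the environment.
client = AsyncOpenAI()

# Assumed constructor: a per-minute request limit plus the 90,000
# per-minute token limit from the report above.
rate_limiter = ChatRateLimiter(request_limit=200, token_limit=90_000)
```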

@klintan

klintan commented Nov 17, 2023

I think the behavior you are seeing is because the maximum token capacity for a single request is strictly 1/60 of your per-minute token limit (90,000 / 60 = 1,500 in your case). Since your requests need more tokens than that, the capacity is never enough to fulfill them, and the request hangs forever.

This PR would solve the issue: #10
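
To make the arithmetic concrete, here is a minimal sketch of a token bucket whose capacity is capped at 1/60 of the per-minute limit; the class is purely illustrative, not the library's actual implementation:

```python
import asyncio

class TokenBucket:
    """Illustrative bucket whose capacity is 1/60 of the per-minute limit."""

    def __init__(self, tokens_per_minute: float):
        self.capacity = tokens_per_minute / 60   # largest request it can ever serve
        self.tokens = self.capacity              # start full
        self.refill_per_second = tokens_per_minute / 60

    async def acquire(self, amount: float) -> None:
        # If `amount` exceeds `capacity`, this loop never exits: the bucket
        # refills every second but is always clipped back to `capacity`.
        while self.tokens < amount:
            await asyncio.sleep(1)
            self.tokens = min(self.capacity, self.tokens + self.refill_per_second)
        self.tokens -= amount

# token_limit=90_000 gives a capacity of 1_500, so a request estimated at
# ~2_700 tokens waits forever; token_limit=900_000 gives 15_000, which
# covers it, matching the behavior reported above.
bucket = TokenBucket(tokens_per_minute=900_000)
asyncio.run(bucket.acquire(2_700))  # returns immediately; with 90_000 it would hang
```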
