[Bug]: Rate Limiting Controls Not Functioning as Expected Resulting in 429s #1620
Comments
It appears there is also some weirdness in how rate limiting is being handled across workflows... for instance, in the community summarization extractor we have this:

async def _run_extractor(
    llm: ChatLLM,
    community: str | int,
    input: str,
    level: int,
    args: StrategyConfig,
    callbacks: WorkflowCallbacks,
) -> CommunityReport | None:
    # RateLimiter
    rate_limiter = RateLimiter(rate=1, per=60)

Here we can see the rate limiter being instantiated. However, if we look at the _run_extractor code for other workflows, we find that it is absent. See here for instance: Then again, I may be completely off base.
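For context, a call like RateLimiter(rate=1, per=60) generally means at most one acquisition per 60-second window. A minimal, hypothetical sketch of such a limiter (illustrative only, not the actual graphrag/fnllm implementation) might look like this:

import asyncio
import time


class SimpleRateLimiter:
    """Allow at most `rate` acquisitions in any rolling `per`-second window."""

    def __init__(self, rate: int, per: float):
        self.rate = rate
        self.per = per
        self._timestamps: list[float] = []
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            # drop acquisitions that have aged out of the window
            self._timestamps = [t for t in self._timestamps if now - t < self.per]
            if len(self._timestamps) >= self.rate:
                # wait until the oldest acquisition leaves the window
                await asyncio.sleep(self.per - (now - self._timestamps[0]))
            self._timestamps.append(time.monotonic())


async def main() -> None:
    limiter = SimpleRateLimiter(rate=1, per=60)
    await limiter.acquire()  # first call passes immediately
    # a second acquire() here would sleep until the 60-second window frees up


asyncio.run(main())

If only some workflows instantiate a limiter like this, the others would issue requests as fast as the thread pool allows, which is consistent with the intermittent 429s described in this issue.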
Hi @darien-schettler, I just pushed another PR that is resolving this for me. It appears that prior to using fnllm we had fallbacks, so that if the tpm/rpm were set to 0 they would end up with defaults of 50_000 and 10_000 respectively. fnllm does not contain the same fallbacks, so we were inadvertently retaining the settings of 0, which does not rate limit (I find the 0 = "no rate limiting" convention confusing too, but it's there for legacy reasons and will be fixed soon).

Anyhow, by adding the defaults in new configs and ensuring I have some reasonable settings, I've been able to run pretty big (multi-hour) jobs without any problems. However, I do see that you are setting those values in your llm block (though not the embeddings llm block, which you should do as well). You might also try setting your values a bit lower - I am setting mine to about half or less of my endpoint limits.

I suspect we still have a couple of issues with how the rate limiting and parallelization config is mapped, because we used to have a very complex config inheritance model that made it difficult to debug. We have another PR coming soon that drastically simplifies config.

TL;DR: please try lower settings, ensure you set them for embeddings too, and we know we're not out of the woods yet on the fnllm issues. Thanks!
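To make the fallback behavior described above concrete, here is a hypothetical sketch (illustrative names only, not the actual graphrag or fnllm code) of the kind of defaulting logic that existed before fnllm and was lost, leaving 0 to mean "no rate limiting":

# Treat a configured value of 0 as "use the default" rather than "no rate limiting".
DEFAULT_TOKENS_PER_MINUTE = 50_000
DEFAULT_REQUESTS_PER_MINUTE = 10_000


def resolve_rate_limits(tokens_per_minute: int, requests_per_minute: int) -> tuple[int, int]:
    """Fall back to sane defaults when limits are unset (0)."""
    tpm = tokens_per_minute or DEFAULT_TOKENS_PER_MINUTE
    rpm = requests_per_minute or DEFAULT_REQUESTS_PER_MINUTE
    return tpm, rpm


# A config that leaves both values at 0 would still be limited:
print(resolve_rate_limits(0, 0))  # (50000, 10000)

Without this fallback, a 0 passes straight through and the client issues requests without any throttling, which is why explicit tokens_per_minute/requests_per_minute values in both the llm and embeddings blocks matter.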
Ah! That makes total sense. I set it superrr low and it ran through extraction and failed after. I assumed this was due to the chat rate limit but (duh) it must have been the embedding afterwards. I'll update the values in that section and take your advice and lower the limits a bit further too. Thanks for the response and the wonderful work you do! Closing this now.
Hey @natoverse - I'm still having this issue after updating to 1.2.0. I've seen it throw 429s on both completions and embeddings. Updated yaml below:

### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  model: gpt-4o-mini
  model_supports_json: true # recommended if this is available for your model.
  requests_per_minute: 250
  tokens_per_minute: 1_000_000
  concurrent_requests: 5
  max_retries: 3
  max_retry_wait: 60
  sleep_on_rate_limit_recommendation: true
  # audience: "https://cognitiveservices.azure.com/.default"
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>

parallelization:
  stagger: 0.3
  # num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    requests_per_minute: 100
    tokens_per_minute: 200_000
    max_retries: 3
    max_retry_wait: 60
    sleep_on_rate_limit_recommendation: true
    concurrent_requests: 1
    batch_size: 1
    # api_base: http://localhost:1234/v1
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event,concept,algorithm,metric,dataset,tool,definition]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 16

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: true
  embeddings: true
  transient: true

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"

basic_search:
  prompt: "prompts/basic_search_system_prompt.txt"
Do you need to file an issue?
Describe the bug
Basically, ever since the updates to versions greater than 1.0.0, I have been facing impassable issues with rate limiting. I have tried using the config to set explicit rate limits, but unfortunately the issue persists. I tried waiting a full day and the issue still persists. I would guess this is related to fnllm, and it appears that @natoverse attempted to push a fix in 1.0.1 (#1530). However, the issue still remains (at least for me).
Steps to reproduce
Run any significantly large dataset through the indexing pipeline by running:
graphrag index --root .
The majority of graph extraction proceeds successfully but near the end it errors out due to rate limits.
Expected Behavior
We should be able to control the request rate and, in general (by default), we shouldn't be overwhelming the OpenAI API with requests to the degree that it causes failures. I would take slower and successful over faster with rate-limit errors any day.
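For reference, the "slower but successful" behavior described here is usually achieved by retrying 429 responses with backoff, in the spirit of the max_retries / max_retry_wait / sleep_on_rate_limit_recommendation settings in the config above. A minimal, hypothetical sketch (not graphrag's actual implementation) might be:

import random
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 raised by the API client."""


def call_with_backoff(make_request, max_retries: int = 3, max_retry_wait: float = 60.0):
    """Retry a request on rate-limit errors with capped exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return make_request()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # exponential backoff with jitter, never waiting longer than max_retry_wait
            wait = min(max_retry_wait, (2 ** attempt) + random.random())
            time.sleep(wait)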
GraphRAG Config Used
Logs and screenshots
The following is an example of the last two entries in the logs.json.
And here is where the errors start in the indexing-engine.log file:
Additional Information