
[Issue]: <NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.> #1610

Open
1 of 3 tasks
chericher opened this issue Jan 10, 2025 · 1 comment
Labels
triage Default label assignment, indicates new issue needs reviewed by a maintainer

Comments

@chericher

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

I have locally deployed bge-large-zh-v1.5 + Qwen2.5-3B-Instruct + GraphRAG 1.1.2, using Python 3.10.12 and torch 2.5. When I run `graphrag index --root ./`, I encounter the following error:

15:05:20,104 graphrag.utils.storage INFO reading table from storage: create_final_relationships.parquet
15:05:20,108 graphrag.utils.storage INFO reading table from storage: create_final_entities.parquet
15:05:20,113 graphrag.utils.storage INFO reading table from storage: create_final_communities.parquet
15:05:20,130 graphrag.index.operations.summarize_communities.prepare_community_reports INFO Number of nodes at level=0 => 3
15:05:24,750 httpx INFO HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 200 OK"
15:05:24,912 graphrag.utils.storage INFO reading table from storage: create_final_documents.parquet
15:05:24,917 graphrag.utils.storage INFO reading table from storage: create_final_relationships.parquet
15:05:24,922 graphrag.utils.storage INFO reading table from storage: create_final_text_units.parquet
15:05:24,927 graphrag.utils.storage INFO reading table from storage: create_final_entities.parquet
15:05:24,932 graphrag.utils.storage INFO reading table from storage: create_final_community_reports.parquet
15:05:24,942 graphrag.index.flows.generate_text_embeddings INFO Creating embeddings
15:05:24,942 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding entity.description: default-entity-description
15:05:25,143 graphrag.index.operations.embed_text.strategies.openai INFO embedding 3 inputs via 3 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,391 httpx INFO HTTP Request: POST http://localhost:8150/v1/embeddings "HTTP/1.1 200 OK"
15:05:25,432 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding text_unit.text: default-text_unit-text
15:05:25,436 graphrag.index.operations.embed_text.strategies.openai INFO embedding 1 inputs via 1 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,445 graphrag.index.operations.embed_text.embed_text INFO using vector store lancedb with container_name default for embedding community.full_content: default-community-full_content
15:05:25,448 graphrag.index.operations.embed_text.strategies.openai INFO embedding 1 inputs via 1 snippets using 1 batches. max_batch_size=16, max_tokens=8191
15:05:25,471 httpx INFO HTTP Request: POST http://localhost:8150/v1/embeddings "HTTP/1.1 400 Bad Request"
15:05:25,475 graphrag.callbacks.file_workflow_callbacks INFO Error Invoking LLM details={'prompt': ["# Family A\n\nThe community revolves around the key entities A, F, and M, who are related by familial ties. A is the child of F and M, and both F and M are parents of A. This family structure is central to the community's dynamics.\n\n## F and M as parents\n\nF and M are the parents of A, and their roles as parents are central to the community's structure. Their relationship with A is crucial in understanding the dynamics of the family. [Data: Entities (1, 2), Relationships (0, 1, +more)]\n\n## A as the child\n\nA is the child of F and M, and their relationship with A is central to the community's structure. A's role as a child is significant in understanding the family dynamics and potential conflicts. [Data: Entities (0), Relationships (0, 1, +more)]\n\n## F and M's combined degree\n\nF and M have a combined degree of 3, indicating their significant role in the community. Their relationship with A is crucial in understanding the family dynamics and potential conflicts. [Data: Entities (1, 2), Relationships (0, 1, +more)]\n\n## A's relationship with F and M\n\nA's relationship with F and M is central to the community's structure. Their roles as parents and the relationship with A are significant in understanding the family dynamics and potential conflicts. [Data: Entities (0), Relationships (0, 1, +more)]\n\n## Family structure\n\nThe family structure is central to the community's dynamics, with F and M as parents and A as the child. This structure is significant in understanding the potential for family disputes or conflicts. [Data: Entities (1, 2), Relationships (0, 1, +more)]"], 'kwargs': {}}
15:05:25,476 graphrag.index.run.run_workflows ERROR error running workflow generate_text_embeddings
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/run/run_workflows.py", line 166, in _run_workflows
result = await run_workflow(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/workflows/generate_text_embeddings.py", line 45, in run_workflow
await generate_text_embeddings(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/flows/generate_text_embeddings.py", line 98, in generate_text_embeddings
await _run_and_snapshot_embeddings(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/flows/generate_text_embeddings.py", line 121, in _run_and_snapshot_embeddings
data["embedding"] = await embed_text(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/embed_text.py", line 89, in embed_text
return await _text_embed_with_vector_store(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/embed_text.py", line 179, in _text_embed_with_vector_store
result = await strategy_exec(
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 63, in run
embeddings = await _execute(llm, text_batches, ticker, semaphore)
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 103, in _execute
results = await asyncio.gather(*futures)
File "/usr/local/lib/python3.10/dist-packages/graphrag/index/operations/embed_text/strategies/openai.py", line 97, in embed
chunk_embeddings = await llm(chunk)
File "/usr/local/lib/python3.10/dist-packages/fnllm/base/base.py", line 112, in call
return await self._invoke(prompt, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fnllm/base/base.py", line 128, in _invoke
return await self._decorated_target(prompt, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 109, in invoke
result = await execute_with_retry()
File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 93, in execute_with_retry
async for a in AsyncRetrying(
File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "/usr/local/lib/python3.10/dist-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
File "/usr/local/lib/python3.10/dist-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 101, in execute_with_retry
return await attempt()
File "/usr/local/lib/python3.10/dist-packages/fnllm/services/retryer.py", line 78, in attempt
return await delegate(prompt, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fnllm/services/rate_limiter.py", line 70, in invoke
result = await delegate(prompt, **args)
File "/usr/local/lib/python3.10/dist-packages/fnllm/base/base.py", line 152, in _decorator_target
output = await self._execute_llm(prompt, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fnllm/openai/llm/embeddings.py", line 133, in _execute_llm
response = await self._call_embeddings_or_cache(
File "/usr/local/lib/python3.10/dist-packages/fnllm/openai/llm/embeddings.py", line 110, in _call_embeddings_or_cache
return await self._cache.get_or_insert(
File "/usr/local/lib/python3.10/dist-packages/fnllm/services/cache_interactor.py", line 50, in get_or_insert
entry = await func()
File "/usr/local/lib/python3.10/dist-packages/openai/resources/embeddings.py", line 236, in create
return await self._post(
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1849, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1543, in request
return await self._request(
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1644, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(The expanded size of the tensor (513) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 513]. Tensor sizes: [1, 512])', 'code': 50001}
15:05:25,477 graphrag.callbacks.file_workflow_callbacks INFO Error running pipeline! details=None
15:05:25,554 graphrag.cli.index ERROR Errors occurred during the pipeline run, see logs for more details.

Steps to reproduce

No response

GraphRAG Config Used

### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  model: qwen3B
  model_supports_json: false # recommended if this is available for your model.
  # audience: "https://cognitiveservices.azure.com/.default"
  api_base: http://localhost:8000/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>

parallelization:
  stagger: 0.3
  # num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store: 
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: gpt-4
    api_base: http://localhost:8150/v1
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # one of [blob, cosmosdb, file]
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 1000

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 1000
  max_input_length: 4000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"

basic_search:
  prompt: "prompts/basic_search_system_prompt.txt"
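For reference, the `embeddings` section also accepts batching knobs (`batch_size`, `batch_max_tokens`) per the GraphRAG config docs. A hedged sketch of how that block might be tuned for a locally served bge-large-zh-v1.5 — note these settings only control how inputs are grouped into requests (counted with cl100k_base); they do not enforce the model's own 512-token per-input limit, which has to be handled by the serving side or by keeping chunks and report text short:

```yaml
embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: bge-large-zh-v1.5   # name the served model instead of gpt-4
    api_base: http://localhost:8150/v1
  batch_size: 16          # inputs per request
  batch_max_tokens: 8191  # cl100k_base tokens per batch, NOT per input
```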

Logs and screenshots

(graphragtest) root@cdd2b6557714:/home/graphragtest# graphrag index --root ./

Logging enabled at /home/graphragtest/logs/indexing-engine.log
Running standard indexing.
🚀 create_base_text_units
id text document_ids n_tokens
0 b53ef702af00f35578b1cdbf74474a32866bd5bb89a30a... A的爸爸叫F。\n\nA的妈妈叫M。\n [10ae1eaa0dc9f3bd3cbbfc0ff5d391e0a4eb7ed2d604d... 22
🚀 create_final_documents
id human_readable_id title text text_unit_ids
0 10ae1eaa0dc9f3bd3cbbfc0ff5d391e0a4eb7ed2d604dd... 1 report.txt A的爸爸叫F。\n\nA的妈妈叫M。\n [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 extract_graph
None
🚀 compute_communities
level community parent title
0 0 0 -1 A
0 0 0 -1 F
0 0 0 -1 M
🚀 create_final_entities
id human_readable_id title type description text_unit_ids
0 c137ae10-4252-48da-894b-ca30f7aef684 0 A PERSON A is a person [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
1 5650c001-6bb1-4868-bbf9-08a8a3f95892 1 F PERSON F is the father of A [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
2 b447f3a1-29d2-4130-b586-da16499a79a2 2 M PERSON M is the mother of A [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 create_final_relationships
id human_readable_id source target description weight combined_degree text_unit_ids
0 69ebb419-9b02-4ca0-8d42-12335857355f 0 A F A's father is F 2.0 3 [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
1 4860e8a2-b30f-4615-b127-64ddc3617535 1 A M A's mother is M 2.0 3 [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30...
🚀 create_final_nodes
id human_readable_id title community level degree x y
0 c137ae10-4252-48da-894b-ca30f7aef684 0 A 0 0 2 0 0
1 5650c001-6bb1-4868-bbf9-08a8a3f95892 1 F 0 0 1 0 0
2 b447f3a1-29d2-4130-b586-da16499a79a2 2 M 0 0 1 0 0
🚀 create_final_communities
id human_readable_id community ... text_unit_ids period size
0 a48137e0-b5f5-4297-9919-50fb59ef270f 0 0 ... [b53ef702af00f35578b1cdbf74474a32866bd5bb89a30... 2025-01-10 3

[1 rows x 11 columns]
🚀 create_final_text_units
id ... relationship_ids
0 b53ef702af00f35578b1cdbf74474a32866bd5bb89a30a... ... [69ebb419-9b02-4ca0-8d42-12335857355f, 4860e8a...

[1 rows x 7 columns]
🚀 create_final_community_reports
id human_readable_id community ... full_content_json period size
0 54b5f0c3db3343f7a348d43a0ef6f086 0 0 ... {\n "title": "Family A",\n "summary": "T... 2025-01-10 3

[1 rows x 14 columns]
❌ generate_text_embeddings
None
⠼ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_documents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── extract_graph ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── compute_communities ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_entities ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_relationships ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_nodes ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_communities ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_text_units ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_community_reports ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
❌ Errors occurred during the pipeline run, see logs for more details.

Additional Information

  • GraphRAG Version: 1.1.2
  • Operating System: Linux
  • Python Version: 3.10.12
  • Related Issues:
@chericher chericher added the triage Default label assignment, indicates new issue needs reviewed by a maintainer label Jan 10, 2025
@chericher (Author)

Isn't this the true cause of the run error?
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(The expanded size of the tensor (513) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 513]. Tensor sizes: [1, 512])', 'code': 50001}

It seems the tensor is too long, but my document is under 1,000 tokens. I don't know how to fix this.
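The mismatch likely comes from counting tokens with two different tokenizers: GraphRAG batches using cl100k_base (`max_tokens=8191`), while bge-large-zh-v1.5 applies its own tokenizer with a 512-token sequence limit, so a community report that looks short to cl100k_base can still exceed 512 on the server (the 513 vs. 512 in the error). A minimal, hypothetical workaround is to clip each input with the *embedding model's* tokenizer before posting it; the whitespace tokenizer below is a stand-in for illustration only (in practice you would use the model's real tokenizer, e.g. a HuggingFace `AutoTokenizer`):

```python
# Hypothetical pre-truncation guard: clip each input to the embedding
# model's max sequence length as measured by that model's own tokenizer.
# GraphRAG counts tokens with cl100k_base, which can disagree with the
# bge-large-zh-v1.5 tokenizer (512-token limit) -- hence 513 vs. 512.

def truncate_to_limit(text, tokenize, detokenize, max_tokens=512):
    """Return `text` clipped to at most `max_tokens` tokens.

    `tokenize`/`detokenize` stand in for the serving model's real
    tokenizer; swap in the actual one for a real fix.
    """
    tokens = tokenize(text)
    if len(tokens) <= max_tokens:
        return text
    return detokenize(tokens[:max_tokens])

# Toy whitespace tokenizer, for illustration only.
ws_tokenize = str.split
ws_detokenize = " ".join

long_report = " ".join(f"w{i}" for i in range(600))  # 600 "tokens"
clipped = truncate_to_limit(long_report, ws_tokenize, ws_detokenize)
print(len(ws_tokenize(clipped)))  # 512
```

This only prevents the 400 error by discarding the tail of over-long inputs; a cleaner fix is to have the embedding server truncate, or to keep `community_reports.max_length` low enough that reports stay under the model's limit.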
