GraphRAG - vector search #57

Open
jexp opened this issue Oct 5, 2024 · 4 comments

jexp commented Oct 5, 2024

Thanks for adding GraphRAG to RAGbuilder.

I had some questions and suggestions, perhaps you want to chat some time.

  • QQ: in graphrag.full_retriever you fetch the vector store data but don't use it in the method or the returns, looks redundant?
        def full_retriever(question: str):
            graph_data = graph_retriever(question)
            vector_data = [el.page_content for el in vector_retriever.invoke(question)]
            final_data = f'''Graph data:
        {graph_data}
            '''
            return final_data
  • You don't make use of the built-in Neo4j vector search, only the fulltext index. With vector search you could enable in-graph vector and hybrid search (you can create vector indexes for chunks in the lexical graph, for entities in the domain graph, and for communities in the topical structures).
  • right now the graph retriever only uses the direct neighbourhood of the nodes, this could be a good hyperparameter to add
  • e.g. we have a number of different retrievers in the llm-graph-builder, see: https://github.com/neo4j-labs/llm-graph-builder/blob/DEV/backend/src/shared/constants.py
  • I saw you copied some code from the neo4j-langchain integrations? Was there a reason (i.e. did you make modifications)? If so, it might be good to discuss contributing them back upstream.
  • there is the option to run clustering algorithms to generate cross-document topic summaries across the entity graphs (like in the MSFT GraphRAG paper), see https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/ (we've also implemented that in https://llm-graph-builder.neo4jlabs.com if you have a graph data science enabled database).
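
To make the vector index point above more concrete, here is a minimal sketch of the Cypher involved, wrapped in small Python helpers. It assumes Neo4j 5.x vector indexes; the labels (`Chunk`, `Entity`, `Community`), the `embedding` and `text` properties, the index names and the 1536 dimension are all hypothetical placeholders, not what RAGbuilder or llm-graph-builder actually use.

```python
import re

# Only plain identifiers are interpolated into Cypher, since index names,
# labels and property keys cannot be passed as query parameters here.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_vector_index_cypher(index_name, label, prop, dimensions,
                               similarity="cosine"):
    """Build a CREATE VECTOR INDEX statement (Neo4j 5.x syntax)."""
    for name in (index_name, label, prop):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if similarity not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {int(dimensions)}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )


# One index per layer mentioned above (all names are hypothetical).
INDEX_SPECS = {
    "chunk_embeddings": "Chunk",          # lexical graph
    "entity_embeddings": "Entity",        # domain graph
    "community_embeddings": "Community",  # topical structures
}

# In-graph nearest-neighbour lookup; `node.text` is an assumed property.
VECTOR_QUERY = (
    "CALL db.index.vector.queryNodes($index_name, $k, $embedding) "
    "YIELD node, score "
    "RETURN node.text AS text, score "
    "ORDER BY score DESC"
)


def vector_query_params(index_name, k, embedding):
    """Parameters for VECTOR_QUERY; `embedding` is the question vector."""
    return {"index_name": index_name, "k": int(k),
            "embedding": list(embedding)}


statements = [
    create_vector_index_cypher(name, label, "embedding", 1536)
    for name, label in INDEX_SPECS.items()
]
```

The statements and the query could then be sent through the Neo4j Python driver, for example with `session.run(VECTOR_QUERY, vector_query_params(...))`. Because the result is plain Cypher, it can be extended with a graph traversal from `node` in the same query.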

We have documented more GraphRAG patterns here, in case you want to contribute your RAG patterns to the catalogue or provide some feedback:

@aravind10x
Contributor

Hi @jexp, thanks for your questions & thoughts!

@ashwinzyx - perhaps, you can take a look once you're back.

@ashwinzyx
Contributor

ashwinzyx commented Oct 16, 2024

Hi @jexp, thanks for looking at our repo. Apologies for the delay. Just got back from vacation.

  • QQ: in graphrag.full_retriever you fetch the vector store data but don't use it in the method or the returns, looks redundant?

        def full_retriever(question: str):
            graph_data = graph_retriever(question)
            vector_data = [el.page_content for el in vector_retriever.invoke(question)]
            final_data = f'''Graph data:
        {graph_data}
            '''
            return final_data

[Ans] Yes. It looks like we are not using vector_data for the Graph RAG, only for the Hybrid RAG. We will remove it.
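
A minimal sketch of how the two variants could be separated, so that the vector lookup only runs where its result is returned. The factory names `make_graph_only_retriever` and `make_hybrid_retriever` and the "Vector data:" section header are hypothetical, and the retrievers are passed in as plain callables so the snippet runs without Neo4j or Chroma.

```python
from typing import Callable, List


def make_graph_only_retriever(
    graph_retriever: Callable[[str], str],
) -> Callable[[str], str]:
    """Graph RAG variant: no vector store call at all."""
    def full_retriever(question: str) -> str:
        graph_data = graph_retriever(question)
        return f"Graph data:\n{graph_data}\n"
    return full_retriever


def make_hybrid_retriever(
    graph_retriever: Callable[[str], str],
    vector_lookup: Callable[[str], List[str]],
) -> Callable[[str], str]:
    """Hybrid RAG variant: graph context plus vector store chunks."""
    def full_retriever(question: str) -> str:
        graph_data = graph_retriever(question)
        # Each element is the text of one retrieved chunk.
        vector_data = "\n".join(vector_lookup(question))
        return f"Graph data:\n{graph_data}\nVector data:\n{vector_data}\n"
    return full_retriever
```

With a LangChain retriever, `vector_lookup` could be `lambda q: [el.page_content for el in vector_retriever.invoke(q)]`, which is the expression from the original function.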

  • You don't make use of the built-in Neo4j vector search, only the fulltext index. With vector search you could enable in-graph vector and hybrid search (you can create vector indexes for chunks in the lexical graph, for entities in the domain graph, and for communities in the topical structures).

[Ans] We have been using Chroma for vector search in the templates.
I do see hybrid search options in the examples below.
https://python.langchain.com/docs/integrations/vectorstores/neo4jvector/
https://neo4j.com/labs/genai-ecosystem/langchain/

I believe the post below uses in-graph vector search. Am I right? Is there a full example you can share for in-graph vector search?
https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/

  • right now the graph retriever only uses the direct neighbourhood of the nodes, this could be a good hyperparameter to add

[Ans] For now we have added GraphRAG as a template. We will include these as individual components with a hyperparameter tuning option.
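
As a rough sketch of what a neighbourhood-depth hyperparameter could look like, the helper below builds a Cypher query with a configurable hop count. The `Entity` label, the `id` property, the function name and the upper bound of 5 hops are assumptions for illustration, not the current RAGBuilder retriever. The hop count is validated and interpolated into the query text because the bounds of a variable-length pattern are written as literals here.

```python
MAX_ALLOWED_HOPS = 5  # assumed safety cap; deep traversals grow very quickly


def neighbourhood_cypher(max_hops: int = 1) -> str:
    """Return a Cypher query expanding up to `max_hops` from seed entities.

    Expects the parameters $entity_ids (list of ids) and $limit (int).
    """
    if isinstance(max_hops, bool) or not isinstance(max_hops, int):
        raise ValueError("max_hops must be an integer")
    if not 1 <= max_hops <= MAX_ALLOWED_HOPS:
        raise ValueError(f"max_hops must be in 1..{MAX_ALLOWED_HOPS}")
    return (
        "MATCH (e:Entity) WHERE e.id IN $entity_ids "
        f"MATCH path = (e)-[*1..{max_hops}]-(neighbour) "
        "RETURN DISTINCT e.id AS source, neighbour.id AS target, "
        "length(path) AS hops "
        "LIMIT $limit"
    )


# max_hops=1 corresponds to the direct neighbourhood; a tuner could
# sweep a small range of values.
candidate_queries = {hops: neighbourhood_cypher(hops) for hops in (1, 2, 3)}
```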

e.g. we have a number of different retrievers in the llm-graph-builder, see: https://github.com/neo4j-labs/llm-graph-builder/blob/DEV/backend/src/shared/constants.py

[Ans] Thanks for the pointer. Will take a look

I saw you copied some code from the neo4j-langchain integrations? Was there a reason (i.e. did you make modifications)? If so, it might be good to discuss contributing them back upstream.

[Ans] No. We did not make any modifications.

there is the option to run clustering algorithms to generate cross-document topic summaries across the entity graphs (like in the MSFT GraphRAG paper), see https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/ (we've also implemented that in https://llm-graph-builder.neo4jlabs.com/ if you have a graph data science enabled database).

[Ans] Thanks. Will take a look.

Thanks for all your feedback. It would be great to chat sometime. We want to improve the GraphRAG option in RAGBuilder and would love your contributions as well.

@aravind10x
Contributor

@jexp - can you pls review @ashwinzyx's comments? Do you have any further thoughts or suggestions? Please feel free to suggest changes or raise a PR to make the Graph RAG part of RAGBuilder even better.

@jexp
Author

jexp commented Oct 24, 2024

@aravind10x it would probably be good to have a chat with me and @tomasonjo at some point, it's harder to go through these in GH issues :)
