An experiment to compare "regular" (hereafter referred to as baseline) RAG and GraphRAG.
I took my favourite plasma physics paper, by Alex Schekochihin, extracted the PDF into a single text file using the code in `data_extraction.py`, and then indexed that text in two different ways:
- Baseline RAG -- chunking the text, calculating embeddings and storing them in Pinecone
- GraphRAG -- building the graph index using GraphRAG's built-in indexing pipeline
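
The chunking step of the baseline approach can be sketched roughly as below. This is a minimal illustration, not the code used in this experiment: the function name `chunk_text`, the character-based splitting, and the `chunk_size` and `overlap` defaults are all assumptions. Computing embeddings and storing them in Pinecone are left out.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    chunk_size and overlap are hypothetical defaults, not the values
    used in the experiment.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is a pure repeat of the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each returned chunk would then be embedded and stored alongside its vector. The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.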
I then ran the same query against both indexes.
The query used was "What are the main themes of this article?". The results were: