This project demonstrates how to use Retrieval-Augmented Generation (RAG) with the Gemini API to extract and process data from PDF documents. The application lets users upload PDFs, extract their text, and query the content through a conversational interface.
- Extract text from PDF documents
- Chunk extracted text for better processing
- Generate embeddings for text chunks
- Find the most relevant passage based on user queries
- Generate responses using the Gemini API
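The chunking and retrieval steps listed above can be sketched as follows. This is a minimal illustration, not the code in app.py: `chunk_text`, `build_vocab`, `embed`, and `find_best_passage` are hypothetical names, the chunk sizes are arbitrary, and a simple word-count vector stands in for real embeddings, which in the application would come from an embedding model.

```python
import re

import numpy as np


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def tokenize(text):
    """Lowercase the text and keep alphanumeric word tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    """Map every distinct token in the given texts to a vector index."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def embed(text, vocab):
    """Toy stand-in for an embedding model: a word-count vector."""
    vec = np.zeros(len(vocab))
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def find_best_passage(query, chunks):
    """Return the chunk with the highest cosine similarity to the query."""
    vocab = build_vocab(list(chunks) + [query])
    matrix = np.stack([embed(chunk, vocab) for chunk in chunks])
    query_vec = embed(query, vocab)
    # Guard against division by zero for empty chunks or queries.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    scores = (matrix @ query_vec) / np.maximum(norms, 1e-12)
    best = int(np.argmax(scores))
    return chunks[best], float(scores[best])
```

With real embeddings the ranking step stays the same: only `embed` would be replaced, and the selected passage would then be passed to the Gemini API as context for the answer.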
- Python 3.7+
- Streamlit
- Google Generative AI (Gemini API)
- Pandas
- NumPy
- Unstructured
- ONNX 1.16.1 (higher versions lead to DLL errors)
- Tesseract
- Poppler
- Clone the repository:
  git clone https://github.com/agilarasu/gemini_pdf_rag.git
  cd gemini_pdf_rag
- Install the required packages:
  pip install -r requirements.txt
- Set up your Gemini API key in a .env file:
  GEMINI_API_KEY='Your api key'
- Run the Streamlit application:
  streamlit run app.py
- Upload your PDF documents using the sidebar.
- Ask questions based on the content of the uploaded PDFs.
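As an illustration of the API key step, the sketch below reads `GEMINI_API_KEY` from a .env file using only the standard library. The helper names `parse_env_file` and `get_gemini_api_key` are hypothetical, and the application itself may load the key differently (for example through a dotenv package); treat this as one possible approach under those assumptions.

```python
import os


def parse_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            value = value.strip()
            # Drop one pair of matching quotes, as in GEMINI_API_KEY='...'.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            values[key.strip()] = value
    return values


def get_gemini_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key and os.path.exists(path):
        key = parse_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early with a clear error when the key is missing tends to be easier to debug than an authentication error raised later by the API client.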
To install the Poppler PDF rendering library on Windows, download a precompiled version of Poppler and add it to your system's PATH:
- Download a precompiled Poppler build for Windows.
- Extract the files to a directory (e.g., C:\Program Files\poppler-xx_x_x).
- Add Poppler's Library\bin folder to the system's PATH:
  - Right-click on This PC or My Computer, and go to Properties.
  - Click Advanced system settings, then Environment Variables.
  - Under System variables, find Path and click Edit.
  - Click New and add the path to the Poppler bin folder (e.g., C:\Program Files\poppler-xx_x_x\Library\bin).
  - Press OK to save and close.
On macOS, you can install Poppler using Homebrew:
- Open the Terminal.
- Run the following command:
  brew install poppler
On Debian or Ubuntu Linux, Poppler can be installed via the apt package manager:
- Open the Terminal and run:
  sudo apt update
  sudo apt install poppler-utils
To install Tesseract on Windows, download a precompiled version of Tesseract and add it to your system's PATH:
- Download the Tesseract installer for Windows.
- Run the installer.
- Add the Tesseract installation folder (e.g., C:\Program Files\Tesseract-OCR) to the system's PATH:
  - Right-click on This PC or My Computer, and go to Properties.
  - Click Advanced system settings, then Environment Variables.
  - Under System variables, find Path and click Edit.
  - Click New and add the path to the Tesseract installation folder (e.g., C:\Program Files\Tesseract-OCR).
  - Press OK to save and close.
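After editing PATH, a quick check like the one below can confirm that both tools are reachable from a new terminal session. It is a small sketch: `find_missing_tools` is a hypothetical helper, and it assumes Poppler is detected through its `pdftoppm` executable and Tesseract through `tesseract`.

```python
import shutil


def find_missing_tools(tools=("pdftoppm", "tesseract")):
    """Return the names of executables that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Poppler and Tesseract executables were found on PATH.")
```

If a tool is reported as missing right after installation, opening a new terminal usually helps, since PATH changes only apply to processes started afterwards.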