A Streamlit RAG Chatbot for querying PDF documents using a structure-aware chunking approach.
The PDF Indexer uses the LLMSherpa API internally to parse the PDF document. The main sections obtained from parsing are split recursively into subsections that fit within a 2048-character chunk size. Because entire sections are returned rather than arbitrary slices of text, the hierarchical structure of the document is preserved. The resulting text chunks are used to build a LlamaIndex query engine on top of an in-memory VectorStoreIndex.
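The recursive splitting step can be sketched as follows. This is a simplified illustration of the idea only, not the project's actual code: the function name and the paragraph-boundary heuristic are assumptions.

```python
def split_section(text: str, chunk_size: int = 2048) -> list[str]:
    """Recursively split a section's text into chunks of at most chunk_size.

    Sections that already fit are returned whole, which is what preserves
    the document's hierarchical structure; oversized sections are halved,
    preferring a paragraph break near the midpoint.
    """
    if len(text) <= chunk_size:
        return [text]
    # Prefer splitting at a paragraph boundary near the midpoint.
    mid = len(text) // 2
    split_at = text.rfind("\n\n", 0, mid)
    if split_at <= 0:
        split_at = mid  # no usable boundary: fall back to a hard split
    left, right = text[:split_at], text[split_at:]
    return split_section(left, chunk_size) + split_section(right, chunk_size)
```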
The Streamlit Chatbot allows users to:
- Input an OpenAI API key
- Upload a PDF document
- Ask questions about the document
- Install the nlm-ingestor server: follow the instructions at https://github.com/nlmatics/nlm-ingestor. The local `llmsherpa_url` will be `http://localhost:5001/api/parseDocument?renderFormat=all`
- Clone the repository:
  ```
  git clone https://github.com/IoanaDragan/rag-insight
  ```
- Install dependencies:
  ```
  pip install -r requirements.txt
  ```
- (Optional) Set up environment variables: create a `.env` file and add your OpenAI API key:
  ```
  OPENAI_API_KEY=your_api_key_here
  ```
- Launch the Streamlit server:
  ```
  cd rag-insight
  streamlit run rag-chatbot.py
  ```
- Access the app at `http://localhost:8501` in your browser
The `index_pdf` method of the `PDFIndexer` accepts the following parameters:

| Parameter | Description | Default |
|---|---|---|
| `chunk_size` | Size of document chunks | 2048 |
| `first_n_chunks` | Number of chunks to index (for testing) | None (all) |
| `add_summary` | Add chunk summaries as metadata | False |
| `retrieve_top_k` | Number of similar documents to retrieve per query | 2 |
| `similarity_threshold` | Minimum similarity score for retrieved documents | 0.8 |
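To illustrate how `retrieve_top_k` and `similarity_threshold` interact at query time, here is a minimal, self-contained sketch. It is not the project's actual retrieval code: the `filter_by_similarity` helper and the `(text, score)` tuple format are assumptions for demonstration.

```python
def filter_by_similarity(results, similarity_threshold=0.8, retrieve_top_k=2):
    """Keep at most retrieve_top_k results whose similarity score meets
    similarity_threshold, best matches first.

    `results` is a list of (text, score) tuples, as a vector retriever
    might return them.
    """
    # Drop everything below the similarity cutoff.
    kept = [r for r in results if r[1] >= similarity_threshold]
    # Sort by score, highest first, and keep only the top k.
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:retrieve_top_k]
```

With the defaults above, a query returns at most 2 chunks, and a chunk scoring below 0.8 is never returned, even if fewer than 2 chunks pass the cutoff.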
Distributed under the MIT License. See LICENSE for more information.