This folder contains example data intended to demonstrate the format of the input files and the outputs generated by the Retrieval-Augmented Generation experiments and evaluation framework.
The data provided here is a toy example, not the real data used during the RAG experiments. It only showcases the required input formats and the outputs generated by the RAG framework, so it will not lead to good results.
This file contains:
- Questions: The questions posed in the RAG experiments.
- Reference Answers: The ideal answers used for evaluation purposes.
- Metadata: Additional details including an identifier (`doc_id`), a page number (`page_number`), and a company name (`company_name`).
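As a rough illustration of the record layout described above, the following Python sketch validates one row of such a file. Only `doc_id`, `page_number`, and `company_name` come from this README; the key names `question` and `reference_answer`, the field types, and the helper `parse_qa_record` are assumptions made for this example and may differ from the actual file.

```python
from dataclasses import dataclass

# "doc_id", "page_number" and "company_name" are named in this README.
# "question" and "reference_answer" are hypothetical key names for this sketch.
REQUIRED_KEYS = (
    "question",
    "reference_answer",
    "doc_id",
    "page_number",
    "company_name",
)


@dataclass
class QARecord:
    question: str
    reference_answer: str
    doc_id: str        # assumed to be a string identifier
    page_number: int   # assumed to be an integer
    company_name: str


def parse_qa_record(row: dict) -> QARecord:
    """Validate one raw row and convert it into a QARecord."""
    missing = [key for key in REQUIRED_KEYS if key not in row]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return QARecord(
        question=str(row["question"]),
        reference_answer=str(row["reference_answer"]),
        doc_id=str(row["doc_id"]),
        page_number=int(row["page_number"]),  # accepts "3" as well as 3
        company_name=str(row["company_name"]),
    )
```

Failing early on a missing key makes format problems visible before an experiment run starts.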
This file holds:
- Extracted Texts: Textual content extracted from input documents.
- Image Bytes: Byte representation of images.
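How the image bytes are stored is not specified here. The sketch below assumes they are either raw bytes or a base64-encoded string, and shows one way to check which image format a value holds by inspecting the standard PNG and JPEG file signatures. The helper names `to_raw_bytes` and `sniff_image_format` are hypothetical.

```python
import base64
from typing import Optional, Union

# Standard file signatures (magic bytes) for two common image formats.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def to_raw_bytes(value: Union[str, bytes, bytearray]) -> bytes:
    """Return raw bytes; a str is assumed (not confirmed) to be base64."""
    if isinstance(value, str):
        return base64.b64decode(value)
    return bytes(value)


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" if the data starts with a known signature."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None
```

A value that yields `None` is either not an image in one of these two formats or is stored in an encoding this sketch does not handle.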
This folder contains:
- Textual Summaries of Images: Generated using either GPT-4 Vision or LLaVA. These summaries describe the visual content of the images.
This folder includes sample images extracted from PDFs, used to demonstrate the type of visual content processed in RAG experiments.
This folder includes:
- Evaluation Results File: Contains the evaluation outcomes of a RAG pipeline. Each result includes:
  - Grades for Each Metric: A binary score indicating whether the respective metric was met.
  - Reasons for Grades: An explanation for each given grade, providing insights into the evaluation process.
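Because each grade is binary, per-metric pass rates can be derived by simple averaging. The sketch below assumes each result is a mapping from metric name to a `grade` (0 or 1) and a `reason`; this layout, the metric names, and the sample values are made up for illustration and are not taken from the actual results file.

```python
from collections import defaultdict


def pass_rates(results: list) -> dict:
    """Return the fraction of results that met each metric."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for result in results:
        for metric, entry in result.items():
            totals[metric] += 1
            if entry["grade"]:  # binary grade: 1 = met, 0 = not met
                passed[metric] += 1
    return {metric: passed[metric] / totals[metric] for metric in totals}


# Made-up sample with hypothetical metric names.
example_results = [
    {
        "correctness": {"grade": 1, "reason": "Matches the reference answer."},
        "faithfulness": {"grade": 1, "reason": "Supported by the context."},
    },
    {
        "correctness": {"grade": 0, "reason": "Contradicts the reference answer."},
        "faithfulness": {"grade": 1, "reason": "Supported by the context."},
    },
]

rates = pass_rates(example_results)
```

Keeping the reasons next to the grades lets a failing metric be traced back to individual explanations.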
This folder holds:
- Example Output of a RAG System: A sample file demonstrating the typical output produced by the RAG system during experiments.
This folder contains:
- Texts and Their Summaries: An example file with original texts alongside summaries generated by a Large Language Model (LLM).
This folder includes:
- Sample Vector Stores: Two sample vector stores containing embedded texts and images/image summaries that are used for retrieval.