Introduction

This is a working RAG (Retrieval-Augmented Generation) model built using Elasticsearch and OpenAI's GPT API. It indexes a Shakespearean dataset and provides explanations for prompts in Shakespearean English. This project is created for CSE 512-extra credits requirements.Generative AI was used to creating the project.It only performs semantic searching on text. No other searching is performed.

Data used for this project can be found here

The provided content represents a data format intended for ingestion into Elasticsearch, a distributed search engine and analytics platform. Each document corresponds to an entry (such as an act, scene, or line) from Shakespeare's play "Henry IV." Here's a detailed explanation of the structure

Key Elements:

Index Directive: Example:

    {"index":{"_index":"shakespeare","_id":0}}

_index: Specifies the Elasticsearch index where the document will be stored (shakespeare in this case).
_id: The unique identifier for the document within the index. Each document has its own ID, incrementing sequentially in this example.

Document Content:

Example:

        {"type":"act","line_id":1,"play_name":"Henry IV", "speech_number":"","line_number":"","speaker":"","text_entry":"ACT I"}

Represents the data for a specific entry. Key fields include:

type: The category of the entry (e.g., act, scene, or line).
line_id: A unique identifier for each line or entry in the play.
play_name: The name of the play (Henry IV in this case).
speech_number: Denotes a speech's sequence. It’s empty for entries like acts or scenes but populated for dialogue lines.
line_number: The location of the line within the play, formatted as Act.Scene.Line.
speaker: The character delivering the line (empty for acts or scenes).
text_entry: The actual content of the entry (e.g., dialogue, act, or scene description). Each text_entry is used for semantic searching

Prompt passed into the GPT API was

I have retrieved a line from a Shakespeare database based on a KNN search with the query '{QUERY}'.
Here is the closest result:
Score: {hit['_score']}
Text: {hit['_source']['text_entry']}
Please provide an analysis or interpretation of the retrieved line in the context of Shakespeare's works, focusing on themes and language style.

Getting Started

Create the elastic search account to access the cloud platform here
Create the OpenAI API key here

├── app.py
├── mp3_files
└── my-app

Front End : The front end code is present is in the my-app dir , it consists of the chat-interface.tsx which is a type script file consists of the chat UI as shown below.It provides an interface working with the model.
Back End : The backend is code is present in app.py file.It consists of the openai api,and the main elastic search based indexing using semantic search as shown in the example.
Front End

cd my-app

and run to install all dependencies

npm install

or use yarn to install all dependencies

yarn install

Back End

pip install elasticsearch openai flask gtts

Preview of the Working Application

Demonstration video

Note: the error in the video is due to the fact that there was no speaker built in on the system this project was designed on so I was unable to try out the mp3 file on the same system.However, there exists an mp3 file for demonstration purposes.

Frameworks

Front End: Built using Next.js (requires Node.js 18 or above installed on the system)
Back End: Built using Flask

To Run the Application

Front End

cd my-app
npm run dev

Back End

On a seperate terminal in the parent directory

python3 app.py

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
mp3_files		mp3_files
my-app		my-app
Bonus_Submission.mp4		Bonus_Submission.mp4
README.md		README.md
app.py		app.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Data used for this project can be found here

Key Elements:

Document Content:

Prompt passed into the GPT API was

Getting Started

Preview of the Working Application

Demonstration video

Frameworks

To Run the Application

Front End

Back End

About

Releases

Packages

Languages

joe-rabbit/cse512-bonus

Folders and files

Latest commit

History

Repository files navigation

Introduction

Data used for this project can be found here

Key Elements:

Document Content:

Prompt passed into the GPT API was

Getting Started

Preview of the Working Application

Demonstration video

Frameworks

To Run the Application

Front End

Back End

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages