In this short project, I attempt to create a RAG application that can parse a PDF document (in this case, a research paper) and extract its main information in a structured tabular form.
I started off with a simple Jupyter notebook outlining a structured data retrieval system built on unstructured sources (documents, PDFs, websites, etc.). I also wanted to test the capabilities and limitations of new lightweight LLMs. After going through multiple examples on YouTube, I found Thu Vu's video on creating an LLM-based RAG for structured data, using an OpenAI API key and encoders, to be the best suited to my needs. I personally wanted to try Google's Gemini models, so this project uses Gemini 1.5 Flash.
More than the metrics and functionality of the LLM used, I mainly wanted to gauge how quickly we can move from an idea to a notebook to an MVP (Minimum Viable Product) from scratch. Apart from understanding Streamlit's session state and how to use it to pass the API key from user input and the .env file, the Streamlit app was fairly straightforward to code. While the app can be written in a single Python file, I found that keeping the app's frontend separate from the underlying functions was a cleaner and easier way to manage and debug the code.
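The frontend/backend split mentioned above can be illustrated with a minimal sketch. The helper name `records_to_rows` and the column names are hypothetical and are not taken from the project's actual functions module. The point is only that a backend helper with no Streamlit imports can be tested on its own.

```python
# Hypothetical backend helper (illustrative only, not the project's real code).
# It has no Streamlit dependency, so it can be unit-tested without the UI.

def records_to_rows(records, columns):
    """Flatten a list of dicts into a header row followed by data rows.

    Missing fields become empty strings so every row has the same width.
    """
    rows = [list(columns)]
    for record in records:
        rows.append([str(record.get(col, "")) for col in columns])
    return rows


# The frontend file would only call the helper and render the result,
# keeping layout code and data-shaping code in separate places.
example = records_to_rows(
    [{"title": "A", "year": 2020}],
    ["title", "authors", "year"],
)
```

With this layout, a bug in the table shaping can be reproduced in a plain Python session, without starting the app.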
Python 3.11 or higher. You will need Docker to run the app as a container.
Go to the app folder in terminal.
cd ../app
Build the image using the Dockerfile.
docker build -t streamlit-app .
Run the image as a container
docker run -e API_KEY="your-secret-key" -p 8501:8501 streamlit-app
In the streamlit_app.py file, read the API key (from the .env file or from user input):
import os
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv("API_KEY")
import streamlit as st
from functions import *
import base64
# Initialize the API key in session state from the .env value if it is not already set
if "api_key" not in st.session_state:
    st.session_state.api_key = API_KEY
.....
.....
# from user input
st.text_input('API key', type='password', key='api_key',
              label_visibility="collapsed", disabled=False)
You can also provide the API key at runtime. (I personally found this buggy, so I kept both options, each as a backup for the other.)
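The fallback between the two key sources can be made explicit with a small helper. This is a sketch under assumptions: the function name `resolve_api_key` and its parameters are hypothetical, and it assumes the typed key should take priority over the environment value, which the original code does not state.

```python
import os


def resolve_api_key(user_input, env=None, var_name="API_KEY"):
    """Return the key typed by the user, else the environment value, else None.

    Assumption: a non-empty typed key wins over the environment variable.
    """
    env = os.environ if env is None else env
    typed = (user_input or "").strip()
    if typed:
        return typed
    fallback = (env.get(var_name) or "").strip()
    return fallback or None
```

For example, `resolve_api_key(st.session_state.api_key)` could be called just before the model client is created, so an empty text box falls back to the value passed with `docker run -e API_KEY=...`.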