Code for blog at: Democratize Data Access with RAGS
We will use LlamaIndex to build our RAG pipeline. The concepts used apply to RAG pipelines in general.
GitHub Repo: Data Helper
We will clone the repo and set up a poetry shell as shown below:
git clone https://github.com/josephmachado/data_helper.git
cd data_helper
poetry install
poetry shell # activate the virtual env
# To run the code, please set your OpenAI API key as shown below
export OPENAI_API_KEY=your-key-here
python run_code.py INDEX # Create an index with data from ./data folder
python run_code.py QUERY --query "show me for each buyers what date they made their first purchase"
# The above command uses the existing index and sends a request to the LLM API to get results
# The code will return a SQL query in DuckDB format
python run_code.py QUERY --query "for every seller, show me a monthly report of the number of unique products that they sold, avg cost per product, max/min value of product purchased that month"
# The code will return a SQL query in DuckDB format
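
To make the INDEX and QUERY steps above concrete, here is a minimal, library-free sketch of the general RAG flow: build an index over documents, retrieve the most relevant one for a question, and assemble a prompt for the LLM. This is not the code in run_code.py and it does not use LlamaIndex. The function names (`build_index`, `retrieve`, `build_prompt`), the sample table descriptions, and the word-overlap scoring are all hypothetical stand-ins for the embedding-based retrieval a real pipeline would likely use.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """INDEX step: map each document name to its bag of words."""
    return {name: Counter(tokenize(body)) for name, body in docs.items()}


def retrieve(index: dict[str, Counter], query: str, top_k: int = 1) -> list[str]:
    """QUERY step, part 1: rank documents by overlap with the query words."""
    query_words = set(tokenize(query))
    scores = {
        name: sum(count for word, count in bag.items() if word in query_words)
        for name, bag in index.items()
    }
    # Highest score first; ties are broken alphabetically by document name.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:top_k]


def build_prompt(docs: dict[str, str], doc_names: list[str], query: str) -> str:
    """QUERY step, part 2: combine retrieved context with the user question."""
    context = "\n".join(docs[name] for name in doc_names)
    return f"Context:\n{context}\n\nWrite a DuckDB SQL query for: {query}"


# Hypothetical table descriptions standing in for files in ./data
docs = {
    "orders": "orders table: buyer_id, seller_id, product_id, purchase_date, cost",
    "sellers": "sellers table: seller_id, seller_name, signup_date",
}

index = build_index(docs)
query = "show me for each buyers what date they made their first purchase"
top_docs = retrieve(index, query, top_k=1)
prompt = build_prompt(docs, top_docs, query)
print(top_docs)
print(prompt)
```

In a real pipeline the prompt would then be sent to the LLM API, which is the part that requires the API key. The sketch stops before that call so it can run offline.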
- Evaluate results and tune the pipeline
- Add an observability system
- Monitor API costs
- Add additional documentation as input
- Explore other use cases, such as RAGs for onboarding, a DE training tool, etc.
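
For the "Monitor API costs" item above, one lightweight starting point is to record the token counts of each LLM call and multiply them by a per-token price. The following is a minimal sketch, assuming you can read prompt and completion token counts from each API response. The class name `CostTracker` is hypothetical, and the prices passed in are placeholders for illustration, not real OpenAI rates.

```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    """Accumulates token usage and estimates spend from caller-supplied prices."""

    # Prices are per 1,000 tokens and must be supplied by the caller
    # (assumption: you look up the current rates for your model yourself).
    prompt_price_per_1k: float
    completion_price_per_1k: float
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the token counts from one LLM call to the running totals."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.calls += 1

    @property
    def estimated_cost(self) -> float:
        """Estimated total spend, in the same currency unit as the prices."""
        return (
            self.prompt_tokens / 1000 * self.prompt_price_per_1k
            + self.completion_tokens / 1000 * self.completion_price_per_1k
        )


# Placeholder prices for illustration only
tracker = CostTracker(prompt_price_per_1k=0.5, completion_price_per_1k=1.5)
tracker.record(prompt_tokens=1200, completion_tokens=300)
tracker.record(prompt_tokens=800, completion_tokens=200)
print(f"{tracker.calls} calls, estimated cost: {tracker.estimated_cost:.2f}")
```

Keeping the prices as constructor arguments avoids hard-coding rates that change over time.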