This repository contains the DAG code used in the Orchestrate OpenAI operations with Apache Airflow tutorial.
The DAG in this repository uses the following packages:
- OpenAI Airflow provider.
- OpenAI Python client.
- scikit-learn.
- pandas.
- numpy.
- matplotlib.
- seaborn.
- AdjustText.
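To confirm that these packages are present in a given Python environment (for example inside the Airflow containers), a minimal sketch along the following lines may help. The PyPI distribution names in PACKAGES are assumptions mapped from the list above, and installed_versions is a hypothetical helper, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the packages listed above.
PACKAGES = [
    "apache-airflow-providers-openai",
    "openai",
    "scikit-learn",
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
    "adjustText",
]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so missing ones are easy to spot.
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```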
This section explains how to run this repository with Airflow. Note that you will need to copy the contents of the .env_example file to a newly created .env file. You will also need a valid OpenAI API key of at least tier 1 to run this repository.
Download the Astro CLI to run Airflow locally in Docker. The astro package is the only one you will need to install locally.
- Run git clone https://github.com/astronomer/airflow-openai-tutorial.git on your computer to create a local clone of this repository.
- Install the Astro CLI by following the steps in the Astro CLI documentation. Docker Desktop/Docker Engine is a prerequisite, but you don't need in-depth Docker knowledge to run Airflow with the Astro CLI.
- Create a .env file in the root of your cloned repository and copy the contents of the .env_example file to it. Provide your own OpenAI API key in the .env file.
- Run astro dev start in your cloned repository.
- After your Astro project has started, view the Airflow UI at localhost:8080.
- Run the captains_dag DAG manually by clicking the play button. Provide your own question and adjust the parameters in the DAG to your liking.
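The step that creates the .env file can also be scripted. The following is a minimal sketch, assuming it is run against the root of the cloned repository. create_env_file is a hypothetical helper, not part of this repository, and it does not fill in the API key for you.

```python
import shutil
from pathlib import Path


def create_env_file(project_root="."):
    """Copy .env_example to .env in project_root, keeping any existing .env."""
    root = Path(project_root)
    example = root / ".env_example"
    target = root / ".env"
    if not example.is_file():
        raise FileNotFoundError(f"{example} not found; point project_root at the cloned repository")
    # Never overwrite an existing .env, since it may already hold a real API key.
    if not target.exists():
        shutil.copyfile(example, target)
    return target
```

Calling create_env_file() from the repository root returns the path of the .env file, which you then edit by hand to add your own OpenAI API key before running astro dev start.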
In this project, astro dev start spins up 4 Docker containers:
- The Airflow webserver, which runs the Airflow UI and can be accessed at http://localhost:8080/.
- The Airflow scheduler, which is responsible for monitoring and triggering tasks.
- The Airflow triggerer, which is an Airflow component used to run deferrable operators.
- The Airflow metadata database, which is a Postgres database that runs on port 5432.
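If the Airflow UI does not load, a quick check of the two ports mentioned above can help narrow down which container is unreachable. This is a minimal sketch, assuming both ports are published on localhost. port_is_open and check_local_airflow are hypothetical helpers, and an open port only shows that something is listening, not that Airflow is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_local_airflow(host="localhost"):
    """Report reachability of the webserver and metadata database ports."""
    # Port numbers taken from the container list above.
    services = {"webserver": 8080, "metadata database": 5432}
    return {name: port_is_open(host, port) for name, port in services.items()}
```

For example, print(check_local_airflow()) after astro dev start gives a dictionary with one True or False entry per service.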