This is a module for Terraform that deploys Airflow in AWS.
- An ECS Cluster with:
  - Sidecar injection container
  - Airflow init container
  - Airflow webserver container
  - Airflow scheduler container
- An ALB
- An RDS instance (optional but recommended)
- A DNS Record (optional but recommended)
- An S3 Bucket (optional)
Average cost of the minimal setup (with RDS): ~$50/month
Why do I need an RDS instance?
- This makes Airflow stateful: you can rerun failed DAGs and keep a history of failed and succeeded DAG runs.
- It allows DAGs to run concurrently; without it, two DAGs cannot run at the same time.
- The state of your DAGs persists even if the Airflow container fails or you update the container definition (which triggers an update of the ECS task).
The Airflow setup provided with this module is one where Airflow's only task is to manage your jobs/workflows, not to do the actual heavy lifting such as SQL queries or Spark jobs. Offload as many tasks as possible to AWS Lambda, AWS EMR, AWS Glue, and similar services. If you want Airflow to have access to these services, use the output role and give it permissions to them through IAM.
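As a rough sketch of granting such permissions, the following attaches an inline policy to the task role so that Airflow can invoke a single Lambda function. It refers to the `module "airflow"` block from this README's usage example. It assumes the `airflow_task_iam_role` output exposes the role object, so that `.name` resolves; if the output turns out to be a plain role name, pass it directly. The policy name and function ARN are hypothetical placeholders.

```hcl
# Sketch: let the Airflow task role invoke one Lambda function.
# Assumption: module.airflow.airflow_task_iam_role is the role object;
# if the output is already a role name, drop the ".name" suffix.
resource "aws_iam_role_policy" "airflow_invoke_lambda" {
  name = "airflow-invoke-lambda" # hypothetical policy name
  role = module.airflow.airflow_task_iam_role.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["lambda:InvokeFunction"]
        # Hypothetical function ARN; replace with your own.
        Resource = "arn:aws:lambda:eu-west-1:123456789012:function:my-etl-job"
      }
    ]
  })
}
```

Scoping the statement to one function ARN keeps the role narrow; widen it only as your DAGs need more services.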
module "airflow" {
  source = "datarootsio/ecs-airflow/aws"

  resource_prefix   = "my-awesome-company"
  resource_suffix   = "env"
  vpc_id            = "vpc-123456"
  public_subnet_ids = ["subnet-456789", "subnet-098765"]
  rds_password      = "super-secret-pass"
}
(This will create Airflow backed by an RDS instance, both in a public subnet and without HTTPS.)
Press here to see more examples
Note: After Terraform has finished deploying everything, it can take up to a minute for Airflow to become available over HTTP(S).
To add DAGs, upload them to the created S3 bucket in the subdirectory "dags/". After uploading them, run the seed DAG. This syncs the S3 bucket with the local DAGs folder of the ECS container.
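If you prefer to manage DAG files from Terraform as well, the upload step could look roughly like the sketch below. It assumes you passed your own bucket to the module through `s3_bucket_name`, since this README does not list an output for a module-created bucket. The bucket name and local file path are hypothetical. `aws_s3_bucket_object` is the object resource available in the 3.x AWS provider that this module pins.

```hcl
# Sketch: upload one local DAG file into the "dags/" prefix of the bucket.
# Assumption: the same bucket name was given to the module as s3_bucket_name.
resource "aws_s3_bucket_object" "my_dag" {
  bucket = "my-airflow-bucket"                  # hypothetical bucket name
  key    = "dags/my_dag.py"                     # must live under "dags/"
  source = "${path.module}/dags/my_dag.py"      # hypothetical local path

  # Re-upload the object whenever the local file content changes.
  etag = filemd5("${path.module}/dags/my_dag.py")
}
```

Uploading the file does not load it into Airflow by itself; the seed DAG still has to run to sync the bucket into the container.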
For now the only authentication option is 'RBAC'. When it is enabled, this module creates a default admin role (only if there are no users in the database). This default role is just a one-time entry point into the Airflow web interface. When you log in for the first time, immediately change the password! With this default admin role you can then create any other users you want.
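A minimal sketch of enabling RBAC, using the `airflow_authentication` and `rbac_admin_*` inputs from the inputs table. The two `variable` blocks and their names are hypothetical; they are only there to keep passwords out of the configuration file.

```hcl
# Hypothetical input variables so that secrets are not hard-coded.
variable "rds_password" {
  type      = string
  sensitive = true
}

variable "airflow_admin_password" {
  type      = string
  sensitive = true
}

module "airflow" {
  source = "datarootsio/ecs-airflow/aws"

  resource_prefix   = "my-awesome-company"
  resource_suffix   = "env"
  vpc_id            = "vpc-123456"
  public_subnet_ids = ["subnet-456789", "subnet-098765"]
  rds_password      = var.rds_password

  # Enable RBAC and override the default admin/admin login.
  airflow_authentication = "rbac"
  rbac_admin_username    = "admin"
  rbac_admin_password    = var.airflow_admin_password
}
```

Overriding `rbac_admin_password` avoids ever exposing the default admin/admin login, but changing the password after the first login is still advisable.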
- RDS Backup options
- Option to use SQL instead of Postgres
- Add a Lambda function that triggers the sync dag (so that you can auto sync through ci/cd)
- RBAC
- Support for Google OAUTH
Name | Version |
---|---|
terraform | ~> 0.15 |
aws | ~> 3.12.0 |
Name | Version |
---|---|
aws | ~> 3.12.0 |
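In a root module, the version constraints from the tables above could be pinned as follows. This is a sketch of the calling configuration, not a copy of the module's own `versions.tf`; the `hashicorp/aws` source address is the standard registry address for the AWS provider.

```hcl
terraform {
  # Matches the Terraform constraint listed above.
  required_version = "~> 0.15"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.12.0" # matches the provider constraint listed above
    }
  }
}
```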
Name | Description | Type | Default | Required |
---|---|---|---|---|
airflow_authentication | Authentication backend to be used, supported backends ["", "rbac"]. When "rbac" is selected an admin role is created if there are no other users in the db; from here you can create all the other users. Make sure to change the admin password directly upon first login! (if you don't change the rbac_admin options the default login is => username: admin, password: admin) | string | "" | no |
airflow_container_home | Working dir for airflow (only change if you are using a different image) | string | "/opt/airflow" | no |
airflow_example_dag | Add an example dag on startup (mostly for sanity check) | bool | true | no |
airflow_executor | The executor mode that airflow will use. Only allowed values are ["Local", "Sequential"]. "Local": run DAGs in parallel (will create an RDS); "Sequential": you cannot run DAGs in parallel (will NOT create an RDS) | string | "Local" | no |
airflow_image_name | The name of the airflow image | string | "apache/airflow" | no |
airflow_image_tag | The tag of the airflow image | string | "2.0.1" | no |
airflow_log_region | The region you want your airflow logs in, defaults to the region variable | string | "" | no |
airflow_log_retention | The number of days you want to keep the log of the airflow container | string | "7" | no |
airflow_py_requirements_path | The relative path to a python requirements.txt file to install extra packages in the container that you can use in your DAGs. | string | "" | no |
airflow_variables | The variables passed to airflow as environment variables (see airflow docs for more info https://airflow.apache.org/docs/). You cannot specify "AIRFLOW__CORE__SQL_ALCHEMY_CONN" and "AIRFLOW__CORE__EXECUTOR" (managed by this module) | map(string) | {} | no |
certificate_arn | The ARN of the certificate that will be used | string | "" | no |
dns_name | The DNS name that will be used to expose Airflow. Optional if not serving over HTTPS. Will be autogenerated if not provided | string | "" | no |
ecs_cpu | The allocated cpu for your airflow instance | number | 1024 | no |
ecs_memory | The allocated memory for your airflow instance | number | 2048 | no |
extra_tags | Extra tags that you would like to add to all created resources | map(string) | {} | no |
ip_allow_list | A list of ip ranges that are allowed to access the airflow webserver, default: full access | list(string) | [ | no |
postgres_uri | The postgres uri of your postgres db, if none provided a postgres db in rds is made. Format "<db_username>:<db_password>@<db_endpoint>:<db_port>/<db_name>" | string | "" | no |
private_subnet_ids | A list of subnet ids of where the ECS and RDS reside, this will only work if you have a NAT Gateway in your VPC | list(string) | [] | no |
public_subnet_ids | A list of subnet ids of where the ALB will reside, if the "private_subnet_ids" variable is not provided ECS and RDS will also reside in these subnets | list(string) | n/a | yes |
rbac_admin_email | RBAC Email (only when airflow_authentication = 'rbac') | string | "[email protected]" | no |
rbac_admin_firstname | RBAC Firstname (only when airflow_authentication = 'rbac') | string | "admin" | no |
rbac_admin_lastname | RBAC Lastname (only when airflow_authentication = 'rbac') | string | "airflow" | no |
rbac_admin_password | RBAC Password (only when airflow_authentication = 'rbac') | string | "admin" | no |
rbac_admin_username | RBAC Username (only when airflow_authentication = 'rbac') | string | "admin" | no |
rds_allocated_storage | The allocated storage for the rds db in gibibytes | number | 20 | no |
rds_availability_zone | Availability zone for the rds instance | string | "eu-west-1a" | no |
rds_deletion_protection | Deletion protection for the rds instance | bool | false | no |
rds_engine | The database engine to use. For supported values, see the Engine parameter in API action CreateDBInstance | string | "postgres" | no |
rds_instance_class | The class of instance you want to give to your rds db | string | "db.t2.micro" | no |
rds_password | Password of rds | string | "" | no |
rds_skip_final_snapshot | Whether or not to skip the final snapshot before deleting (mainly for tests) | bool | false | no |
rds_storage_type | One of "standard" (magnetic), "gp2" (general purpose SSD), or "io1" (provisioned IOPS SSD) | string | "standard" | no |
rds_username | Username of rds | string | "airflow" | no |
rds_version | The DB version to use for the RDS instance | string | "12.7" | no |
region | The region to deploy your solution to | string | "eu-west-1" | no |
resource_prefix | A prefix for the created resources, for example your company name (be aware of the resource name length) | string | n/a | yes |
resource_suffix | A suffix for the created resources, for example the environment for airflow to run in (be aware of the resource name length) | string | n/a | yes |
route53_zone_name | The name of a Route53 zone that will be used for the certificate validation. | string | "" | no |
s3_bucket_name | The S3 bucket name where the DAGs and startup scripts will be stored, leave this blank to let this module create an S3 bucket for you. WARNING: this module will put files into the path "dags/" and "startup/" of the bucket | string | "" | no |
use_https | Expose traffic using HTTPS or not | bool | false | no |
vpc_id | The id of the vpc where you will run ECS/RDS | string | n/a | yes |
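To illustrate how several of these inputs combine, here is a sketch of a more locked-down deployment: ECS and RDS in private subnets, HTTPS enabled, restricted webserver access, and one extra Airflow setting. All IDs, domain names, and CIDR ranges are hypothetical. The table does not spell out exactly which of `certificate_arn`, `route53_zone_name`, and `dns_name` must accompany `use_https`, so the combination below is an assumption to check against the module's examples.

```hcl
module "airflow" {
  source = "datarootsio/ecs-airflow/aws"

  resource_prefix = "my-awesome-company"
  resource_suffix = "env"
  vpc_id          = "vpc-123456"

  # The ALB lives in the public subnets; ECS and RDS move to the private
  # ones, which requires a NAT Gateway in the VPC (see private_subnet_ids).
  public_subnet_ids  = ["subnet-456789", "subnet-098765"]
  private_subnet_ids = ["subnet-111111", "subnet-222222"] # hypothetical IDs

  # Assumed HTTPS combination: a Route53 zone for certificate validation
  # plus an explicit DNS name. Both values are placeholders.
  use_https         = true
  route53_zone_name = "example.com"
  dns_name          = "airflow.example.com"

  # Restrict webserver access to one (documentation) address range.
  ip_allow_list = ["203.0.113.0/24"]

  # Extra Airflow settings are passed as environment variables.
  # AIRFLOW__CORE__SQL_ALCHEMY_CONN and AIRFLOW__CORE__EXECUTOR are not
  # allowed here because the module manages them.
  airflow_variables = {
    AIRFLOW__WEBSERVER__NAVBAR_COLOR = "#e27d60"
  }

  rds_password = "super-secret-pass"
}
```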
Name | Description |
---|---|
airflow_alb_dns | The DNS name of the ALB, with this you can access the Airflow webserver |
airflow_connection_sg | The security group with which you can connect other instance to Airflow, for example EMR Livy |
airflow_dns_record | The created DNS record (only if "use_https" = true) |
airflow_task_iam_role | The IAM role of the airflow task, use this to give Airflow more permissions |
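As a small sketch of consuming these outputs, a root module could re-export the ALB DNS name so that it is printed after `terraform apply`. The output name `airflow_webserver_host` is a hypothetical choice; `module.airflow` refers to the module block from the usage example.

```hcl
# Sketch: surface the module's ALB DNS name from the root module.
output "airflow_webserver_host" {
  description = "DNS name of the ALB in front of the Airflow webserver"
  value       = module.airflow.airflow_alb_dns
}
```

After an apply, `terraform output airflow_webserver_host` then shows where the webserver can be reached.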
Available targets:
- tools: Pull Go and Terraform dependencies
- fmt: Format Go and Terraform code
- lint/lint-tf/lint-go: Lint Go and Terraform code
- test/testverbose: Run tests
Contributions to this repository are very welcome! Found a bug or do you have a suggestion? Please open an issue. Do you know how to fix it? Pull requests are welcome as well! To get you started faster, a Makefile is provided.
Make sure to install Terraform, Go (for automated testing) and Make (optional, if you want to use the Makefile) on your computer. Install tflint to be able to run the linting.
- Set up tools & dependencies: make tools
- Format your code: make fmt
- Linting: make lint
- Run tests: make test (or go test -timeout 2h ./... without Make)
Make sure you branch from the 'open-pr-here' branch, and submit a PR back to the 'open-pr-here' branch.
MIT license. Please see LICENSE for details.