This repository contains a number of Terraform modules for creating the prerequisite cloud resources on AWS, Azure and GCP, and for deploying Cloudera Data Platform (CDP) Public Cloud.
Module name | Description |
---|---|
terraform-cdp-aws-pre-reqs | For all AWS prerequisite cloud resources |
terraform-cdp-azure-pre-reqs | For all Azure prerequisite cloud resources |
terraform-cdp-gcp-pre-reqs | For all GCP prerequisite cloud resources |
terraform-cdp-deploy | For deployment of CDP on AWS, Azure or GCP. |
terraform-aws-cred-permissions | Module for creation of the Cross Account Credential prerequisite on AWS. Note that this module is called from the terraform-cdp-aws-pre-reqs module. |
terraform-aws-permissions | Module for creation of the AWS IAM permissions required by the CDP Public Cloud environment and Data Lake deployment. Note that this module is called from the terraform-cdp-aws-pre-reqs module. |
terraform-aws-vpc | Module for creation of the VPC networking resources on AWS. Can be used to create the CDP VPC and subnets. Note that this module is called from the terraform-cdp-aws-pre-reqs module. |
terraform-aws-tgw | Module for creation of an AWS Transit Gateway (TGW) and attachment of a specified list of VPCs to the TGW. This module can be used to assist in deploying Cloudera Data Platform (CDP) Public Cloud in a fully private networking configuration where a CDP VPC and a Networking VPC are connected using the Transit Gateway. |
terraform-aws-bastion | Module to create a Bastion EC2 instance on AWS. This module can be used to assist in deploying Cloudera Data Platform (CDP) Public Cloud in a secure environment, where the CDP Environment requires a Bastion host. |
terraform-aws-proxy | Module to create and configure an EC2 Auto Scaling group for a highly available Squid proxy service, with a Network Load Balancer (NLB) to forward traffic to the proxy instances. This module can be used to assist in deploying Cloudera Data Platform (CDP) Public Cloud in a fully private networking configuration where the CDP environment uses a proxy configuration via the NLB. |
terraform-azure-nfs | Module for creation of an Azure NFS file share required for Cloudera Machine Learning (CML) Public Cloud. Also optionally creates a Virtual Machine which can be used to mount the share and set the required ownership for the CML workspace's projects folder. |
terraform-azure-cdw-permissions | Module for creation of the Azure Kubernetes Service (AKS) managed identity required for the Cloudera Data Warehouse (CDW) service. |
terraform-azure-storage-endpoints | Module for creation of Azure private endpoints between specified storage accounts and VNet subnets. |
Each module contains Terraform resource configuration and example variable definition files.
The cdp-tf-quickstarts repository demonstrates how to use the modules together to deploy CDP on different cloud environments.
Each module also has a set of examples to show different configuration options for that module.
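Since every module keeps its examples in its own folder, a quick way to see what is available after cloning is to list those folders. The sketch below is a hypothetical helper (`list_examples` is not part of this repository) and assumes each module follows the `modules/<module>/examples/<example>/` layout seen in the terraform-cdp-deploy path used in the steps below.

```shell
# Hypothetical helper: print "<module>/<example>" for every example folder
# found under MODULES_DIR (default: ./modules, relative to the repo root).
# Assumes the modules/<module>/examples/<example>/ layout.
list_examples() {
  modules_dir="${1:-modules}"
  for example_dir in "$modules_dir"/*/examples/*/; do
    # When the glob matches nothing, the literal pattern is returned; skip it.
    [ -d "$example_dir" ] || continue
    example=$(basename "$example_dir")
    # Two dirname calls strip "<example>/" and then "examples".
    module=$(basename "$(dirname "$(dirname "$example_dir")")")
    printf '%s/%s\n' "$module" "$example"
  done
}
```

Run it from the repository root after cloning; each output line names a module and one of its example folders.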
Note that the instructions below give the steps to create the prerequisite resources and the CDP deployment together. The modules can also be used on their own to allow further customization.
- Clone this repository using the following commands:

  ```shell
  git clone https://github.com/cloudera-labs/terraform-cdp-modules.git
  cd terraform-cdp-modules
  ```
- To create the cloud prerequisite resources and the CDP deployment together, change to the terraform-cdp-deploy examples directory and select the example for your chosen cloud provider:

  ```shell
  cd modules/terraform-cdp-deploy/examples/ex<deployment_type>/
  ```
- Create a `terraform.tfvars` file with the variable definitions needed to run the module. Use the `terraform.tfvars.sample` file in each example folder as a reference when creating this file.
- Run the Terraform module for the chosen deployment type:

  ```shell
  terraform init
  terraform apply
  ```
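The last two steps can be wrapped in a small script that refuses to run until `terraform.tfvars` exists and that applies a saved plan after you confirm it, so the changes applied are exactly the ones you reviewed. This is a minimal sketch, not part of the modules: `tf_deploy` and the `TERRAFORM` override variable are hypothetical names, while `init -input=false`, `plan -out=...` and applying a saved plan file are standard Terraform CLI usage.

```shell
# Hypothetical helper: run the example in DIR (default: current directory).
# TERRAFORM (hypothetical) may point at a specific binary; defaults to PATH.
tf_deploy() {
  dir="${1:-.}"
  tf="${TERRAFORM:-terraform}"
  # Stop early if the variable definitions have not been created yet.
  if [ ! -f "$dir/terraform.tfvars" ]; then
    echo "no terraform.tfvars in $dir; create it from terraform.tfvars.sample first" >&2
    return 1
  fi
  (
    cd "$dir" || exit 1
    "$tf" init -input=false || exit 1
    # Save the plan so the changes reviewed are exactly the ones applied.
    "$tf" plan -input=false -out=tfplan || exit 1
    printf 'Apply this plan? Type "yes" to continue: '
    read -r answer
    if [ "$answer" != "yes" ]; then
      echo "aborted; nothing was applied" >&2
      exit 1
    fi
    # Terraform applies a saved plan file without a second confirmation prompt.
    "$tf" apply -input=false tfplan
  )
}
```

For example, `tf_deploy .` from inside the chosen example folder. The explicit prompt replaces the confirmation that `terraform apply` would otherwise ask for, because that confirmation is skipped when a saved plan is applied.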
Once the deployment completes, you can create CDP Data Hubs and Data Services from the CDP Management Console (https://cdp.cloudera.com/).
If you no longer need the infrastructure that’s provisioned by the Terraform module, run the following command to remove the deployment infrastructure and terminate all resources.
```shell
terraform destroy
```
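After the destroy completes, a quick sanity check is to confirm that the Terraform state no longer tracks any resources; `terraform state list` prints one resource address per line and prints nothing when the state is empty. The helper below is a hypothetical sketch (`tf_verify_destroyed` and `TERRAFORM` are illustrative names). It only reflects what Terraform tracks, so resources created outside Terraform, for example by CDP itself, are not covered by this check.

```shell
# Hypothetical helper: confirm the Terraform state in DIR (default: .)
# is empty after `terraform destroy`.
tf_verify_destroyed() {
  dir="${1:-.}"
  tf="${TERRAFORM:-terraform}"
  # `terraform state list` prints one tracked resource address per line.
  remaining=$(cd "$dir" && "$tf" state list) || return 1
  if [ -n "$remaining" ]; then
    echo "resources still tracked in state:" >&2
    echo "$remaining" >&2
    return 1
  fi
  echo "no resources remain in the Terraform state"
}
```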
To set up CDP via deployment automation using this guide, the following dependencies must be installed in your local environment:
- Terraform can be installed by following the instructions at https://developer.hashicorp.com/terraform/downloads
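Before starting, it can help to confirm that the command-line tools used in this guide are on your `PATH`. This is a hypothetical helper (`check_deps` is not part of the repository) that reports every missing tool rather than stopping at the first one.

```shell
# Hypothetical helper: report each command in "$@" that is not on PATH.
# Returns 0 only if all of them are available.
check_deps() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing dependency: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this guide, `check_deps git terraform` covers cloning and deployment; add `az` or `gcloud` if you authenticate through those CLIs.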
Configure Terraform Provider for AWS, Azure or GCP
- Configure the Terraform Provider for CDP with an access key ID and private key by downloading or creating a CDP configuration file.
  - See the CDP documentation for steps to Generate the API access key.
  - See the CDP Terraform Provider Documentation and DEVELOPMENT.md for the different ways of providing the CDP credentials for authentication.
- To create resources in the cloud provider, access credentials or a service account are needed for authentication.
  - For AWS, access keys are required to create the cloud resources via the Terraform aws provider. See the AWS Terraform Provider Documentation.
  - For Azure, authentication with the Azure subscription is required. There are a number of ways to do this, outlined in the Azure Terraform Provider Documentation.
  - For GCP, authentication with the GCP API is required. There are a number of ways to do this, outlined in the Google Terraform Provider Documentation.
- Where you have more than one Azure subscription, the ID of the subscription to use can be passed via the `ARM_SUBSCRIPTION_ID` environment variable.
- When using a Service Principal (SP) to authenticate with Azure, it is not possible to authenticate the azuread Terraform Provider (the provider used to create the Azure Cross Account AD Application) with the command `az login --service-principal`. We found that the best way to authenticate using an SP is by setting environment variables. Details of the required environment variables are in the azuread docs and azurerm docs, and are summarized below:

  ```shell
  export ARM_CLIENT_ID="<sp_client_id>"
  export ARM_CLIENT_SECRET="<sp_client_secret>"
  export ARM_TENANT_ID="<sp_tenant_id>"
  export ARM_SUBSCRIPTION_ID="<sp_subscription_id>"
  ```
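One way to create the CDP configuration file mentioned above is to write it directly. The sketch below assumes the shared credentials layout used by the CDP CLI: a `~/.cdp/credentials` file with a profile section holding `cdp_access_key_id` and `cdp_private_key`. Confirm the file location and key names against the CDP Terraform Provider Documentation before relying on it. `write_cdp_credentials` is a hypothetical helper, and it refuses to overwrite an existing file so other profiles are not lost.

```shell
# Hypothetical helper: write_cdp_credentials ACCESS_KEY_ID PRIVATE_KEY [PROFILE]
# Assumes the CDP CLI shared-credentials layout in $HOME/.cdp/credentials.
write_cdp_credentials() {
  if [ "$#" -lt 2 ]; then
    echo "usage: write_cdp_credentials ACCESS_KEY_ID PRIVATE_KEY [PROFILE]" >&2
    return 2
  fi
  cred_file="$HOME/.cdp/credentials"
  if [ -e "$cred_file" ]; then
    echo "$cred_file already exists; edit it by hand instead" >&2
    return 1
  fi
  mkdir -p "$HOME/.cdp" || return 1
  # Owner-only permissions, since the file holds a private key.
  (
    umask 077
    printf '[%s]\ncdp_access_key_id = %s\ncdp_private_key = %s\n' \
      "${3:-default}" "$1" "$2" > "$cred_file"
  )
}
```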
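When credentials come from environment variables, as with the Service Principal variables above or AWS access keys, a missing or empty variable can surface later as a less obvious provider authentication error. A small pre-flight check makes the gap explicit. `require_env` is a hypothetical helper; the variable names it is called with come from the providers' documentation.

```shell
# Hypothetical helper: fail, listing the culprits, if any named environment
# variable is unset or empty.
require_env() {
  missing=""
  for name in "$@"; do
    # Only accept valid shell identifiers, since the name is passed to eval.
    case "$name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "invalid variable name: $name" >&2
        return 2
        ;;
    esac
    # Indirect lookup that works in POSIX sh.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing environment variables:$missing" >&2
    return 1
  fi
}
```

For the Service Principal setup, run `require_env ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_TENANT_ID ARM_SUBSCRIPTION_ID` before `terraform apply`. If you provide AWS access keys through the environment, `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY` is the equivalent check.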
As outlined in the Getting Started docs for the Google Terraform Provider, there are two recommended ways to authenticate with the GCP API:
- The Google Cloud SDK (`gcloud`) can be installed and user Application Default Credentials (ADCs) can be created by running the command `gcloud auth application-default login`.
- A Google Cloud Service Account key file can be generated and downloaded. The `GOOGLE_APPLICATION_CREDENTIALS` environment variable can then be set to the location of the file:

  ```shell
  export GOOGLE_APPLICATION_CREDENTIALS="<location_of_gcp_sa_json_file>"
  ```
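With the service account key file option, it can help to check that `GOOGLE_APPLICATION_CREDENTIALS` points at a readable service account key before running Terraform. This hypothetical helper does a loose text check for the `"type": "service_account"` field that GCP writes into these key files, rather than parsing the JSON. It does not apply to the `gcloud auth application-default login` option, which does not use this variable.

```shell
# Hypothetical helper: sanity-check GOOGLE_APPLICATION_CREDENTIALS.
check_gcp_key_file() {
  key_file="${GOOGLE_APPLICATION_CREDENTIALS:-}"
  if [ -z "$key_file" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$key_file" ]; then
    echo "cannot read key file: $key_file" >&2
    return 1
  fi
  # Loose text match to avoid depending on a JSON parser such as jq.
  if ! grep -q '"type": *"service_account"' "$key_file"; then
    echo "$key_file does not look like a service account key file" >&2
    return 1
  fi
}
```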
The Google project ID can be specified via the `project` argument in the Google provider configuration or via the `GOOGLE_PROJECT` environment variable. This is described in the Google Provider Default Values Configuration documentation.
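If you authenticate with a service account key file and do not set the project explicitly, the project ID can be taken from the `project_id` field that GCP includes in downloaded key files. This is a minimal sketch, assuming the pretty-printed JSON layout of those files (it uses a text match, not a JSON parser). `set_google_project_from_key` is a hypothetical helper that leaves an existing `GOOGLE_PROJECT` untouched.

```shell
# Hypothetical helper: export GOOGLE_PROJECT from a key file's "project_id"
# field (KEY_FILE argument, else GOOGLE_APPLICATION_CREDENTIALS), unless
# GOOGLE_PROJECT is already set.
set_google_project_from_key() {
  if [ -n "${GOOGLE_PROJECT:-}" ]; then
    return 0
  fi
  key_file="${1:-${GOOGLE_APPLICATION_CREDENTIALS:-}}"
  if [ ! -r "$key_file" ]; then
    echo "cannot read key file: $key_file" >&2
    return 1
  fi
  # Take the first "project_id": "<value>" match in the file.
  project=$(sed -n 's/.*"project_id": *"\([^"]*\)".*/\1/p' "$key_file" | head -n 1)
  if [ -z "$project" ]; then
    echo "no project_id found in $key_file" >&2
    return 1
  fi
  GOOGLE_PROJECT="$project"
  export GOOGLE_PROJECT
}
```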
See the DEVELOPMENT.md file for instructions on how to set up an environment for local development of modules.