
Add GPU nodepools to CarbonPlan's Azure cluster #931

Merged: sgibson91 merged 6 commits into 2i2c-org:master from carbonplan-gpu-nodes on Jan 17, 2022

Conversation

@sgibson91 (Member) commented on Jan 17, 2022

This PR is the first step towards #930 and provides GPU machines for notebook and dask workers. It also adds support for applying labels and taints to specific nodepools, in a similar manner to what is implemented in our GCP config.

NOTE: Due to #890, I had to run a bespoke terraform plan command that only targeted the cluster and nodepools.

Full terraform plan command (with some escaping to make [] and " work with my shell):

terraform plan \
-var-file projects/carbonplan.tfvars \
-out carbonplan \
-target azurerm_kubernetes_cluster.jupyterhub \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"small\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"medium\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"large\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"huge\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"vhuge\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"vvhuge\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"gpu\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"small\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"medium\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"large\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"huge\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"vhuge\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"vvhuge\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"gpu\"\]

Full output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # azurerm_kubernetes_cluster_node_pool.dask_pool["gpu"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskgpu"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_NC16as_T4_v3"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
          + "sku=gpu:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_NC16as_T4_v3"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

  # azurerm_kubernetes_cluster_node_pool.user_pool["gpu"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "nbgpu"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-purpose" = "user"
          + "hub.jupyter.org/node-size"    = "Standard_NC16as_T4_v3"
          + "k8s.dask.org/node-purpose"    = "scheduler"
        }
      + node_taints           = [
          + "hub.jupyter.org_dedicated=user:NoSchedule",
          + "sku=gpu:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_NC16as_T4_v3"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

Plan: 2 to add, 0 to change, 0 to destroy.
╷
│ Warning: Resource targeting is in effect
│ 
│ You are creating a plan with the -target option, which means that the result of this plan may not represent all of the changes requested by the current
│ configuration.
│ 
│ The -target option is not for routine use, and is provided only for exceptional situations such as recovering from errors or mistakes, or when Terraform
│ specifically suggests to use it as part of an error message.

@sgibson91 self-assigned this on Jan 17, 2022
@sgibson91 requested a review from a team on January 17, 2022 at 11:57
@sgibson91 marked this pull request as ready for review on January 17, 2022 at 11:57
@yuvipanda (Member)

Terraform LGTM. Will need an entry in the profile list targeting this node pool, and might also need some extra work to make sure the drivers are available inside the image - maybe the pangeo ML image already has those? I use the nvidia-smi command (https://developer.nvidia.com/nvidia-system-management-interface) inside the container to test if the GPUs are properly exposed.

@sgibson91 (Member, Author)

Will need an entry in the profile list targeting this node pool

Yeah, I was going to do this in a separate PR as I think it's too complicated to do terraform things and JupyterHub things in one go. Updating the profile list is being tracked in #930

need some extra work to make sure the drivers are available inside the image - maybe the pangeo ML image already has those?

This hub is using their own image, carbonplan/cmip6-downscaling-single-user, so I believe we can leave it up to them to make sure their image is ready for GPU use.

@yuvipanda (Member)

Yeah, I was going to do this in a separate PR as I think it's too complicated to do terraform things and JupyterHub things in one go. Updating the profile list is being tracked in #930

Makes sense!

This hub is using their own image, carbonplan/cmip6-downscaling-single-user, so I believe we can leave it up to them to make sure their image is ready for GPU use.

Ah cool!

However, I looked up what would need to be done, and it isn't something that can be handled purely at the user-image level - see https://docs.microsoft.com/en-us/azure/aks/gpu-cluster. Either we need to get Azure to use a different base image for GPU nodes, or set up an additional daemonset to install the driver on the node. This is because the nvidia driver isn't actually open source. This is unfortunately true on almost all kubernetes providers.

@yuvipanda (Member)

From hashicorp/terraform-provider-azurerm#6793 it looks like the custom base image option might not be available to us, and we'd need to deploy the daemonset. Maybe it can be part of the support chart and enabled with a flag?

Doesn't need to be part of this PR tho!
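
For illustration, deploying the NVIDIA device plugin via Terraform's helm provider could look roughly like this. This is a sketch only (not the support-chart approach discussed above); it assumes the upstream nvidia-device-plugin chart, and the toleration matches the sku=gpu:NoSchedule taint used in this PR so the plugin's DaemonSet can schedule onto the GPU nodes:

# Sketch only: installing the NVIDIA device plugin DaemonSet with Terraform's
# helm provider rather than the support chart. Chart and repository names are
# the upstream defaults; the toleration mirrors the sku=gpu:NoSchedule taint
# placed on the GPU nodepools in this PR.
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin"
  chart      = "nvidia-device-plugin"
  namespace  = "kube-system"

  values = [
    yamlencode({
      tolerations = [{
        key      = "sku"
        operator = "Equal"
        value    = "gpu"
        effect   = "NoSchedule"
      }]
    })
  ]
}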

@sgibson91 (Member, Author) commented on Jan 17, 2022

@yuvipanda This was the same conclusion I was coming to as well!

I'm just going to push a few more small changes to this PR that will allow us to apply labels, and also taints, to specific nodepools, like we can with the GKE terraform config. That way I can deploy this with the recommended sku=gpu:NoSchedule taint from the Azure docs you linked.
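
For illustration, per-nodepool labels and taints could be expressed in the tfvars along these lines. This is a rough sketch; the map structure and key names are illustrative (mirroring the labels/taints shown in the plan output above), not necessarily the exact schema this PR adds:

# Illustrative tfvars excerpt - key names are assumptions for the sketch.
user_nodes = {
  gpu = {
    min     = 0
    max     = 20
    vm_size = "Standard_NC16as_T4_v3"
    labels = {
      "hub.jupyter.org/node-size" = "Standard_NC16as_T4_v3"
    }
    taints = [
      "sku=gpu:NoSchedule",
    ]
  }
}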

@sgibson91 (Member, Author)

@yuvipanda Pushed those updates and added new tf plan output to top comment. LMK what you think! \o/

@yuvipanda (Member) left a review comment

LGTM!

@sgibson91 (Member, Author)

Ran into this error on apply; the main cluster may need upgrading:

│ Error:
│ The Kubernetes/Orchestrator Version "1.20.7" is not available for Node Pool "nbgpu".

│ Please confirm that this version is supported by the Kubernetes Cluster "hub-cluster"
│ (Resource Group "2i2c-carbonplan-cluster") - which may need to be upgraded first.

│ The Kubernetes Cluster is running version "1.20.7".

│ The supported Orchestrator Versions for this Node Pool/supported by this Kubernetes Cluster are:
│ * 1.19.11
│ * 1.19.13

│ Node Pools cannot use a version of Kubernetes that is not supported on the Control Plane. More
│ details can be found at https://aka.ms/version-skew-policy.


│ with azurerm_kubernetes_cluster_node_pool.user_pool["gpu"],
│ on main.tf line 117, in resource "azurerm_kubernetes_cluster_node_pool" "user_pool":
│ 117: resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {

@sgibson91 (Member, Author)

Commit 47cf790 fixed the above error by allowing a specific Kubernetes version to be pinned for a nodepool.
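
For context, the azurerm node pool resource exposes an orchestrator_version argument, so the change amounts to something along these lines. This is only a sketch: the per-nodepool "kubernetes_version" key is an assumption for illustration, and the actual change in commit 47cf790 may differ:

# Sketch of the shape of the fix.
resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
  # ... other arguments as before ...

  # Pin the nodepool to an explicitly supported version when one is given,
  # otherwise fall back to the control plane's version.
  orchestrator_version = lookup(
    each.value,
    "kubernetes_version",
    azurerm_kubernetes_cluster.jupyterhub.kubernetes_version,
  )
}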

@sgibson91 (Member, Author)

This has been successfully applied to the cluster! 🎉 🚀

@sgibson91 merged commit c7bbde9 into 2i2c-org:master on Jan 17, 2022
@sgibson91 deleted the carbonplan-gpu-nodes branch on January 17, 2022 at 14:13
@choldgraf (Member)

Thanks @sgibson91 and @yuvipanda for documenting all of these steps as well :-)
