
Add GPU nodepools to CarbonPlan's Azure cluster #931

Merged: sgibson91 merged 6 commits into 2i2c-org:master from carbonplan-gpu-nodes on Jan 17, 2022

Conversation

@sgibson91 (Member) commented on Jan 17, 2022

This PR is the first step towards #930 and provides GPU machines for notebook and dask workers. It also adds support for applying labels and taints to specific nodepools, in a similar manner to what is implemented in our GCP config.

NOTE: Due to #890, I had to run a bespoke terraform plan command that only targeted the cluster and nodepools.

Full terraform plan command (with some escaping to make [] and " work with my shell):

terraform plan \
-var-file projects/carbonplan.tfvars \
-out carbonplan \
-target azurerm_kubernetes_cluster.jupyterhub \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"small\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"medium\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"large\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"huge\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"vhuge\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"vvhuge\"\] \
-target azurerm_kubernetes_cluster_node_pool.user_pool\[\"gpu\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"small\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"medium\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"large\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"huge\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"vhuge\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"vvhuge\"\] \
-target azurerm_kubernetes_cluster_node_pool.dask_pool\[\"gpu\"\]

Full output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # azurerm_kubernetes_cluster_node_pool.dask_pool["gpu"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskgpu"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_NC16as_T4_v3"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
          + "sku=gpu:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_NC16as_T4_v3"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

  # azurerm_kubernetes_cluster_node_pool.user_pool["gpu"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "nbgpu"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-purpose" = "user"
          + "hub.jupyter.org/node-size"    = "Standard_NC16as_T4_v3"
          + "k8s.dask.org/node-purpose"    = "scheduler"
        }
      + node_taints           = [
          + "hub.jupyter.org_dedicated=user:NoSchedule",
          + "sku=gpu:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_NC16as_T4_v3"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

Plan: 2 to add, 0 to change, 0 to destroy.
╷
│ Warning: Resource targeting is in effect
│ 
│ You are creating a plan with the -target option, which means that the result of this plan may not represent all of the changes requested by the current
│ configuration.
│ 
│ The -target option is not for routine use, and is provided only for exceptional situations such as recovering from errors or mistakes, or when Terraform
│ specifically suggests to use it as part of an error message.

@sgibson91 self-assigned this on Jan 17, 2022
@sgibson91 requested a review from a team on January 17, 2022 at 11:57
@sgibson91 marked this pull request as ready for review on January 17, 2022 at 11:57
@yuvipanda (Member)

Terraform LGTM. Will need an entry in the profile list targeting this node pool, and might also need some extra work to make sure the drivers are available inside the image - maybe the pangeo ML image already has those? I use the nvidia-smi command (https://developer.nvidia.com/nvidia-system-management-interface) inside the container to test if the GPUs are properly exposed.

@sgibson91 (Member, Author)

Will need an entry in the profile list targeting this node pool

Yeah, I was going to do this in a separate PR as I think it's too complicated to do terraform things and JupyterHub things in one go. Updating the profile list is being tracked in #930

need some extra work to make sure the drivers are available inside the image - maybe the pangeo ML image already has those?

This hub is using their own image, carbonplan/cmip6-downscaling-single-user, so I believe we can leave it up to them to make sure their image is ready for GPU use.

@yuvipanda (Member)

Yeah, I was going to do this in a separate PR as I think it's too complicated to do terraform things and JupyterHub things in one go. Updating the profile list is being tracked in #930

Makes sense!

This hub is using their own image, carbonplan/cmip6-downscaling-single-user, so I believe we can leave it up to them to make sure their image is ready for GPU use.

Ah cool!

However, I looked up what would need to be done, and it isn't something that can be handled purely at the user-image level - see https://docs.microsoft.com/en-us/azure/aks/gpu-cluster. Either we need to get Azure to use a different base image for GPU nodes, or set up an additional daemonset to install the driver on the node. This is because the nvidia driver isn't actually open source. This is unfortunately true on almost all kubernetes providers.

@yuvipanda (Member)

From hashicorp/terraform-provider-azurerm#6793 it looks like the custom base image option might not be available to us, and we'd need to deploy the daemonset. Maybe it can be part of the support chart and enabled with a flag?

Doesn't need to be part of this PR tho!
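
For illustration, deploying the NVIDIA device plugin via Terraform's helm provider could look roughly like this. This is a sketch only (not the support-chart approach discussed above); it assumes the upstream nvidia-device-plugin chart, and the toleration matches the sku=gpu:NoSchedule taint used in this PR so the plugin's DaemonSet can schedule onto the GPU nodes:

# Sketch only: installing the NVIDIA device plugin DaemonSet with Terraform's
# helm provider rather than the support chart. Chart and repository names are
# the upstream defaults; the toleration mirrors the sku=gpu:NoSchedule taint
# placed on the GPU nodepools in this PR.
resource "helm_release" "nvidia_device_plugin" {
  name       = "nvidia-device-plugin"
  repository = "https://nvidia.github.io/k8s-device-plugin"
  chart      = "nvidia-device-plugin"
  namespace  = "kube-system"

  values = [
    yamlencode({
      tolerations = [{
        key      = "sku"
        operator = "Equal"
        value    = "gpu"
        effect   = "NoSchedule"
      }]
    })
  ]
}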

@sgibson91 (Member, Author) commented on Jan 17, 2022

@yuvipanda This was the same conclusion I was coming to as well!

I'm just going to push a few more small changes to this PR that will allow us to apply labels, and also taints, to specific nodepools, like we can with the GKE terraform config. That way I can deploy this with the recommended sku=gpu:NoSchedule taint from the Azure docs you linked.
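
For illustration, per-nodepool labels and taints could be expressed in the tfvars along these lines. This is a rough sketch; the map structure and key names are illustrative (mirroring the labels/taints shown in the plan output above), not necessarily the exact schema this PR adds:

# Illustrative tfvars excerpt - key names are assumptions for the sketch.
user_nodes = {
  gpu = {
    min     = 0
    max     = 20
    vm_size = "Standard_NC16as_T4_v3"
    labels = {
      "hub.jupyter.org/node-size" = "Standard_NC16as_T4_v3"
    }
    taints = [
      "sku=gpu:NoSchedule",
    ]
  }
}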

@sgibson91 (Member, Author)

@yuvipanda Pushed those updates and added new tf plan output to top comment. LMK what you think! \o/

@yuvipanda (Member) left a review comment

LGTM!

@sgibson91 (Member, Author)

Ran into this error on apply; the main cluster may need upgrading:

│ Error:
│ The Kubernetes/Orchestrator Version "1.20.7" is not available for Node Pool "nbgpu".

│ Please confirm that this version is supported by the Kubernetes Cluster "hub-cluster"
│ (Resource Group "2i2c-carbonplan-cluster") - which may need to be upgraded first.

│ The Kubernetes Cluster is running version "1.20.7".

│ The supported Orchestrator Versions for this Node Pool/supported by this Kubernetes Cluster are:
│ * 1.19.11
│ * 1.19.13

│ Node Pools cannot use a version of Kubernetes that is not supported on the Control Plane. More
│ details can be found at https://aka.ms/version-skew-policy.


│ with azurerm_kubernetes_cluster_node_pool.user_pool["gpu"],
│ on main.tf line 117, in resource "azurerm_kubernetes_cluster_node_pool" "user_pool":
│ 117: resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {

@sgibson91 (Member, Author)

Commit 47cf790 fixed the above error by allowing a specific Kubernetes version to be pinned for a nodepool.
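
For context, the azurerm node pool resource exposes an orchestrator_version argument, so the change amounts to something along these lines. This is only a sketch: the per-nodepool "kubernetes_version" key is an assumption for illustration, and the actual change in commit 47cf790 may differ:

# Sketch of the shape of the fix.
resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
  # ... other arguments as before ...

  # Pin the nodepool to an explicitly supported version when one is given,
  # otherwise fall back to the control plane's version.
  orchestrator_version = lookup(
    each.value,
    "kubernetes_version",
    azurerm_kubernetes_cluster.jupyterhub.kubernetes_version,
  )
}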

@sgibson91 (Member, Author)

This has been successfully applied to the cluster! 🎉 🚀

@sgibson91 merged commit c7bbde9 into 2i2c-org:master on Jan 17, 2022
@sgibson91 deleted the carbonplan-gpu-nodes branch on January 17, 2022 at 14:13
@choldgraf (Member)

Thanks @sgibson91 and @yuvipanda for documenting all of these steps as well :-)
