Handling Eviction Requests = # Preface The [Kubernetes](https://kubernetes.io/) API supports [API-initiated Eviction](https://kubernetes.io/docs/concepts/scheduling-eviction/api-eviction/) which allows to programmatically evict pods. The API is used for example by: - [kubectl drain](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) - [Descheduler](https://github.com/kubernetes-sigs/descheduler) Since KubeVirt virtual machines are running inside `virt-launcher` pods - they are affected by Kubernetes' eviction mechanism. This requires special handling on KubeVirt's side, since virtual machines eviction is a bit more complex than the average pod. `Evacuation` is the term used by KubeVirt to describe the migration of a VMI as the result of `virt-launcher` pod eviction. This document will describe how KubeVirt currently handles eviction requests. # Eviction Strategies A VirtualMachineInstance can have one of four Eviction Strategies. The eviction strategy is defined in the VMI spec, with a fallback to a cluster-wide definition in the KubeVirt CustomResource. The eviction strategy affects the way the VirtualMachineInstance will be evacuated: | Eviction Strategy | Meaning | |-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | None | No action will be taken, according to the specified 'RunStrategy' the VirtualMachine will be restarted or shutdown | | LiveMigrate | The VirtualMachine will be migrated instead of being shutdown | | LiveMigrateIfPossible | Same as `LiveMigrate` but only if the VirtualMachine is Live-Migratable, otherwise it will behave as `None` | | External | The VirtualMachine will be protected by a PDB and vmi.Status.EvacuationNodeName will be set on eviction. This is mainly useful for cluster-api-provider-kubevirt (capk) which needs a way for VMI’s to be blocked from eviction, yet signal capk that eviction has been called on the VMI so the capk controller can handle tearing the VMI down | # Pod Eviction Webhook `virt-api` serves a [validating webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) which intercepts **all** eviction requests in the cluster: ```shell kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io virt-api-validator -o yaml ``` The purpose of this webhook is to trigger VMI evacuation in cases where it is required. The way the webhook triggers the evacuation is by setting the VMI's `Status.EvacuationNodeName` field to the node name it is currently running on, so the [evacuation controller](#evacuation-controller) will know it needs to migrate it to another node. The webhook has the ability to: 1. Approve the request - so it could be further processed 2. Deny the request - the request will be declined without additional processing The webhook admits eviction requests **before** `kube-api` checks them against [Pod Distribution Budget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pod-disruption-budgets) objects. In case the pod is not a `virt-launcher` pod - the eviction request is approved. Otherwise, depending on the VMI's eviction strategy and whether it is migratable - the webhook will potentially mark the VMI for evacuation and approve or deny the eviction request: | Eviction Strategy | Is VMI migratable | Is VMI marked for evacuation | Does Webhook approve eviction | Webhook Response | |-----------------------|-------------------|------------------------------|-------------------------------|--------------------------------------------------| | None | True/False | False | True | 200 - Eviction granted | | LiveMigrate | True | True | False | 429 - Eviction denied (evacuation was triggered) | | LiveMigrate | False | False | False | 429 - Eviction denied | | LiveMigrateIfPossible | True | True | False | 429 - Eviction denied (evacuation was triggered) | | LiveMigrateIfPossible | False | False | True | 200 - Eviction granted | | External | True/False | True | False | 429 - Eviction denied (evacuation was triggered) | The webhook will approve additional eviction requests on a virt-launcher pod owned by a VMI which had previously been marked for evacuation: | Eviction Strategy | Is VMI migratable | Does Webhook approve eviction | Webhook Response | |-----------------------|-------------------|-------------------------------|------------------------| | LiveMigrate | True | True | 200 - Eviction granted | | LiveMigrateIfPossible | True | True | 200 - Eviction granted | | External | True/False | True | 200 - Eviction granted | In these cases, a PDB will protect the virt-launcher pod (see explanation bellow). > **Note** > Since the webhook intercepts all eviction requests in the cluster, it is configured to be [ignored](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy) in case kube-api fails to get a response from it. > Ignored in this context means that the eviction request is considered to be approved by the webhook, and will be further checked against the PodDistributionBudget. > Some virt-launcher pods should be protected from eviction even if the webhook fails, this is the reason PodDistributionBudget objects are required (described in the next section). > In case the webhook is down, the virt-launcher pod will be protected from eviction by the PDB (if required), but the evacuation will **not** be triggered. # Pod Distribution Budget In case the `Pod Eviction Webhook` approved the eviction, kube-api checks whether a PDB protects the `virt-launcher` pod. If there is a PDB protecting the `virt-launcher` pod - the eviction request is denied, otherwise it is approved and the pod is evicted. In order for the evacuation of VMIs to happen in a controlled manner, KubeVirt protects part of the `virt-launcher` pods with a PDB which blocks eviction requests. `virt-controller` has a `Disruption Budget Controller` which decides whether a `virt-launcher` pod should be protected based on the eviction strategy of its controlling VMI: | Eviction Strategy | Is a PDB required | |-----------------------|-------------------------------| | None | False | | LiveMigrate | True | | LiveMigrateIfPossible | Only if the VMI is migratable | | External | True | > **Note** > During a migration, the PDB that protects the source virt-launcher pod is expended by the migration controller to also protect the target pod. # Eviction Approval Summary The eviction request's initiator will observe one of the following responses: | Eviction Strategy | Is VMI migratable | Is VMI marked for evacuation | Does Webhook approve eviction | Does PDB allow eviction | Final Response | |-----------------------|-------------------|------------------------------|-------------------------------|-------------------------|--------------------------------------------------| | None | True/False | False | True | True | 200 - Eviction granted | | LiveMigrate | True | True | False | False | 429 - Eviction denied (evacuation was triggered) | | LiveMigrate | False | False | False | False | 429 - Eviction denied by webhook | | LiveMigrateIfPossible | True | True | False | False | 429 - Eviction denied (evacuation was triggered) | | LiveMigrateIfPossible | False | False | True | True | 200 - Eviction granted | | External | True/False | True | False | False | 429 - Eviction denied (evacuation was triggered) | For additional requests on virt-launcher pods owned by a VMI which had previously been marked for evacuation: | Eviction Strategy | Is VMI migratable | Does Webhook approve eviction | Does PDB allow eviction | Final Response | |-----------------------|-------------------|-------------------------------|-------------------------|-------------------------------| | LiveMigrate | True | True | False | 429 - Eviction blocked by PDB | | LiveMigrateIfPossible | True | True | False | 429 - Eviction blocked by PDB | | External | True/False | True | False | 429 - Eviction blocked by PDB | To summarize: 1. The eviction request is granted only if both the webhook and the PDB allow them. 2. When the eviction request's initiator gets a 429 response, they can check the (first) response message whether the VMI will be evacuated. ## Example kubectl drain Output The following output depicts the eviction of a `virt-launcher` pod owned by a migratable VMI (with the `LiveMigrate` eviction strategy): ```shell $ kubectl drain node01 --ignore-daemonsets --delete-emptydir-data ... evicting pod default/virt-launcher-vm-cirros-wn5v4 error when evicting pods/"virt-launcher-vm-cirros-wn5v4" -n "default" (will retry after 5s): admission webhook "virt-launcher-eviction-interceptor.kubevirt.io" denied the request: Eviction triggered evacuation of VMI "default/vm-cirros" ... evicting pod default/virt-launcher-vm-cirros-wn5v4 error when evicting pods/"virt-launcher-vm-cirros-wn5v4" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget. ... evicting pod default/virt-launcher-vm-cirros-wn5v4 pod/virt-launcher-vm-cirros-wn5v4 evicted node/node01 drained ``` # Evacuation Controller `virt-controller` has an evacuation controller which looks for potential VMIs to evict and tries to migrate them to another node.