
Karpenter doesn't completely drain nodes that receive a spot interruption warning when running pods with do-not-evict annotation #3383

Closed · mrparkers opened this issue Feb 10, 2023 · 8 comments · Fixed by kubernetes-sigs/karpenter#220
Labels: bug (Something isn't working), burning (Time sensitive issues)

@mrparkers

Version

Karpenter Version: v0.23.0

Kubernetes Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.14-eks-ffeb93d", GitCommit:"96e7d52c98a32f2b296ca7f19dc9346cf79915ba", GitTreeState:"clean", BuildDate:"2022-11-29T18:43:31Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

Expected Behavior

If a node managed by Karpenter receives a spot interruption warning from AWS, it should completely drain the node of all non-daemonset pods, even if they have the karpenter.sh/do-not-evict annotation.

The behavior that I expect is also described in the documentation (emphasis mine):

By opting pods out of eviction, you are telling Karpenter that it should not voluntarily remove nodes containing this pod.
Voluntary node removal does not include Interruption, which is considered an involuntary event, since node removal cannot be delayed.

Actual Behavior

I've observed that Karpenter refuses to completely drain a node that receives a spot interruption warning (more details in logs).

Steps to Reproduce the Problem

  1. Schedule a workload with the karpenter.sh/do-not-evict annotation on a Karpenter-managed spot instance (a sketch follows this list).
  2. Wait for AWS to issue a spot interruption warning.
  3. Observe that Karpenter won't completely drain the node.
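
For reference, here is a minimal sketch of a workload like the one in step 1 (the pod name, namespace, and image are placeholders, not taken from my actual workloads); the karpenter.sh/do-not-evict annotation is set on the pod metadata, and the karpenter.sh/capacity-type: spot selector steers it onto a Karpenter spot node:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: do-not-evict-example          # placeholder name
  namespace: default                  # placeholder namespace
  annotations:
    karpenter.sh/do-not-evict: "true" # opts the pod out of voluntary eviction
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot  # schedule onto a Karpenter-provisioned spot node
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest  # placeholder image
      command: ["sleep", "86400"]
EOF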

Resource Specs and Logs

Workload names have been changed.

Node events:

$ kubectl describe node ip-10-0-157-214.us-west-2.compute.internal
Events:
  Type     Reason                         Age                    From                             Message
  ----     ------                         ----                   ----                             -------
  Normal   RegisteredNode                 50m                    node-controller                  Node ip-10-0-157-214.us-west-2.compute.internal event: Registered Node ip-10-0-157-214.us-west-2.compute.internal in Controller
  Warning  InstanceSpotInterrupted        41m                    karpenter                        Node ip-10-0-157-214.us-west-2.compute.internal event: A spot interruption warning was triggered for the node
  Normal   NodeNotSchedulable             41m                    kubelet                          Node ip-10-0-157-214.us-west-2.compute.internal status is now: NodeNotSchedulable
  Warning  NodeTerminatingOnInterruption  39m (x2 over 41m)      karpenter                        Node ip-10-0-157-214.us-west-2.compute.internal event: Interruption triggered termination for the node
  Warning  InstanceTerminating            39m                    karpenter                        Node ip-10-0-157-214.us-west-2.compute.internal event: Instance is terminating
  Normal   NodeNotReady                   38m                    node-controller                  Node ip-10-0-157-214.us-west-2.compute.internal status is now: NodeNotReady
  Warning  FailedDraining                 37m (x3 over 41m)      karpenter                        Failed to drain node, pod ns-a/hello-world-75zsd-262l7 has do-not-evict annotation
  Warning  FailedDraining                 35m                    karpenter                        Failed to drain node, pod ns-b/foobar-77b745b647-5lk6b has do-not-evict annotation
  Warning  FailedInflightCheck            34m                    karpenter                        Can't drain node, pod ns-c/baz-5869d4ccfc-dh8n7 has do not evict annotation
  Warning  FailedDraining                 33m                    karpenter                        Failed to drain node, pod ns-c/baz-5869d4ccfc-dh8n7 has do-not-evict annotation
  Normal   DeletingNode                   3m26s (x413 over 38m)  cloud-node-lifecycle-controller  Deleting node ip-10-0-157-214.us-west-2.compute.internal because it does not exist in the cloud provider
  Warning  FailedDraining                 96s (x16 over 31m)     karpenter                        Failed to drain node, 1 pods are waiting to be evicted

karpenter logs:

$ stern -n karpenter karpenter | grep ip-10-0-157-214.us-west-2.compute.internal
karpenter-657d546885-6knhb controller {"level":"DEBUG","time":"2023-02-10T18:45:59.819Z","logger":"controller.interruption","message":"removing offering from offerings","commit":"5a7faa0-dirty","queue":"karpenter-interrupts","messageKind":"SpotInterruptionKind","node":"ip-10-0-157-214.us-west-2.compute.internal","action":"CordonAndDrain","unavailable-reason":"SpotInterruptionKind","instance-type":"m5zn.6xlarge","zone":"us-west-2c","capacity-type":"spot","unavailable-offerings-ttl":"3m0s"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:45:59.844Z","logger":"controller.interruption","message":"deleted node from interruption message","commit":"5a7faa0-dirty","queue":"karpenter-interrupts","messageKind":"SpotInterruptionKind","node":"ip-10-0-157-214.us-west-2.compute.internal","action":"CordonAndDrain"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:45:59.880Z","logger":"controller.termination","message":"cordoned node","commit":"5a7faa0-dirty","node":"ip-10-0-157-214.us-west-2.compute.internal"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:48:02.217Z","logger":"controller.interruption","message":"deleted node from interruption message","commit":"5a7faa0-dirty","queue":"karpenter-interrupts","messageKind":"StateChangeKind","node":"ip-10-0-157-214.us-west-2.compute.internal","action":"CordonAndDrain"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:49:10.169Z","logger":"controller.interruption","message":"deleted node from interruption message","commit":"5a7faa0-dirty","queue":"karpenter-interrupts","messageKind":"StateChangeKind","node":"ip-10-0-157-214.us-west-2.compute.internal","action":"CordonAndDrain"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:52:44.784Z","logger":"controller.inflightchecks","message":"Inflight check failed for node, Can't drain node, pod ns-c/baz-5869d4ccfc-dh8n7 has do not evict annotation","commit":"5a7faa0-dirty","node":"ip-10-0-157-214.us-west-2.compute.internal"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T20:14:54.449Z","logger":"controller.termination","message":"deleted node","commit":"5a7faa0-dirty","node":"ip-10-0-157-214.us-west-2.compute.internal"}

I don't believe that provisioner specs or pod specs are required to debug this, but I can provide them if necessary.
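
If it helps narrow things down, here is a rough way (assuming jq is available; the node name is the one from the events above) to list the pods on the interrupted node that still carry the annotation and are therefore blocking the drain:

$ kubectl get pods --all-namespaces \
    --field-selector spec.nodeName=ip-10-0-157-214.us-west-2.compute.internal \
    -o json \
  | jq -r '.items[]
      | select(.metadata.annotations["karpenter.sh/do-not-evict"] == "true")
      | "\(.metadata.namespace)/\(.metadata.name)"'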

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
mrparkers added the bug (Something isn't working) label on Feb 10, 2023
@bwagner5
Contributor

Thanks for raising this issue! We'll look into a reproduction and fix soon!

@runningman84

This is similar to my bug report:
#3214

bwagner5 added the burning (Time sensitive issues) label on Feb 13, 2023
@jonathan-innis
Contributor

@runningman84 @mrparkers I think our thought here is that if there is an interruption or some action that is going to cause node deletion that can't be stopped, we should ignore the do-not-evict annotation and just proceed to cordon/drain/delete the node. This would include the manual deletion flow for the node since we can't differentiate between an automated deletion and a manual deletion of the node. WDYT?

@ellistarn
Contributor

ellistarn commented Feb 14, 2023

@jonathan-innis, does this mean that do-not-evict is only respected in the deprovisioning flow, and not the deletion flow? Does this change the semantics of kubectl delete node to be more of a force delete? This is a breaking change, but I can see it providing more flexibility. Current users will need to start doing kubectl label node karpenter.sh/voluntary-disruption: my-reason

@jonathan-innis
Contributor

It would be breaking, but that may be the intended semantics to begin with. I'm not sure there's much rationale to run kubectl delete node if you are intending to be saved by the do-not-evict annotation. Maybe automation might be relying on this, but the case still seems a bit weak to me.

@mrparkers
Author

I think our thought here is that if there is an interruption or some action that is going to cause node deletion that can't be stopped, we should ignore the do-not-evict annotation and just proceed to cordon/drain/delete the node.

Yes, I definitely agree with this, and this is exactly the behavior that I was expecting with regards to node deletions that can't be stopped.

This would include the manual deletion flow for the node since we can't differentiate between an automated deletion and a manual deletion of the node.

I think this is fine - as an administrator, if I'm running kubectl delete node, I think I would expect that node to be deleted regardless of what happens to be running on it.

I'm not sure there's much rationale to run kubectl delete node if you are intending to be saved by the do-not-evict annotation.

I agree with this too.

@runningman84

@runningman84 @mrparkers I think our thought here is that if there is an interruption or some action that is going to cause node deletion that can't be stopped, we should ignore the do-not-evict annotation and just proceed to cordon/drain/delete the node. This would include the manual deletion flow for the node since we can't differentiate between an automated deletion and a manual deletion of the node. WDYT?

Yes, that's my point.

@ellistarn
Contributor

[image attachment]
