
Karpenter doesn't completely drain nodes that receive a spot interruption warning when running pods with do-not-evict annotation #3383

Closed · mrparkers opened this issue Feb 10, 2023 · 8 comments · Fixed by kubernetes-sigs/karpenter#220
Labels: bug (Something isn't working), burning (Time sensitive issues)

@mrparkers

Version

Karpenter Version: v0.23.0

Kubernetes Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.14-eks-ffeb93d", GitCommit:"96e7d52c98a32f2b296ca7f19dc9346cf79915ba", GitTreeState:"clean", BuildDate:"2022-11-29T18:43:31Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

Expected Behavior

If a node managed by Karpenter receives a spot interruption warning from AWS, it should completely drain the node of all non-daemonset pods, even if they have the karpenter.sh/do-not-evict annotation.

The behavior that I expect is also described in the documentation (emphasis mine):

By opting pods out of eviction, you are telling Karpenter that it should not voluntarily remove nodes containing this pod.
Voluntary node removal does not include Interruption, which is considered an involuntary event, since node removal cannot be delayed.

Actual Behavior

I've observed that Karpenter refuses to completely drain a node that receives a spot interruption warning (more details in logs).

Steps to Reproduce the Problem

  1. Schedule a workload with the karpenter.sh/do-not-evict annotation on a Karpenter-managed spot instance (a sketch follows this list).
  2. Wait for AWS to issue a spot interruption warning.
  3. Observe that Karpenter won't completely drain the node.
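
For reference, here is a minimal sketch of a workload like the one in step 1 (the pod name, namespace, and image are placeholders, not taken from my actual workloads); the karpenter.sh/do-not-evict annotation is set on the pod metadata, and the karpenter.sh/capacity-type: spot selector steers it onto a Karpenter spot node:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: do-not-evict-example          # placeholder name
  namespace: default                  # placeholder namespace
  annotations:
    karpenter.sh/do-not-evict: "true" # opts the pod out of voluntary eviction
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot  # schedule onto a Karpenter-provisioned spot node
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest  # placeholder image
      command: ["sleep", "86400"]
EOF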

Resource Specs and Logs

Workload names have been changed.

Node events:

$ kubectl describe node ip-10-0-157-214.us-west-2.compute.internal
Events:
  Type     Reason                         Age                    From                             Message
  ----     ------                         ----                   ----                             -------
  Normal   RegisteredNode                 50m                    node-controller                  Node ip-10-0-157-214.us-west-2.compute.internal event: Registered Node ip-10-0-157-214.us-west-2.compute.internal in Controller
  Warning  InstanceSpotInterrupted        41m                    karpenter                        Node ip-10-0-157-214.us-west-2.compute.internal event: A spot interruption warning was triggered for the node
  Normal   NodeNotSchedulable             41m                    kubelet                          Node ip-10-0-157-214.us-west-2.compute.internal status is now: NodeNotSchedulable
  Warning  NodeTerminatingOnInterruption  39m (x2 over 41m)      karpenter                        Node ip-10-0-157-214.us-west-2.compute.internal event: Interruption triggered termination for the node
  Warning  InstanceTerminating            39m                    karpenter                        Node ip-10-0-157-214.us-west-2.compute.internal event: Instance is terminating
  Normal   NodeNotReady                   38m                    node-controller                  Node ip-10-0-157-214.us-west-2.compute.internal status is now: NodeNotReady
  Warning  FailedDraining                 37m (x3 over 41m)      karpenter                        Failed to drain node, pod ns-a/hello-world-75zsd-262l7 has do-not-evict annotation
  Warning  FailedDraining                 35m                    karpenter                        Failed to drain node, pod ns-b/foobar-77b745b647-5lk6b has do-not-evict annotation
  Warning  FailedInflightCheck            34m                    karpenter                        Can't drain node, pod ns-c/baz-5869d4ccfc-dh8n7 has do not evict annotation
  Warning  FailedDraining                 33m                    karpenter                        Failed to drain node, pod ns-c/baz-5869d4ccfc-dh8n7 has do-not-evict annotation
  Normal   DeletingNode                   3m26s (x413 over 38m)  cloud-node-lifecycle-controller  Deleting node ip-10-0-157-214.us-west-2.compute.internal because it does not exist in the cloud provider
  Warning  FailedDraining                 96s (x16 over 31m)     karpenter                        Failed to drain node, 1 pods are waiting to be evicted

karpenter logs:

$ stern -n karpenter karpenter | grep ip-10-0-157-214.us-west-2.compute.internal
karpenter-657d546885-6knhb controller {"level":"DEBUG","time":"2023-02-10T18:45:59.819Z","logger":"controller.interruption","message":"removing offering from offerings","commit":"5a7faa0-dirty","queue":"karpenter-interrupts","messageKind":"SpotInterruptionKind","node":"ip-10-0-157-214.us-west-2.compute.internal","action":"CordonAndDrain","unavailable-reason":"SpotInterruptionKind","instance-type":"m5zn.6xlarge","zone":"us-west-2c","capacity-type":"spot","unavailable-offerings-ttl":"3m0s"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:45:59.844Z","logger":"controller.interruption","message":"deleted node from interruption message","commit":"5a7faa0-dirty","queue":"karpenter-interrupts","messageKind":"SpotInterruptionKind","node":"ip-10-0-157-214.us-west-2.compute.internal","action":"CordonAndDrain"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:45:59.880Z","logger":"controller.termination","message":"cordoned node","commit":"5a7faa0-dirty","node":"ip-10-0-157-214.us-west-2.compute.internal"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:48:02.217Z","logger":"controller.interruption","message":"deleted node from interruption message","commit":"5a7faa0-dirty","queue":"karpenter-interrupts","messageKind":"StateChangeKind","node":"ip-10-0-157-214.us-west-2.compute.internal","action":"CordonAndDrain"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:49:10.169Z","logger":"controller.interruption","message":"deleted node from interruption message","commit":"5a7faa0-dirty","queue":"karpenter-interrupts","messageKind":"StateChangeKind","node":"ip-10-0-157-214.us-west-2.compute.internal","action":"CordonAndDrain"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T18:52:44.784Z","logger":"controller.inflightchecks","message":"Inflight check failed for node, Can't drain node, pod ns-c/baz-5869d4ccfc-dh8n7 has do not evict annotation","commit":"5a7faa0-dirty","node":"ip-10-0-157-214.us-west-2.compute.internal"}
karpenter-657d546885-6knhb controller {"level":"INFO","time":"2023-02-10T20:14:54.449Z","logger":"controller.termination","message":"deleted node","commit":"5a7faa0-dirty","node":"ip-10-0-157-214.us-west-2.compute.internal"}

I don't believe that provisioner specs or pod specs are required to debug this, but I can provide them if necessary.
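
If it helps narrow things down, here is a rough way (assuming jq is available; the node name is the one from the events above) to list the pods on the interrupted node that still carry the annotation and are therefore blocking the drain:

$ kubectl get pods --all-namespaces \
    --field-selector spec.nodeName=ip-10-0-157-214.us-west-2.compute.internal \
    -o json \
  | jq -r '.items[]
      | select(.metadata.annotations["karpenter.sh/do-not-evict"] == "true")
      | "\(.metadata.namespace)/\(.metadata.name)"'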

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
mrparkers added the bug (Something isn't working) label on Feb 10, 2023
@bwagner5
Contributor

Thanks for raising this issue! We'll look into a reproduction and fix soon!

@runningman84

This is similar to my bug report:
#3214

bwagner5 added the burning (Time sensitive issues) label on Feb 13, 2023
@jonathan-innis
Contributor

@runningman84 @mrparkers I think our thought here is that if there is an interruption or some action that is going to cause node deletion that can't be stopped, we should ignore the do-not-evict annotation and just proceed to cordon/drain/delete the node. This would include the manual deletion flow for the node since we can't differentiate between an automated deletion and a manual deletion of the node. WDYT?

@ellistarn
Contributor

ellistarn commented Feb 14, 2023

@jonathan-innis, does this mean that do-not-evict is only respected in the deprovisioning flow, and not the deletion flow? Does this change the semantics of kubectl delete node to be more of a force delete? This is a breaking change, but I can see it providing more flexibility. Current users will need to start doing kubectl label node karpenter.sh/voluntary-disruption: my-reason

@jonathan-innis
Contributor

It would be breaking, but that may be the intended semantics to begin with. I'm not sure there's much rationale to run kubectl delete node if you are intending to be saved by the do-not-evict annotation. Maybe automation might be relying on this, but the case still seems a bit weak to me.

@mrparkers
Author

I think our thought here is that if there is an interruption or some action that is going to cause node deletion that can't be stopped, we should ignore the do-not-evict annotation and just proceed to cordon/drain/delete the node.

Yes, I definitely agree with this, and this is exactly the behavior that I was expecting with regards to node deletions that can't be stopped.

This would include the manual deletion flow for the node since we can't differentiate between an automated deletion and a manual deletion of the node.

I think this is fine - as an administrator, if I'm running kubectl delete node, I think I would expect that node to be deleted regardless of what happens to be running on it.

I'm not sure there's much rationale to run kubectl delete node if you are intending to be saved by the do-not-evict annotation.

I agree with this too.

@runningman84

@runningman84 @mrparkers I think our thought here is that if there is an interruption or some action that is going to cause node deletion that can't be stopped, we should ignore the do-not-evict annotation and just proceed to cordon/drain/delete the node. This would include the manual deletion flow for the node since we can't differentiate between an automated deletion and a manual deletion of the node. WDYT?

Yes, that's my point.

@ellistarn
Contributor

[image attachment]
