
Drifted node does not get replaced #3558

Closed
runningman84 opened this issue Mar 8, 2023 · 4 comments
Labels
question Issues that are support related questions

Comments

@runningman84

Version

Karpenter Version: v0.24.0

Kubernetes Version: v1.24.10

Expected Behavior

A drifted node should be replaced in a timely manner.

Actual Behavior

One old node (still on Kubernetes 1.23) is running. It should have been removed both for running longer than 24 hours and for being drifted...

Name:               ip-10-8-186-29.eu-central-1.compute.internal                                                                                                                                                                    
Roles:              <none>                                                                                                                                                                                                          
Labels:             beta.kubernetes.io/arch=amd64                                                                                                                                                                                   
                    beta.kubernetes.io/instance-type=t3.medium                                                                                                                                                                      
                    beta.kubernetes.io/os=linux                                                                                                                                                                                     
                    failure-domain.beta.kubernetes.io/region=eu-central-1                                                                                                                                                           
                    failure-domain.beta.kubernetes.io/zone=eu-central-1c                                                                                                                                                            
                    k8s.io/cloud-provider-aws=ea9740b93177deeb0ab81f8466763d25                                                                                                                                                      
                    karpenter.k8s.aws/instance-ami-id=ami-02cc2ac407abdf7b2                                                                                                                                                         
                    karpenter.k8s.aws/instance-category=t                                                                                                                                                                           
                    karpenter.k8s.aws/instance-cpu=2                                                                                                                                                                                
                    karpenter.k8s.aws/instance-encryption-in-transit-supported=false                                                                                                                                                
                    karpenter.k8s.aws/instance-family=t3                                                                                                                                                                            
                    karpenter.k8s.aws/instance-generation=3                                                                                                                                                                         
                    karpenter.k8s.aws/instance-hypervisor=nitro                                                                                                                                                                     
                    karpenter.k8s.aws/instance-memory=4096                                                                                                                                                                          
                    karpenter.k8s.aws/instance-pods=17                                                                                                                                                                              
                    karpenter.k8s.aws/instance-size=medium                                                                                                                                                                          
                    karpenter.sh/capacity-type=spot                                                                                                                                                                                 
                    karpenter.sh/initialized=true                                                                                                                                                                                   
                    karpenter.sh/machine-name=                                                                                                                                                                                      
                    karpenter.sh/provisioner-name=cron                                                                                                                                                                              
                    kubernetes.io/arch=amd64                                                                                                                                                                                        
                    kubernetes.io/hostname=ip-10-8-186-29.eu-central-1.compute.internal                                                                                                                                             
                    kubernetes.io/os=linux                                                                                                                                                                                          
                    node.kubernetes.io/instance-type=t3.medium                                                                                                                                                                      
                    topology.ebs.csi.aws.com/zone=eu-central-1c                                                                                                                                                                     
                    topology.kubernetes.io/region=eu-central-1                                                                                                                                                                      
                    topology.kubernetes.io/zone=eu-central-1c                                                                                                                                                                       
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.8.186.29                                                                                                                                                               
                    csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0a6a8173e49d0a520"}                                                                                                                                      
                    karpenter.sh/voluntary-disruption: drifted                                                                                                                                                                      
                    node.alpha.kubernetes.io/ttl: 0                                                                                                                                                                                 
                    volumes.kubernetes.io/controller-managed-attach-detach: true                                                                                                                                                    
CreationTimestamp:  Tue, 07 Mar 2023 07:00:07 +0100                                                                                                                                                                                 
Taints:             provisioner=cron:NoSchedule                                                                                                                                                                                     
Unschedulable:      false                                                                                                                                                                                                           
Lease:                                                                                                                                                                                                                              
  HolderIdentity:  ip-10-8-186-29.eu-central-1.compute.internal                                                                                                                                                                     
  AcquireTime:     <unset>                                                                                                                                                                                                          
  RenewTime:       Wed, 08 Mar 2023 10:23:40 +0100                                                                                                                                                                                  
Conditions:                                                                                                                                                                                                                         
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message                                                                                                 
  ----             ------  -----------------                 ------------------                ------                       -------                                                                                                 
  MemoryPressure   False   Wed, 08 Mar 2023 10:23:22 +0100   Tue, 07 Mar 2023 07:00:49 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available                                                                 
  DiskPressure     False   Wed, 08 Mar 2023 10:23:22 +0100   Tue, 07 Mar 2023 16:55:48 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure                                                                            
  PIDPressure      False   Wed, 08 Mar 2023 10:23:22 +0100   Tue, 07 Mar 2023 07:00:49 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available                                                                    
  Ready            True    Wed, 08 Mar 2023 10:23:22 +0100   Tue, 07 Mar 2023 07:01:20 +0100   KubeletReady                 kubelet is posting ready status 
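
Note: the karpenter.sh/voluntary-disruption: drifted annotation above shows Karpenter has already marked this node as drifted; it just never disrupts it. A quick way to list all nodes in this state, as a sketch (assumes jq is installed):

kubectl get nodes -o json \
  | jq -r '.items[]
      | select(.metadata.annotations["karpenter.sh/voluntary-disruption"] == "drifted")
      | .metadata.name'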

Steps to Reproduce the Problem

The cron provisioner looks like this:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  creationTimestamp: "2023-01-11T15:15:57Z"
  generation: 7
  labels:
    kustomize.toolkit.fluxcd.io/name: karpenter-custom-provisioner
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: cron
  resourceVersion: "675836310"
  uid: 59293143-53f4-44e8-bc16-b704c66aa61a
spec:
  limits:
    resources:
      cpu: "4"
  providerRef:
    name: default
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
    - spot
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values:
    - c
    - m
    - r
    - t
  - key: karpenter.k8s.aws/instance-hypervisor
    operator: In
    values:
    - nitro
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  taints:
  - effect: NoSchedule
    key: provisioner
    value: cron
  ttlSecondsAfterEmpty: 900
  ttlSecondsUntilExpired: 86400
  weight: 10

Resource Specs and Logs

There are no specific logs for this instance.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@runningman84 runningman84 added the bug Something isn't working label Mar 8, 2023
@runningman84
Author

Okay, I have found the reason: there was still a long-running pod with a do-not-evict annotation, which prevented the node from being removed. A corresponding log entry / metric would help to discover this problem quickly.
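
For anyone hitting the same thing: on this Karpenter version the blocking annotation is karpenter.sh/do-not-evict: "true" set on a pod. A minimal sketch of such a pod (names and image are hypothetical examples):

apiVersion: v1
kind: Pod
metadata:
  name: long-running-job        # hypothetical example name
  annotations:
    # Karpenter will not voluntarily disrupt (e.g. drift-replace or expire)
    # a node while a pod carrying this annotation is running on it
    karpenter.sh/do-not-evict: "true"
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "86400"]

Pods like this can be found with a sketch like the following (assumes jq is installed):

kubectl get pods -A -o json \
  | jq -r '.items[]
      | select(.metadata.annotations["karpenter.sh/do-not-evict"] == "true")
      | "\(.metadata.namespace)/\(.metadata.name)"'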

@jonathan-innis
Contributor

Can you try upgrading to the latest version of Karpenter? There should now be events fired on the node that indicate why drift deprovisioning is blocked. kubernetes-sigs/karpenter#224
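
Once on a version that emits those events, they can be read off the node, e.g. (a sketch using the node name from this report):

kubectl get events -A \
  --field-selector involvedObject.kind=Node,involvedObject.name=ip-10-8-186-29.eu-central-1.compute.internal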

@jonathan-innis jonathan-innis added question Issues that are support related questions and removed bug Something isn't working labels Mar 8, 2023
@jonathan-innis
Contributor

@runningman84 Is this still an issue for you or is this being tracked in another referenced issue?

@runningman84
Author

I think this issue is solved with the latest release.
