
Inconsistent pod distribution when using minDomains of Topology Spread Constraints when testing between 3 AZs and 2 AZs #7585

Open · vchintal opened this issue Jan 13, 2025 · 3 comments
Labels: bug (Something isn't working), needs-triage (Issues that need to be triaged)

vchintal commented Jan 13, 2025

Description

Observed Behavior:

When the Karpenter CR is limited to two zones (us-west-2a and us-west-2b), an application deployed with a Topology Spread Constraint of maxSkew: 1 and minDomains: 1 results in the distribution shown below:

| AZ | Instance Type | Host | # of app pods |
| --- | --- | --- | --- |
| us-west-2b | t3.small | ip-10-0-29-165.us-west-2.compute.internal | 1 |
| us-west-2a | t3.small | ip-10-0-6-0.us-west-2.compute.internal | 1 |

Expected Behavior:

| AZ | Instance Type | Host | # of app pods |
| --- | --- | --- | --- |
| us-west-2b | t3.small | ip-10-0-18-102.us-west-2.compute.internal | 5 |
| us-west-2a | t3.small | ip-10-0-8-14.us-west-2.compute.internal | 5 |

A few things to note:

  1. When the Karpenter CR allows all three Availability Zones (us-west-2a, us-west-2b, us-west-2c), the same settings on the application's Deployment work just fine.
  2. Ironically enough, the result shown under Expected Behavior (with the Karpenter CR limited to two Availability Zones) can also be achieved by commenting out minDomains and setting whenUnsatisfiable: ScheduleAnyway, as sketched below.
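
For reference, this is roughly what the variant from note 2 looks like (a sketch that just restates the constraint from the Deployment below with minDomains dropped):

```yaml
topologySpreadConstraints:
- maxSkew: 1
  # minDomains: 1                     # commented out, per note 2
  whenUnsatisfiable: ScheduleAnyway   # relaxed from DoNotSchedule
  topologyKey: topology.kubernetes.io/zone
  labelSelector:
    matchLabels:
      app: inflate
```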

Reproduction Steps (Please include YAML):

Karpenter CR

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-west-2a", "us-west-2b"]    
        - key: "karpenter.k8s.aws/instance-hypervisor"
          operator: In
          values: ["nitro"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["2"]
  limits:
    cpu: 1000
  disruption:    
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5s
```

Inflate Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 10
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      topologySpreadConstraints:
      - maxSkew: 1
        minDomains: 1
        whenUnsatisfiable: DoNotSchedule
        topologyKey: topology.kubernetes.io/zone
        labelSelector:
          matchLabels:
            app: inflate
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1m"
            limits:
              cpu: "1"
              memory: "250Mi"
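
One minimal way to reproduce and inspect the placement (these commands are assumed, not taken from the report; `inflate.yaml` is a hypothetical file holding the Deployment above):

```sh
# Apply the Deployment and list each pod alongside the node it landed on;
# the node names reveal the per-AZ split shown in the tables above.
kubectl apply -f inflate.yaml
kubectl get pods -l app=inflate -o wide
```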

Versions:

  • Chart Version: 1.1.1
  • Kubernetes Version (kubectl version): v1.31.3
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
tzneal (Contributor) commented Jan 21, 2025

I suspect your NodePools are discovering three different AZs via subnet selectors. Because of this, Karpenter is aware of three AZs but knows that you've restricted your NodePool to only two. Since it can't launch another node in the AZ that it's aware of but has no NodePool for, it doesn't.

To make Karpenter unaware of the third AZ, you'll need to update your subnet selector, or the tags on the subnets, so that Karpenter doesn't discover it.
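
A minimal sketch of that suggestion, assuming the subnets in the two desired AZs can be distinguished by an extra tag (the `allowed` tag name is illustrative):

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # (other EC2NodeClass fields omitted for brevity)
  subnetSelectorTerms:
    # Only subnets carrying both tags are discovered; leaving the extra
    # tag off the third AZ's subnets keeps that zone invisible to Karpenter.
    - tags:
        karpenter.sh/discovery: karpenter
        allowed: "true"   # applied only to the subnets in the two desired AZs
```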

vchintal (Author) commented

Still facing the same issue. The emitted events show the following:

```text
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-967d6               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-bkl5l               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-7cczg               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-hdslj               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-5rwvn               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-v4dvc               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-hqmqb               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-kbt9h               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
```
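
Events like these can be gathered with something along these lines (a sketch, not part of the original report):

```sh
# Stream scheduling failures as they occur
kubectl get events --field-selector reason=FailedScheduling --watch
```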

These are the contents of karpenter.yaml:

```yaml
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: bottlerocket@latest
  role: karpenter
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter
        allowed: "true"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter
  tags:
    karpenter.sh/discovery: karpenter
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-east-1a", "us-east-1b"]
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "karpenter.k8s.aws/instance-cpu"
          operator: In
          values: ["4", "8", "16", "32"]
        - key: "karpenter.k8s.aws/instance-hypervisor"
          operator: In
          values: ["nitro"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["2"]
  limits:
    cpu: 1000
  disruption:    
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5s
---
```

Output of the command `aws ec2 describe-subnets --filters='Name=tag:allowed,Values=true' --region us-east-1 | jq '.Subnets[].Tags'`:

```json
[
  {
    "Key": "allowed",
    "Value": "true"
  },
  {
    "Key": "Blueprint",
    "Value": "karpenter"
  },
  {
    "Key": "Name",
    "Value": "karpenter-private-us-east-1b"
  },
  {
    "Key": "karpenter.sh/discovery",
    "Value": "karpenter"
  },
  {
    "Key": "GithubRepo",
    "Value": "github.com/aws-ia/terraform-aws-eks-blueprints"
  },
  {
    "Key": "kubernetes.io/role/internal-elb",
    "Value": "1"
  }
]
[
  {
    "Key": "Blueprint",
    "Value": "karpenter"
  },
  {
    "Key": "GithubRepo",
    "Value": "github.com/aws-ia/terraform-aws-eks-blueprints"
  },
  {
    "Key": "Name",
    "Value": "karpenter-private-us-east-1a"
  },
  {
    "Key": "allowed",
    "Value": "true"
  },
  {
    "Key": "kubernetes.io/role/internal-elb",
    "Value": "1"
  },
  {
    "Key": "karpenter.sh/discovery",
    "Value": "karpenter"
  }
]
```
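
To confirm which zone each discovered subnet actually lives in, the same filter can be projected onto the AvailabilityZone field (a sketch using the command above as a base):

```sh
aws ec2 describe-subnets \
  --filters='Name=tag:allowed,Values=true' \
  --region us-east-1 \
  | jq '.Subnets[] | {SubnetId, AvailabilityZone}'
```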

vchintal (Author) commented

> I suspect your NodePools are discovering three different AZs via subnet selectors. Because of this, Karpenter is aware of three AZs but knows that you've restricted your NodePool to only two. Since it can't launch another node in the AZ that it's aware of but has no NodePool for, it doesn't.

Wouldn't that be incorrect behavior then, as I am explicitly telling Karpenter via the NodePool CR not to include anything (which essentially means subnets here) related to the third AZ? Or, put the other way around, I am explicitly asking Karpenter to include only the provided AZs?
