
Inconsistent pod distribution when using minDomains of Topology Spread Constraints when testing between 3 AZs and 2 AZs #7585

Open · vchintal opened this issue Jan 13, 2025 · 3 comments
Labels: bug (Something isn't working), needs-triage (Issues that need to be triaged)

vchintal commented Jan 13, 2025

Description

Observed Behavior:

When the Karpenter CR is limited to two zones (us-west-2a and us-west-2b), an application deployed with a Topology Spread Constraint of maxSkew: 1 and minDomains: 1 results in the distribution shown below:

| AZ | Instance Type | Host | # of app pods |
| --- | --- | --- | --- |
| us-west-2b | t3.small | ip-10-0-29-165.us-west-2.compute.internal | 1 |
| us-west-2a | t3.small | ip-10-0-6-0.us-west-2.compute.internal | 1 |

Expected Behavior:

| AZ | Instance Type | Host | # of app pods |
| --- | --- | --- | --- |
| us-west-2b | t3.small | ip-10-0-18-102.us-west-2.compute.internal | 5 |
| us-west-2a | t3.small | ip-10-0-8-14.us-west-2.compute.internal | 5 |

A few things to note:

  1. When the Karpenter CR allows all three Availability Zones (us-west-2a, us-west-2b, us-west-2c), the same settings on the application's Deployment work just fine.
  2. Ironically enough, the result shown under Expected Behavior (with the Karpenter CR limited to two Availability Zones) can also be achieved by commenting out minDomains and setting whenUnsatisfiable: ScheduleAnyway, as sketched below.
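
For reference, this is roughly what the variant from note 2 looks like (a sketch that just restates the constraint from the Deployment below with minDomains dropped):

```yaml
topologySpreadConstraints:
- maxSkew: 1
  # minDomains: 1                     # commented out, per note 2
  whenUnsatisfiable: ScheduleAnyway   # relaxed from DoNotSchedule
  topologyKey: topology.kubernetes.io/zone
  labelSelector:
    matchLabels:
      app: inflate
```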

Reproduction Steps (Please include YAML):

Karpenter CR

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-west-2a", "us-west-2b"]    
        - key: "karpenter.k8s.aws/instance-hypervisor"
          operator: In
          values: ["nitro"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["2"]
  limits:
    cpu: 1000
  disruption:    
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5s
```

Inflate Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 10
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      topologySpreadConstraints:
      - maxSkew: 1
        minDomains: 1
        whenUnsatisfiable: DoNotSchedule
        topologyKey: topology.kubernetes.io/zone
        labelSelector:
          matchLabels:
            app: inflate
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1m"
            limits:
              cpu: "1"
              memory: "250Mi"
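
One minimal way to reproduce and inspect the placement (these commands are assumed, not taken from the report; `inflate.yaml` is a hypothetical file holding the Deployment above):

```sh
# Apply the Deployment and list each pod alongside the node it landed on;
# the node names reveal the per-AZ split shown in the tables above.
kubectl apply -f inflate.yaml
kubectl get pods -l app=inflate -o wide
```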

Versions:

  • Chart Version: 1.1.1
  • Kubernetes Version (kubectl version): v1.31.3
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
tzneal (Contributor) commented Jan 21, 2025

I suspect your NodePools are discovering three different AZs via subnet selectors. Because of this, Karpenter is aware of three AZs but knows that you've restricted your NodePool to only two. Since it can't launch another node in the AZ that it's aware of but has no NodePool for, it doesn't.

To make Karpenter unaware of the third AZ, you'll need to update your subnet selector, or the tags on the subnets, so that Karpenter doesn't discover it.
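
A minimal sketch of that suggestion, assuming the subnets in the two desired AZs can be distinguished by an extra tag (the `allowed` tag name is illustrative):

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # (other EC2NodeClass fields omitted for brevity)
  subnetSelectorTerms:
    # Only subnets carrying both tags are discovered; leaving the extra
    # tag off the third AZ's subnets keeps that zone invisible to Karpenter.
    - tags:
        karpenter.sh/discovery: karpenter
        allowed: "true"   # applied only to the subnets in the two desired AZs
```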

vchintal (Author) commented

Still facing the same issue. The emitted events show the following:

```text
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-967d6               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-bkl5l               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-7cczg               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-hdslj               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-5rwvn               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-v4dvc               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-hqmqb               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
96s         Warning   FailedScheduling          pod/inflate-57d58f4fdd-kbt9h               0/4 nodes are available: 2 node(s) didn't match pod topology spread constraints, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}. preemption: 0/4 nodes are available: 2 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
```
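
Events like these can be gathered with something along these lines (a sketch, not part of the original report):

```sh
# Stream scheduling failures as they occur
kubectl get events --field-selector reason=FailedScheduling --watch
```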

These are the contents of karpenter.yaml:

```yaml
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: bottlerocket@latest
  role: karpenter
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter
        allowed: "true"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter
  tags:
    karpenter.sh/discovery: karpenter
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "topology.kubernetes.io/zone"
          operator: In
          values: ["us-east-1a", "us-east-1b"]
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "karpenter.k8s.aws/instance-cpu"
          operator: In
          values: ["4", "8", "16", "32"]
        - key: "karpenter.k8s.aws/instance-hypervisor"
          operator: In
          values: ["nitro"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["2"]
  limits:
    cpu: 1000
  disruption:    
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5s
---
```

Output of the command `aws ec2 describe-subnets --filters='Name=tag:allowed,Values=true' --region us-east-1 | jq '.Subnets[].Tags'`:

```json
[
  {
    "Key": "allowed",
    "Value": "true"
  },
  {
    "Key": "Blueprint",
    "Value": "karpenter"
  },
  {
    "Key": "Name",
    "Value": "karpenter-private-us-east-1b"
  },
  {
    "Key": "karpenter.sh/discovery",
    "Value": "karpenter"
  },
  {
    "Key": "GithubRepo",
    "Value": "github.com/aws-ia/terraform-aws-eks-blueprints"
  },
  {
    "Key": "kubernetes.io/role/internal-elb",
    "Value": "1"
  }
]
[
  {
    "Key": "Blueprint",
    "Value": "karpenter"
  },
  {
    "Key": "GithubRepo",
    "Value": "github.com/aws-ia/terraform-aws-eks-blueprints"
  },
  {
    "Key": "Name",
    "Value": "karpenter-private-us-east-1a"
  },
  {
    "Key": "allowed",
    "Value": "true"
  },
  {
    "Key": "kubernetes.io/role/internal-elb",
    "Value": "1"
  },
  {
    "Key": "karpenter.sh/discovery",
    "Value": "karpenter"
  }
]
```
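
To confirm which zone each discovered subnet actually lives in, the same filter can be projected onto the AvailabilityZone field (a sketch using the command above as a base):

```sh
aws ec2 describe-subnets \
  --filters='Name=tag:allowed,Values=true' \
  --region us-east-1 \
  | jq '.Subnets[] | {SubnetId, AvailabilityZone}'
```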

vchintal (Author) commented

> I suspect your NodePools are discovering three different AZs via subnet selectors. Because of this, Karpenter is aware of three AZs but knows that you've restricted your NodePool to only two. Since it can't launch another node in the AZ that it's aware of but has no NodePool for, it doesn't.

Wouldn't that be incorrect behavior then, as I am explicitly telling Karpenter via the NodePool CR not to include anything (which essentially means subnets here) related to the third AZ? Or, put the other way around, I am explicitly asking Karpenter to include only the provided AZs?
