
compatibility with aws autoscaler with nodegroup min=0 #1066

Closed
scottyhq opened this issue Jul 26, 2019 · 3 comments
Labels
kind/feature New feature or request

Comments

scottyhq commented Jul 26, 2019

Why do you want this feature?
Currently, eksctl examples using the AWS Kubernetes cluster-autoscaler work when at least one node is always running, but we'd like to save on costs by scaling from 0 nodes. A few extra settings are required for this:
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#scaling-a-node-group-to-0

What feature/behavior/change do you want?
The current workaround is to manually add node labels and taints as tags on the ASGs. For example, given this nodegroup configuration

  - name: dask-worker
    instanceType: r5.2xlarge
    minSize: 0
    maxSize: 100
    volumeSize: 100
    volumeType: gp2
    labels:
      node-role.kubernetes.io/worker: worker
      k8s.dask.org/node-purpose: worker
    taints:
      k8s.dask.org/dedicated: 'worker:NoSchedule'
    desiredCapacity: 0
    ami: auto
    amiFamily: AmazonLinux2
    iam:
      withAddonPolicies:
        autoScaler: true
        efs: true

We currently have to manually add the following tags to the corresponding ASG:

k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose     worker
k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated     worker:NoSchedule
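The manual tagging step above can be sketched with the AWS CLI. This is a sketch only: the ASG name is a placeholder for whatever eksctl actually created for the nodegroup, and the command is printed (via echo) rather than executed so it can be reviewed before running.

```shell
# Placeholder; substitute the ASG eksctl created for your nodegroup.
ASG_NAME="eksctl-mycluster-nodegroup-dask-worker-NodeGroup"

# Tag keys the cluster-autoscaler reads when building a node template for scale-from-zero.
LABEL_TAG="k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose"
TAINT_TAG="k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated"

# Printed, not executed, so the command can be reviewed first.
echo aws autoscaling create-or-update-tags --tags \
  "ResourceId=$ASG_NAME,ResourceType=auto-scaling-group,Key=$LABEL_TAG,Value=worker,PropagateAtLaunch=true" \
  "ResourceId=$ASG_NAME,ResourceType=auto-scaling-group,Key=$TAINT_TAG,Value=worker:NoSchedule,PropagateAtLaunch=true"
```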

Perhaps a flag could be added to propagate labels in the config file to ASG tags when running eksctl create nodegroup?

related:
#1012 (comment)
#170

@scottyhq scottyhq added the kind/feature New feature or request label Jul 26, 2019
@adamjohnson01
Contributor

@scottyhq, this can be achieved using tags in your nodegroup config:

nodeGroups:
  - name: autoscalingNodegroup
    instanceType: m5.xlarge
    desiredCapacity: 0
    minSize: 0
    maxSize: 10
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: worker
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "worker:NoSchedule"

@scottyhq
Author

Thanks @adamjohnson01 - just wanted to confirm that the above config works for on-demand nodes.

If people are using mixed spot instances, scaling from zero requires Kubernetes and cluster-autoscaler 1.14, which is now out (kubernetes/autoscaler#2246 (comment)).

So I think this issue can be closed!

mgalgs added a commit to mgalgs/eksctl that referenced this issue Oct 24, 2019
When the cluster-autoscaler adds a new node to a group, it grabs an
existing node in the group and builds a "template" to launch a new node
identical to the one it grabbed from the group.

However, when scaling up from 0 there aren't any live nodes to reference to
build this template.  Instead, the cluster-autoscaler relies on tags in the
ASG to build the new node template.  This can cause unexpected behavior if
the pods triggering the scale-out are using node selectors or taints; CA
doesn't have sufficient information to decide if a new node launched in the
group will satisfy the request.

The long and short of it is that for CA to do its job properly we must tag
our ASGs corresponding to our labels and taints.  Add a note in the docs
about this since scaling up from 0 is a fairly common use case.

References:

  - kubernetes/autoscaler#2418
  - eksctl-io#1066
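Once the ASG has been tagged as described above, one way to sanity-check what the cluster-autoscaler will see is to list the node-template tags on the ASG. A sketch, assuming the aws CLI: the ASG name is a placeholder, and the command is printed (via echo) for review rather than executed.

```shell
# Placeholder; substitute the ASG eksctl created for your nodegroup.
ASG_NAME="eksctl-mycluster-nodegroup-dask-worker-NodeGroup"

# List only the tags the cluster-autoscaler uses to build its node template.
echo aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=$ASG_NAME" \
  --query "Tags[?starts_with(Key, 'k8s.io/cluster-autoscaler/node-template/')]"
```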
@techtransplant

I'm having trouble scaling up from 0 with spot instances. Is that feature not available?
