Mark webhook and controller as safe-to-evict
The safe-to-evict annotation tells the cluster autoscaler whether the
pod can be evicted to allow the node it's on to scale down.
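
Because the Cluster Autoscaler reads the annotation from each Pod, it belongs under a Deployment's pod template metadata, not under the Deployment's own metadata. The manifest below is a minimal sketch of that placement. The name `example-webhook` and the image are hypothetical placeholders, not Tekton's actual manifest.

```yaml
# Hypothetical Deployment; the name, labels and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-webhook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-webhook
  template:
    metadata:
      annotations:
        # "false": the Cluster Autoscaler will not remove the node this Pod runs on.
        # "true": the Pod is explicitly marked as evictable.
        # Omitted (the state after this commit): the autoscaler's default rules apply.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      labels:
        app: example-webhook
    spec:
      containers:
        - name: webhook
          image: registry.example.com/webhook:latest  # placeholder image
```

Annotation values must be strings, so the value is quoted (`"false"`), just as it is in the lines this commit removes.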

This was set to false (by me!) 2 years ago in fc6ef39
to prevent service unreliability during scale-down events. If no
webhook replicas are available, users can't create/update/delete
Tekton objects; if no controller replicas are available, status updates
from Pod events, etc., won't be processed.

Unfortunately, blocking node eviction means the node that the pod(s) get
scheduled to can't be scaled down. Furthermore, the nodes can't be fully
drained when updating the cluster. This can leave a cluster in a
mid-upgrade state that can make issues difficult to diagnose and reason
about.

With this change, a cluster scale-down event might cause temporary
service unreliability with the default single-replica configuration. As
with #3787, if a user/operator wants to prevent this, they should
configure more replicas for HA.
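
A sketch of what "more replicas" could look like follows. It raises the webhook's minimum replica count and runs two controller replicas. It assumes the default `tekton-pipelines` namespace, a HorizontalPodAutoscaler named `tekton-pipelines-webhook` that manages the webhook Deployment, and a controller Deployment named `tekton-pipelines-controller`. None of these names appear in this commit, so check them against your installation. Extra controller replicas are assumed to coordinate through leader election as described in `docs/enabling-ha.md`.

```yaml
# Hypothetical HA overrides; resource names and namespace are assumptions
# based on a default Tekton install, not taken from this commit.
apiVersion: autoscaling/v2          # older clusters may only serve autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: tekton-pipelines-webhook
  namespace: tekton-pipelines
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tekton-pipelines-webhook
  # At least two webhook Pods: if they land on different nodes,
  # draining one node still leaves a replica serving requests.
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 100   # illustrative threshold only
---
# Strategic-merge patch fragment (e.g. for a kustomize patch):
# only the controller Deployment's replica count is changed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tekton-pipelines-controller
  namespace: tekton-pipelines
spec:
  replicas: 2
```

Two replicas only help during a drain if the scheduler places them on different nodes. Nothing in this sketch guarantees that.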
imjasonh authored and tekton-robot committed Jul 29, 2021
1 parent a728ce3 commit 5350069
Showing 3 changed files with 3 additions and 7 deletions.
config/controller.yaml (2 changes: 0 additions, 2 deletions)

```diff
@@ -37,8 +37,6 @@ spec:
       app.kubernetes.io/part-of: tekton-pipelines
   template:
     metadata:
-      annotations:
-        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
       labels:
         app.kubernetes.io/name: controller
         app.kubernetes.io/component: controller
```

config/webhook.yaml (2 changes: 0 additions, 2 deletions)

```diff
@@ -40,8 +40,6 @@ spec:
       app.kubernetes.io/part-of: tekton-pipelines
   template:
     metadata:
-      annotations:
-        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
       labels:
         app.kubernetes.io/name: webhook
         app.kubernetes.io/component: webhook
```

docs/enabling-ha.md (6 changes: 3 additions, 3 deletions)

````diff
@@ -95,9 +95,9 @@ spec:
   minReplicas: 1
 ```
 
-By default, the Webhook deployment is configured to block a [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) from scaling down the node that's running the only replica of the deployment using the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation.
-This is configured because, while the only replica of the Webhook is unavailable, Tekton resources can't be created, updated or deleted.
-If you configure more than one replica, you can remove the annotation to allow the Cluster Autoscaler more freedom to scale down nodes, without disrupting the Webhook service.
+By default, the Webhook deployment is _not_ configured to block a [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) from scaling down the node that's running the only replica of the deployment using the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation.
+This means that during node drains, the Webhook might be unavailable temporarily, during which time Tekton resources can't be created, updated or deleted.
+To avoid this, you can add the `safe-to-evict` annotation set to `false` to block node drains during autoscaling, or, better yet, configure multiple replicas of the Webhook deployment.
 
 ### Avoiding Disruptions
 
````
