From f51f891a0077f19f0f89db08e056b6355c476d04 Mon Sep 17 00:00:00 2001
From: Jason Hall
Date: Wed, 28 Jul 2021 12:13:47 -0400
Subject: [PATCH] Mark webhook and controller as safe-to-evict

The safe-to-evict annotation tells the cluster autoscaler whether the pod
can be evicted to allow the node it's on to scale down.

This was set to false (by me!) 2 years ago in
https://github.com/tektoncd/pipeline/commit/fc6ef39101a1bbead39c8271d813ed0e70733cb1
to prevent service unreliability during scale-down events. If no webhook
replicas are available, users can't create/update/delete Tekton objects; if
no controller replicas are available, status updates from Pod events, etc.,
won't be processed.

Unfortunately, blocking node eviction means the node that the pod(s) get
scheduled to can't be scaled down. Furthermore, the nodes can't be fully
drained when updating the cluster. This can leave a cluster in a
mid-upgrade state that can make issues difficult to diagnose and reason
about.

With this change, a cluster scale-down event might cause temporary service
unreliability with the default single-replica configuration. As with #3787,
if a user/operator wants to prevent this, they should configure more
replicas for HA.
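For context, the annotation being removed lives in the pod template metadata
of each deployment; an illustrative excerpt of the pre-change state:

```yaml
# Excerpt of the deployment's pod template before this change (illustrative):
template:
  metadata:
    annotations:
      # "false" told the cluster autoscaler it must NOT evict this pod,
      # which blocked scale-down and full drains of the node hosting it.
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```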
---
 config/controller.yaml | 2 --
 config/webhook.yaml    | 2 --
 docs/enabling-ha.md    | 6 +++---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/config/controller.yaml b/config/controller.yaml
index c1de705d7d8..9568c32c383 100644
--- a/config/controller.yaml
+++ b/config/controller.yaml
@@ -37,8 +37,6 @@ spec:
       app.kubernetes.io/part-of: tekton-pipelines
   template:
     metadata:
-      annotations:
-        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
       labels:
         app.kubernetes.io/name: controller
         app.kubernetes.io/component: controller
diff --git a/config/webhook.yaml b/config/webhook.yaml
index 63f0b20eebb..739a12d75b9 100644
--- a/config/webhook.yaml
+++ b/config/webhook.yaml
@@ -40,8 +40,6 @@ spec:
       app.kubernetes.io/part-of: tekton-pipelines
   template:
     metadata:
-      annotations:
-        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
       labels:
         app.kubernetes.io/name: webhook
         app.kubernetes.io/component: webhook
diff --git a/docs/enabling-ha.md b/docs/enabling-ha.md
index d95890a4656..38290f706e3 100644
--- a/docs/enabling-ha.md
+++ b/docs/enabling-ha.md
@@ -95,9 +95,9 @@ spec:
     minReplicas: 1
 ```
 
-By default, the Webhook deployment is configured to block a [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) from scaling down the node that's running the only replica of the deployment using the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation.
-This is configured because, while the only replica of the Webhook is unavailable, Tekton resources can't be created, updated or deleted.
-If you configure more than one replica, you can remove the annotation to allow the Cluster Autoscaler more freedom to scale down nodes, without disrupting the Webhook service.
+By default, the Webhook deployment is _not_ configured to block a [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) from scaling down the node that's running the only replica of the deployment using the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation.
+This means that during node drains, the Webhook might be unavailable temporarily, during which time Tekton resources can't be created, updated or deleted.
+To avoid this, you can add the `safe-to-evict` annotation set to `false` to block node drains during autoscaling, or, better yet, configure multiple replicas of the Webhook deployment.
 
 ### Avoiding Disruptions
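
For operators following the docs change above, a sketch of the multi-replica
mitigation (the replica count and resource names here are illustrative
assumptions, not values from this patch):

```yaml
# Illustrative: run more than one Webhook replica so a node drain that
# evicts one pod leaves others serving admission requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tekton-pipelines-webhook   # assumed name; check your install's manifests
  namespace: tekton-pipelines
spec:
  replicas: 3  # any value > 1 survives a single-node drain
```

Alternatively, re-adding `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`
to the pod template restores the old blocking behavior, at the cost of the
scale-down and drain problems this patch describes.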