Disable Webhook PDB by default, document enabling it #3787

imjasonh · 2021-02-24T15:00:18Z

Changes

This change disables the PodDisruptionBudget (PDB) for the webhook deployment and documents how to re-enable it in docs/enabling-ha. It also edits enabling-ha.md to streamline it and recommend best practices.

/kind bug

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you review them:

[n] Includes tests (if functionality changed/added)
[y] Includes docs (if user facing)
[y] Commit messages follow commit message best practices
[y] Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

If you are adding a new binary/image to the cmd dir, please update the release Task to build and release this image.

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Disable PodDisruptionBudget for the webhook deployment by default

cc @vdemeester @nikhil-thomas @bobcatfish

imjasonh · 2021-02-24T15:05:16Z

cc @raballew as well, who added the PDB in #3391

nikhil-thomas · 2021-02-24T15:07:12Z

/lgtm

tekton-robot · 2021-02-24T15:07:20Z

@nikhil-thomas: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

vdemeester

/meow
Thanks for doing this @imjasonh (and @nikhil-thomas for the exploration)

tekton-robot · 2021-02-24T17:59:37Z

@vdemeester:

In response to this:

/meow
Thanks for doing this @imjasonh (and @nikhil-thomas for the exploration)

tekton-robot · 2021-02-24T17:59:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

zhangtbj · 2021-03-02T09:27:33Z

Hi Jason,

A quick question: is it possible to remove the PDB setting from the Tekton deployment and document how to configure it instead? :)

Then users can choose to enable or disable it themselves.

We enable a PDB for Tekton in our production environment. If the default installation sets MinAvailable to 0 (or any other value), I am afraid it may override our existing settings, and perhaps affect other users too.
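To make the MinAvailable concern concrete, here is a rough Python model of the arithmetic Kubernetes applies to a `minAvailable` budget. It is a sketch, not Tekton or Kubernetes code: `disruptions_allowed` is a hypothetical helper, and the round-up rule for percentages follows the Kubernetes PDB documentation. The thread does not say which value the webhook PDB used, so the inputs below are only examples.

```python
def disruptions_allowed(min_available, expected_pods, healthy_pods):
    """Approximate how many voluntary evictions a minAvailable PDB permits.

    min_available is an int (e.g. 1) or a percentage string (e.g. "80%").
    Percentages are rounded up, as documented for minAvailable.
    """
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Integer ceiling of expected_pods * percent / 100.
        desired_healthy = (expected_pods * percent + 99) // 100
    else:
        desired_healthy = int(min_available)
    # An eviction is refused if it would drop below desired_healthy.
    return max(healthy_pods - desired_healthy, 0)


# Default single webhook replica:
print(disruptions_allowed(1, 1, 1))      # 0: eviction blocked
print(disruptions_allowed("80%", 1, 1))  # 0: 80% of 1 rounds up to 1
print(disruptions_allowed(0, 1, 1))      # 1: eviction allowed
# Three replicas:
print(disruptions_allowed("80%", 3, 3))  # 0: 80% of 3 rounds up to 3
print(disruptions_allowed(2, 3, 3))      # 1
```

A budget that leaves zero allowed disruptions makes node drains and autoscaler scale-downs keep waiting on the webhook pod, so such a budget only makes sense alongside enough replicas.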

sbose78 · 2021-03-05T05:49:02Z

I see @zhangtbj's point here. Shipping the hpa manifests might override existing values. We could consider skipping the hpa manifests altogether and including documentation on them instead?

vdemeester · 2021-03-05T06:38:55Z

> I see @zhangtbj 's point here. Shipping the hpa manifests might override existing values. We may consider skipping the hpa manifests altogether and instead include documentation on it ?

And move that knowledge/management to the operator 👼🏼

imjasonh · 2021-03-09T17:38:13Z

Sorry for letting this PR slip through the cracks. Let's get this in before 0.22.

To clarify, the specific ask is to remove the PDB from webhook-hpa.yaml from the default Tekton installation bundle, and instead document how to enable it, with example YAML in the docs. Does that sound right to you @zhangtbj @sbose78 ?
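The docs are to carry example YAML for the PDB. As an illustration only, the Python sketch below builds an equivalent manifest as a dict and prints it as JSON, which `kubectl apply -f -` also accepts. The helper `webhook_pdb`, the name `tekton-pipelines-webhook`, the namespace `tekton-pipelines`, and the selector labels are assumptions; check them against the labels on the webhook Deployment in your installation. `policy/v1` requires Kubernetes 1.21 or later; older clusters use `policy/v1beta1`.

```python
import json


def webhook_pdb(min_available=1,
                name="tekton-pipelines-webhook",
                namespace="tekton-pipelines"):
    """Build a PodDisruptionBudget manifest for the webhook (hypothetical helper).

    The name, namespace and selector labels are assumptions; verify them
    against your installed webhook Deployment before applying.
    """
    return {
        "apiVersion": "policy/v1",  # policy/v1beta1 before Kubernetes 1.21
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {
                "matchLabels": {
                    "app.kubernetes.io/name": "webhook",
                    "app.kubernetes.io/component": "webhook",
                    "app.kubernetes.io/part-of": "tekton-pipelines",
                }
            },
        },
    }


if __name__ == "__main__":
    # e.g.: python3 pdb.py | kubectl apply -f -
    print(json.dumps(webhook_pdb(), indent=2))
```

Keeping `minAvailable` below the webhook's replica count (for example 1 with two or more replicas) leaves room for one voluntary eviction at a time.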

sbose78 · 2021-03-09T17:59:24Z

That's right, Jason.

imjasonh · 2021-03-09T18:03:20Z

> That's right, Jason.

Done. 👍

pritidesai · 2021-03-09T21:06:32Z

@sbose78 please help review the changes 🙏 (looking for /lgtm 😉 )

sbose78 · 2021-03-09T21:29:15Z

/lgtm

tekton-robot · 2021-03-09T21:29:24Z

@sbose78: changing LGTM is restricted to collaborators

In response to this:

/lgtm

pritidesai · 2021-03-09T22:51:00Z

Thanks @sbose78, we will have to add you to the org. I will create a separate PR in the community repo.

pritidesai · 2021-03-09T23:41:42Z

You should get the /lgtm privilege after this PR in the community repo (tektoncd/community#376) is merged 😄 Until then,

/lgtm

sbose78 · 2021-03-09T23:43:54Z

Thank you very much, Priti!

zhangtbj · 2021-03-10T01:02:02Z

Cool, thank you Jason! :)

The safe-to-evict annotation tells the cluster autoscaler whether the pod can be evicted to allow the node it's on to scale down. This was set to false (by me!) 2 years ago in tektoncd@fc6ef39 to prevent service unreliability during scale-down events. If no webhook replicas are available, users can't create/update/delete Tekton objects; if no controller replicas are available, status updates from Pod events, etc., won't be processed. Unfortunately, blocking node eviction means the node that the pod(s) get scheduled to can't be scaled down. Furthermore, the nodes can't be fully drained when updating the cluster. This can leave a cluster in a mid-upgrade state that can make issues difficult to diagnose and reason about. With this change, a cluster scale-down event might cause temporary service unreliability with the default single-replica configuration. As with tektoncd#3787, if a user/operator wants to prevent this, they should configure more replicas for HA.
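Following that advice means raising the replica count. As a hedged sketch, the Python helper below (`ha_patch`, a hypothetical name) builds a strategic-merge patch that sets `spec.replicas` and the `cluster-autoscaler.kubernetes.io/safe-to-evict` pod-template annotation, suitable for `kubectl patch`. The Deployment name and namespace in the usage comment are assumptions based on a default install.

```python
import json

# Pod annotation read by the Kubernetes cluster autoscaler.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def ha_patch(replicas, safe_to_evict=True):
    """Return a strategic-merge patch for a Tekton Deployment (hypothetical helper).

    Sets spec.replicas and the pod-template annotation that tells the
    cluster autoscaler whether these pods may be evicted on scale-down.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {
        "spec": {
            "replicas": replicas,
            "template": {
                "metadata": {
                    # Annotation values must be strings, not booleans.
                    "annotations": {
                        SAFE_TO_EVICT: "true" if safe_to_evict else "false"
                    }
                }
            },
        }
    }


if __name__ == "__main__":
    # Assumed names, e.g.:
    #   kubectl -n tekton-pipelines patch deployment tekton-pipelines-webhook \
    #     --patch "$(python3 ha_patch.py)"
    print(json.dumps(ha_patch(3)))
```

Changing pod-template annotations triggers a rollout. If a HorizontalPodAutoscaler manages the Deployment, as the webhook-hpa.yaml file name suggests for the webhook, the HPA owns the replica count, so raise its `minReplicas` rather than `spec.replicas`.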

tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Feb 24, 2021

tekton-robot requested review from bobcatfish and dibyom February 24, 2021 15:00

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 24, 2021

imjasonh force-pushed the webhook-ha branch from dcf2f39 to fd77efd February 24, 2021 15:01

imjasonh changed the title from "Disable HA webhook by default, document enabling it" to "Disable Webhook PDB by default, document enabling it" Feb 24, 2021

vdemeester approved these changes Feb 24, 2021

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 24, 2021

pritidesai added this to the Pipelines 0.22 milestone Mar 9, 2021

Disable Webhook PDB by default, document enabling it

a6a4bb9

imjasonh force-pushed the webhook-ha branch from fd77efd to a6a4bb9 March 9, 2021 18:03

pritidesai mentioned this pull request Mar 9, 2021

adding sbose78 to the org tektoncd/community#376

Merged

tekton-robot assigned pritidesai Mar 9, 2021

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2021

tekton-robot merged commit 10870df into tektoncd:master Mar 10, 2021

qu1queee mentioned this pull request Apr 23, 2021

Bump Tekton to v0.23.0 shipwright-io/build#741

Merged

imjasonh mentioned this pull request Jul 28, 2021

Mark webhook and controller as safe-to-evict #4124

Merged

vdemeester mentioned this pull request Aug 17, 2021

Mark webhook and controller as safe-to-evict openshift/tektoncd-pipeline#729

Merged

dprotaso mentioned this pull request May 23, 2023

cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation prevents GKE nodepool to scale down knative/serving#13984

Closed
