PodDisruptionBudget causing inability to gracefully drain a node with tekton-pipelines-webhook pod #3654

Closed
r0bj opened this issue Jan 5, 2021 · 13 comments · Fixed by #3784 or #3787
Labels: kind/bug

Comments

r0bj commented Jan 5, 2021

Expected Behavior

Gracefully draining nodes that contain Tekton pods should be possible.

Actual Behavior

With a single running tekton-pipelines-webhook pod (which is possible because the tekton-pipelines-webhook HorizontalPodAutoscaler has spec.minReplicas: 1), it's impossible to gracefully drain the node hosting it, because the PodDisruptionBudget is set to minAvailable: 80%.
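
For reference, the budget shipped with the release looks roughly like this (a sketch; the exact name, namespace, and selector labels are assumptions based on a standard install, not copied from release.yaml):

```yaml
# Sketch of the shipped PodDisruptionBudget (names and labels assumed).
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: tekton-pipelines-webhook
  namespace: tekton-pipelines
spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app.kubernetes.io/name: webhook
      app.kubernetes.io/part-of: tekton-pipelines
```

With one matching pod, 80% rounds up to 1 pod that must stay available, so the budget never allows a voluntary eviction and the drain cannot complete.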

Steps to Reproduce the Problem

  1. Run tekton-pipeline: kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.19.0/release.yaml
  2. Make sure there is a single tekton-pipelines-webhook pod (scaled down)
  3. Try to drain the node running the tekton-pipelines-webhook pod

Additional Info

  • Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:41:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:
Client version: 0.15.0
Pipeline version: v0.19.0
Triggers version: v0.10.0
r0bj added the kind/bug label Jan 5, 2021
@imjasonh (Member)

This definitely does sound like something we need to document and communicate to operators better.

The PodDisruptionBudget was put in place to prevent users from being unable to create PipelineRuns/TaskRuns/etc. when the webhook Pod is unavailable, e.g., during an upgrade. Unfortunately, the way it did this basically turned into "never let Kubernetes make the Pod unavailable" 😬

There are basically two options, I think:

  1. If an operator can tolerate some downtime (a few seconds), they can remove the PodDisruptionBudget -- either always, or only just before the upgrade, then put it back when they're done.
  2. Scale up the webhook deployment -- again, either always, or only just before an upgrade, and back down again after. Since minAvailable is 80%, you'd need 5 replicas to be able to safely take one down, and Kubernetes would do a rolling upgrade through one at a time.

I think the fix for this bug is, at the very least, to document this behavior for operators. We might want to make minAvailable 50% so that only two replicas are needed to trigger the rolling upgrade behavior.
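
For example, at 50% the relevant part of the budget would just be (fragment sketch):

```yaml
spec:
  # With two healthy replicas, ceil(0.5 * 2) = 1 pod must stay available,
  # so one voluntary eviction at a time is allowed and drains can proceed.
  minAvailable: 50%
```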

cc @vdemeester @nikhil-thomas

@imjasonh (Member)

The PodDisruptionBudget was put in place in #3391 cc @raballew

imjasonh added this to the Pipelines 0.22 milestone Feb 12, 2021
@bitsofinfo

1 and/or 2 are what I did to get around it, but it burned a day or so with our k8s PaaS support trying to triage why the cluster upgrade was halted/stuck because of this.

@bitsofinfo

Agree, it's more of a documentation heads-up / clear-warning kind of thing.

r0bj commented Feb 12, 2021

In my opinion, defaults should be reasonable. The current default values cause trouble operating the cluster and require manual intervention during normal operations like draining nodes.

In this case I see options:

  • PodDisruptionBudget is optional, not enabled by default
  • HorizontalPodAutoscaler is set with a higher minReplicas to meet the PodDisruptionBudget demands (see the sketch below)
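
A sketch of that second option, assuming the HPA shipped with the release is named tekton-pipelines-webhook in the tekton-pipelines namespace (autoscaling/v1 shown for brevity; the release may use a newer API version and different target metrics):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: tekton-pipelines-webhook
  namespace: tekton-pipelines
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tekton-pipelines-webhook
  minReplicas: 5   # with minAvailable: 80%, ceil(0.8 * 5) = 4, so one pod may be evicted
  maxReplicas: 5
  targetCPUUtilizationPercentage: 100
```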

@imjasonh (Member)

In my opinion, defaults should be reasonable. The current default values cause trouble operating the cluster and require manual intervention during normal operations like draining nodes.

I agree, the default configuration should be safe and easy to use for basic usage.

cc @afrittoli

Worth noting that Knative seems to have the same configuration (autoscaling, HPA); I wonder whether they've got the same issue, or whether they have a solution: https://github.com/knative/serving/blob/master/config/core/deployments/webhook-hpa.yaml added in knative/serving#9444

@nikhil-thomas (Member)

Worth noting that Knative seems to have the same configuration (autoscaling, HPA); I wonder whether they've got the same issue, or whether they have a solution: https://github.com/knative/serving/blob/master/config/core/deployments/webhook-hpa.yaml added in knative/serving#9444

The Knative team is handling this in the operator, by patching the PDB on the fly:
https://github.com/openshift-knative/serverless-operator/blob/master/openshift-knative-operator/hack/003-serving-pdb.patch
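
In effect that patch pins the budget to a single pod; the resulting spec amounts to roughly this (a sketch, not the literal patch contents):

```yaml
spec:
  minAvailable: 1   # allows eviction as soon as a second healthy replica exists
```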

@imjasonh (Member)

This issue also seems to be related: knative/operator#376

If we change our PDB to minAvailable:1 as in that patch, will the HPA scale up a second pod on another node, thus allowing the first pod to go down? I don't know enough about how HPA+PDB work together.

@imjasonh (Member)

@nak3 seems to be responsible for both knative/operator#376 and the minAvailable:1 patch added in openshift-knative/serverless-operator#751, I wonder if he could advise on how best to approach this in Tekton as well.

In general I'd prefer to solve this by making Tekton's default configuration easy to use even if it means being slightly less HA-capable, and then document HA and maybe make operators responsible for automating it. Anyway, curious for @nak3's thoughts.

nak3 commented Feb 16, 2021

I'd prefer to solve this by making Tekton's default configuration easy to use even if it means being slightly less HA-capable,

Yes, we use minAvailable:1 for the same reason. Although it is not a perfect solution, as you know, it is easy to use.
You can refer to https://issues.redhat.com/browse/SRVKS-670 (sorry, the link is Red Hat internal only) for the other opinions from when we had the discussion.

@nikhil-thomas (Member)

@abhinavkhanna-sf
A quick solution is to change the PDB resource in your tekton-pipelines installation,
e.g.: https://github.com/tektoncd/operator/blob/main/cmd/openshift/kodata/tekton-pipeline/0.19.0/00_tektoncd-pipeline-v0.19.0.yaml#L1826

@imjasonh (Member)

Turns out, #3784 probably won't actually fix the issue. 😢

Based on investigation from @nikhil-thomas (🙇‍♂️), even with minAvailable:1, Kubernetes won't scale up another replica before taking the existing one down to drain the node.

We basically have two options in the near-term:

  1. Always run at least 2 replicas (replicas:2 in the deployment, minReplicas:2 in the HPA), with pod anti-affinity so they're always on different nodes (see the sketch after this list), and they'll take turns going down when nodes drain/restart. This means requiring double the resources, even when one replica is enough in the normal steady state.
    • A downside to this, as pointed out by @nikhil-thomas, is that if the cluster has one node (minikube, small test clusters), we'll still end up in a state where nodes can't drain because it'll mean violating the PDB. Getting around this is a bit more involved (see below), and possibly out of scope since single-node test clusters probably don't care about HA that much 🤷‍♂️
  2. Disable the PDB by default, and instruct users on how to re-enable it. This could be as easy as minAvailable:0 in the PDB, with a comment and markdown docs in docs/enabling-ha.md explaining that, to enable HA for the webhook, you need to set replicas:2 (or more) and minAvailable:1.
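
For option (1), the webhook Deployment would need something along these lines (a sketch; the label keys are assumptions based on a standard install):

```yaml
# Option (1) sketch: two replicas spread across nodes via pod anti-affinity,
# so evicting one of them never violates a minAvailable:1 budget.
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: webhook
                  app.kubernetes.io/part-of: tekton-pipelines
```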

Given those options, I think (2) is the least surprising/painful to users, and I'll send a PR today to Make It So.
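
For option (2), the shipped default could be as small as this (sketch; metadata omitted):

```yaml
spec:
  minAvailable: 0   # budget is effectively disabled; drains are never blocked
# To opt back into HA, bump the webhook to replicas: 2 (or more) and set
# minAvailable: 1 here, as the enabling-ha docs would describe.
```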

Longer-term, this is something the operator can handle for us, in a couple possible ways:

  1. Expose a setting like enableHA: true, which turns on/off the minReplicas:2/minAvailable:1 described in the docs above and makes it easier to enable/disable and harder to mess up.
    • This setting could possibly also detect a single-node situation and refuse to enable HA, and surface a warning that HA is impossible in single-node scenarios.
  2. Possibly detect that nodes are draining and temporarily set replicas:2 to scale/move the replica, to unblock draining.

In any case, these operator changes will be more involved than just documentation, so they probably deserve a TEP of their own, and won't happen soon.

sbose78 commented Mar 5, 2021

Great adventure!

Expose a setting like enableHA: true, which turns on/off the minReplicas:2/minAvailable:1 described in the docs above and makes it easier to enable/disable and harder to mess up.
This setting could possibly also detect a single-node situation and refuse to enable HA, and surface a warning that HA is impossible in single-node scenarios.

This looks like a good first step!
