Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webhook controller pod is ready too soon #719

Closed
dhduvall opened this issue Jan 24, 2023 · 1 comment · Fixed by #721
Closed

webhook controller pod is ready too soon #719

dhduvall opened this issue Jan 24, 2023 · 1 comment · Fixed by #721
Labels
bug Something isn't working webhook

Comments

@dhduvall
Copy link
Contributor

Describe the bug

After installing the webhook controller (via Helm), the pod serving the controller is marked as Ready even though pods it should be controlling aren't modified until later, up to a minute or two.

Steps To Reproduce

helm install ... --wait --repo https://azure.github.io/azure-workload-identity/charts --version 0.15.0 workload-identity-webhook workload-identity-webhook

Start a pod which has been configured to be managed by the controller.

Expected behavior

Pods created immediately after the helm command finishes should be modified by the controller.

Logs

I0124 17:55:16.395363       1 main.go:88] entrypoint "msg"="initializing metrics backend" "backend"="prometheus"
I0124 17:55:16.395436       1 main.go:95] entrypoint "msg"="setting up manager" "userAgent"="azure-workload-identity/webhook/runasroot-1 (linux/amd64) df6362a/2023-01-12-00:58"
I0124 17:55:16.649725       1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8095"
I0124 17:55:16.650349       1 main.go:115] entrypoint "msg"="setting up cert rotation"
I0124 17:55:16.650668       1 main.go:147] entrypoint "msg"="starting manager"
I0124 17:55:16.652206       1 internal.go:366]  "msg"="Starting server" "addr"={"IP":"::","Port":8095,"Zone":""} "kind"="metrics" "path"="/metrics"
I0124 17:55:16.652241       1 internal.go:366]  "msg"="Starting server" "addr"={"IP":"::","Port":9440,"Zone":""} "kind"="health probe"
I0124 17:55:16.652385       1 controller.go:185]  "msg"="Starting EventSource" "controller"="cert-rotator" "source"="&{{%!s(*v1.Secret=&{{ } {      0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} <nil> map[] map[] }) %!s(*cache.informerCache=&{0xc00049bb80}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"
I0124 17:55:16.652410       1 controller.go:185]  "msg"="Starting EventSource" "controller"="cert-rotator" "source"="&{{%!s(*unstructured.Unstructured=&{map[apiVersion:admissionregistration.k8s.io/v1 kind:MutatingWebhookConfiguration]}) %!s(*cache.informerCache=&{0xc00049bb80}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"
I0124 17:55:16.652432       1 controller.go:193]  "msg"="Starting Controller" "controller"="cert-rotator"
I0124 17:55:16.753556       1 rotator.go:204] cert-rotation "msg"="starting cert rotator controller"
I0124 17:55:16.753778       1 rotator.go:245] cert-rotation "msg"="refreshing CA and server certs"
I0124 17:55:16.753694       1 controller.go:227]  "msg"="Starting workers" "controller"="cert-rotator" "worker count"=1
E0124 17:55:16.753973       1 rotator.go:674] cert-rotation "msg"="secret is not well-formed, cannot update webhook configurations" "error"="Cert secret is not well-formed, missing ca.crt"
I0124 17:55:19.408882       1 rotator.go:722] cert-rotation "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"MutatingWebhookConfiguration"} "name"="azure-wi-webhook-mutating-webhook-configuration"
E0124 17:55:19.491674       1 rotator.go:729] cert-rotation "msg"="Error updating webhook with certificate" "error"="Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"azure-wi-webhook-mutating-webhook-configuration\": the object has been modified; please apply your changes to the latest version and try again" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"MutatingWebhookConfiguration"} "name"="azure-wi-webhook-mutating-webhook-configuration"
E0124 17:55:19.491737       1 controller.go:326]  "msg"="Reconciler error" "error"="Operation cannot be fulfilled on mutatingwebhookconfigurations.admissionregistration.k8s.io \"azure-wi-webhook-mutating-webhook-configuration\": the object has been modified; please apply your changes to the latest version and try again" "controller"="cert-rotator" "name"="azure-wi-webhook-server-cert" "namespace"="azure-workload-identity-system" "object"={"name":"azure-wi-webhook-server-cert","namespace":"azure-workload-identity-system"} "reconcileID"="f9071a34-afdb-4119-b623-e91d0666d7d9"
I0124 17:55:19.491953       1 rotator.go:722] cert-rotation "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"MutatingWebhookConfiguration"} "name"="azure-wi-webhook-mutating-webhook-configuration"
I0124 17:55:19.498804       1 rotator.go:722] cert-rotation "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"MutatingWebhookConfiguration"} "name"="azure-wi-webhook-mutating-webhook-configuration"
E0124 17:55:20.712482       1 rotator.go:247] cert-rotation "msg"="could not refresh CA and server certs" "error"="Operation cannot be fulfilled on secrets \"azure-wi-webhook-server-cert\": the object has been modified; please apply your changes to the latest version and try again"
I0124 17:55:20.731133       1 rotator.go:271] cert-rotation "msg"="no cert refresh needed"
I0124 17:56:45.536432       1 rotator.go:757] cert-rotation "msg"="certs are ready in /certs"
I0124 17:56:45.536466       1 rotator.go:777] cert-rotation "msg"="CA certs are injected to webhooks"
I0124 17:56:45.536481       1 main.go:159] entrypoint "msg"="setting up webhook server"
I0124 17:56:45.536535       1 server.go:216] controller-runtime/webhook/webhooks "msg"="Starting webhook server"
I0124 17:56:45.536612       1 main.go:163] entrypoint "msg"="registering webhook to the webhook server"
I0124 17:56:45.536698       1 server.go:148] controller-runtime/webhook "msg"="Registering webhook" "path"="/mutate-v1-pod"
I0124 17:56:45.536861       1 certwatcher.go:131] controller-runtime/certwatcher "msg"="Updated current TLS certificate"
I0124 17:56:45.536945       1 server.go:270] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=9443
I0124 17:56:45.537016       1 certwatcher.go:85] controller-runtime/certwatcher "msg"="Starting certificate watcher"

Environment

  • Kubernetes version (use kubectl version): 1.23.12
  • Cloud provider or hardware configuration: AKS

Additional context

I worked around this with the following loop executed after my helm invocation:

while ! kubectl -n azwi-system logs deploy/azure-wi-webhook-controller-manager | grep -q 'Serving webhook server'; do
    sleep 1
done

Note that in the logs above, this comes a minute and a half after the pod starts. I've seen it get to that point within 30s, but also take up to 3m. The pod is marked Ready at 17:55:17.

I see that the readinessProbe handled by /readyz is just a ping checker. Perhaps StartedChecker() would be a better choice?

This may also be of relevance to the question posed in #540 (comment)

@dhduvall dhduvall added the bug Something isn't working label Jan 24, 2023
@aramase
Copy link
Member

aramase commented Jan 26, 2023

Closed with #721

@aramase aramase closed this as completed Jan 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working webhook
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants