Setting failurePolicy to Fail in the admission webhook does not work #5185

Closed
monotek opened this issue Nov 15, 2023 · 18 comments
Labels
bug (Something isn't working), stale (All issues that are marked as stale due to inactivity)

Comments

@monotek

monotek commented Nov 15, 2023

Report

We've installed keda via the downloaded release yaml.

We want to use the admission webhook with "failurePolicy: Fail".
As soon as we change that, we see the following issue when fluxcd tries to apply a namespace that contains a ScaledObject.

{"level":"error","ts":"2023-11-15T13:11:16.956Z","msg":"Reconciliation failed after 2.426684853s, next try in 5m0s","controller":"kustomization","controllerGroup":"kustomize.toolkit.fluxcd.io","controllerKind":"Kustomization","Kustomization":{"name":"gatekeeper","namespace":"flux-system"},"namespace":"flux-system","name":"gatekeeper","reconcileID":"9d4d6fe1-da59-40e9-af8c-798651024822","revision":"dev@sha1:b7ced8d1a6d67107cedfb8bdd665d81de2877ba0","error":"ScaledObject/gatekeeper/httpcache dry-run failed, reason: InternalError: Internal error occurred: failed calling webhook \"vscaledobject.kb.io\": failed to call webhook: Post \"https://keda-admission-webhooks.keda.svc:443/validate-keda-sh-v1alpha1-scaledobject?timeout=10s\": EOF\n"}

The admission webhook deployment is running and the svc is reachable via port-forward.
The certificates in use have the right names.

How can we further debug this?

I'm not sure if "internal error" is an error of the admission controller or if the admission controller can't be reached.
The admission controller itself does not log any error.

To us it looks like the admission webhook does not work at all, but the error is ignored with the default config?
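
The question above turns on how failurePolicy treats a webhook call that errors out. Below is a minimal Python sketch of that rule. It is a simplified model, not the API server's actual code, and it assumes (as the report implies) that the shipped manifest uses Ignore. The names `admit`, `WebhookCallError`, and `broken_webhook` are made up for illustration.

```python
from typing import Callable, Tuple


class WebhookCallError(Exception):
    """The webhook could not be called successfully (e.g. EOF or timeout)."""


def admit(call_webhook: Callable[[], bool], failure_policy: str) -> Tuple[bool, str]:
    """Simplified model of how a call error is treated under each failurePolicy.

    call_webhook returns True/False for allow/deny, or raises
    WebhookCallError when no valid response comes back.
    """
    try:
        allowed = call_webhook()
    except WebhookCallError as err:
        if failure_policy == "Ignore":
            # The call error is swallowed; the request proceeds unvalidated.
            return True, f"allowed, call error ignored: {err}"
        if failure_policy == "Fail":
            # The call error surfaces to the client as a rejected request.
            return False, f"rejected, failed calling webhook: {err}"
        raise ValueError(f"unknown failurePolicy: {failure_policy!r}")
    return allowed, "webhook answered"


def broken_webhook() -> bool:
    # Hypothetical stand-in for a connection that closes before a response.
    raise WebhookCallError("EOF")


print(admit(broken_webhook, "Ignore"))
print(admit(broken_webhook, "Fail"))
```

Under this model, the same broken call is invisible with Ignore and becomes a hard failure with Fail, which would match the behaviour described in this report.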

Expected Behavior

The admission webhook works with: failurePolicy: Fail

Actual Behavior

The admission webhook can't be used because of an "internal error".

Steps to Reproduce the Problem

  1. Edit the admission controller and set failurePolicy: Fail

Logs from KEDA operator

The only error I found in the operator logs was:

2023-11-15T13:42:32Z	ERROR	cert-rotation	Webhook not found. Unable to update certificate.	{"name": "keda-admission-webhooks", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "ValidatingWebhookConfiguration.admissionregistration.k8s.io \"keda-admission-webhooks\" not found"}

So the name of the webhook did not seem to match.

Changing the name of the validating webhook from keda-admission to keda-admission-webhooks did not help.

Afterwards we saw errors like:

2023-11-15T14:07:30Z	ERROR	cert-rotation	Error updating webhook with certificate	{"name": "keda-admission-webhooks", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission-webhooks\": the object has been modified; please apply your changes to the latest version and try again"}

KEDA Version

2.12.0

Kubernetes Version

1.26

Platform

Microsoft Azure

Scaler Details

No response

Anything else?

No response

monotek added the bug label Nov 15, 2023
@JorTurFer
Member

Hello,
Which manifest are you using to deploy KEDA? There are 2 flavors, with and without webhooks, and maybe there is a failure in one of them.

@monotek
Author

monotek commented Nov 27, 2023

We've installed keda via the downloaded release yaml which we split up in separate files and apply with fluxcd's kustomize controller.

@JorTurFer
Member

JorTurFer commented Nov 27, 2023

There are 3 different YAML files in the release:

  • keda-2.12.0-core.yaml
  • keda-2.12.0-crds.yaml
  • keda-2.12.0.yaml

Which are you using?

@monotek
Author

monotek commented Nov 27, 2023

  • keda-2.12.0-crds.yaml
  • keda-2.12.0.yaml

@JorTurFer
Member

I'm reviewing the configuration and it looks fine in the YAML files, but I've noticed something odd. The logs you sent say that the ValidatingWebhookConfiguration's name is keda-admission-webhooks, but that's not the default value; the default is keda-admission.
keda-admission-webhooks is the name of the service between the ValidatingWebhookConfiguration and the webhook deployment, but that's correctly configured in the YAML.
Are you modifying the names? The ValidatingWebhookConfiguration name that the KEDA operator uses is provided by the arg validating-webhook-name. Can you check the KEDA operator pod's arguments to see whether it's overridden?
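
One way to check for an override is to scan the operator container's args for that flag. Here is a small Python sketch, assuming the flag is passed as `--validating-webhook-name`. Only the arg name and the default keda-admission come from this thread; the double-dash form, the helper name `webhook_name_from_args`, and the sample args are assumptions for illustration.

```python
from typing import Optional, Sequence

# Default name mentioned in this thread.
DEFAULT_WEBHOOK_NAME = "keda-admission"

# Assumed flag spelling; the thread only names the arg "validating-webhook-name".
FLAG = "--validating-webhook-name"


def webhook_name_from_args(args: Sequence[str]) -> Optional[str]:
    """Return the value passed for the flag, or None when it is not set."""
    for i, arg in enumerate(args):
        if arg.startswith(FLAG + "="):
            # Form: --validating-webhook-name=value
            return arg.split("=", 1)[1]
        if arg == FLAG and i + 1 < len(args):
            # Form: --validating-webhook-name value
            return args[i + 1]
    return None


# Hypothetical args, as they might be copied from the operator pod spec.
sample_args = ["--leader-elect", "--validating-webhook-name=keda-admission-webhooks"]

name = webhook_name_from_args(sample_args)
if name is None:
    print(f"not overridden, default {DEFAULT_WEBHOOK_NAME!r} applies")
elif name != DEFAULT_WEBHOOK_NAME:
    print(f"overridden to {name!r}")
else:
    print("set explicitly to the default")
```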

@monotek
Author

monotek commented Nov 28, 2023

While looking into the issue, we thought at some point that the names might not match and tried changing them, but it made no difference. We might have forgotten to set everything back to the default.

@JorTurFer
Member

Could you try to set everything to default and post the logs?

@monotek
Author

monotek commented Nov 28, 2023

Yes, thanks for your help :)
Will do.
Might need some days though.

@JorTurFer
Member

Sure, just ping me back when you have more info :)

@monotek
Author

monotek commented Nov 30, 2023

We updated all resources to the 2.12.1 release now, overwriting our changes.
After setting "failurePolicy: Fail" in the validating webhook the namespaces with a scaledobject can't be applied any longer.

✗ Kustomization reconciliation failed: ScaledObject/istio-system/istio-ingressgateway dry-run failed, reason: InternalError: Internal error occurred: failed calling webhook "vscaledobject.kb.io": failed to call webhook: Post "https://keda-admission-webhooks.keda.svc:443/validate-keda-sh-v1alpha1-scaledobject?timeout=10s": EOF

We also still see the following errors in the operator log:

2023-11-30T07:39:46Z	ERROR	cert-rotation	Error updating webhook with certificate	{"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).ensureCerts
	/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:789
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).Reconcile
	/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:739
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2023-11-30T07:39:46Z	ERROR	Reconciler error	{"controller": "cert-rotator", "object": {"name":"kedaorg-certs","namespace":"keda"}, "namespace": "keda", "name": "kedaorg-certs", "reconcileID": "8e7e9cfa-5bb0-45ec-8526-2b36e307ecbd", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227

@JorTurFer
Member

Is the message transient, or do you see it permanently in your logs?
Are you deploying KEDA with any kind of auto-sync? If so, the auto-sync tool may be conflicting with cert-controller (the internal controller that KEDA uses to manage the certificates). If that is your case, I'd suggest disabling the auto-sync to check whether the error disappears and, if it does, using cert-manager to manage the certificates externally instead of the internal controller.
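
The "object has been modified" message in the operator log is the API server's optimistic-concurrency conflict: an update based on an outdated resourceVersion is rejected. The toy Python model below shows how two writers to the same object can produce it. `Store` and the flattened `spec` fields are hypothetical simplifications (the real caBundle sits under each webhook's clientConfig), and the two "writers" only stand in for a certificate rotator and a GitOps reconciler.

```python
class Conflict(Exception):
    """Raised when an update is based on a stale resourceVersion."""


class Store:
    """Toy stand-in for the API server holding a single object."""

    def __init__(self, spec: dict):
        self.resource_version = 1
        self.spec = dict(spec)

    def get(self):
        # Return the current version together with a copy of the spec.
        return self.resource_version, dict(self.spec)

    def update(self, based_on_version: int, spec: dict):
        # Reject writes that were prepared against an older version.
        if based_on_version != self.resource_version:
            raise Conflict("the object has been modified; please apply "
                           "your changes to the latest version and try again")
        self.spec = dict(spec)
        self.resource_version += 1


store = Store({"failurePolicy": "Ignore", "caBundle": ""})

# Writer 1 (think: certificate rotator) reads the object.
rv1, spec1 = store.get()

# Writer 2 (think: GitOps reconciler) reads and writes in between.
rv2, spec2 = store.get()
spec2["failurePolicy"] = "Fail"
store.update(rv2, spec2)

# Writer 1 now writes with its stale version and is rejected.
spec1["caBundle"] = "<new CA>"
try:
    store.update(rv1, spec1)
    outcome = "updated"
except Conflict as err:
    outcome = f"conflict: {err}"

# The usual remedy: re-read the latest version and apply the change again.
rv1, spec1 = store.get()
spec1["caBundle"] = "<new CA>"
store.update(rv1, spec1)

print(outcome)
print(store.get())
```

A conflict of this kind is normally retried and need not be fatal on its own, so it does not by itself explain the EOF on the webhook call.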

@monotek
Author

monotek commented Nov 30, 2023

Yes, the error messages pop up regularly (around every 5 min).

What do you mean by auto-sync?
The resources created/applied by fluxcd are reconciled but certs are not part of it.

I'll try to use cert-manager.

@monotek
Author

monotek commented Nov 30, 2023

Using cert-manager to create and inject the cert makes no difference :(

@JorTurFer
Member

In my experience with ArgoCD, that error occurs because Flux is reconciling the configuration all the time, locking the resource.

> Using cert-manager to create and inject the cert makes no difference :(

What do you mean? That's not possible, because if you use cert-manager you have to disable this mechanism in the operator (the Helm chart does it automatically). https://keda.sh/docs/2.12/operate/security/#use-your-own-tls-certificates

@monotek
Author

monotek commented Dec 7, 2023

After discussing this in the fluxcd slack channel we decided to go without the admission controller: https://cloud-native.slack.com/archives/CLAJ40HV3/p1701698388439549

@JorTurFer
Member

I've posted on the channel too, let's see if there is something that we can do in the future to prevent this 🤞


stale bot commented Feb 6, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Feb 6, 2024

stale bot commented Feb 13, 2024

This issue has been automatically closed due to inactivity.

stale bot closed this as completed Feb 13, 2024
github-project-automation bot moved this from To Triage to Ready To Ship in Roadmap - KEDA Core Feb 13, 2024