source-controller non-leader replica left in not-ready state #837

Open
gladiatr72 opened this issue Jul 18, 2022 · 5 comments

Comments

@gladiatr72

in reference to: #326

Ok, but source-controller is the only flux component that actually does this. I thought that was the whole purpose of the leader election thing...

Personally, I don't care about the prom alerts. Those can be silenced. Filling up the event log with reams of flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-579b5cc8c9-2mvw8 -- Readiness probe failed: Get "http://10.64.183.82:9090/": dial tcp 10.64.183.82:9090: connect: is another matter, however, particularly on managed clusters where event retention is non-configurable (lifespan of a mayfly in a hurricane).

⇶ flux check 2>&1 | grep -v ready
► checking prerequisites
✗ flux 0.31.2 <0.31.4 (new version is available, please upgrade)
✔ Kubernetes 1.22.9-eks-a64ea69 >=1.20.6-0
► checking controllers
► ghcr.io/fluxcd/helm-controller:v0.22.1
► ghcr.io/fluxcd/image-automation-controller:v0.23.4
► ghcr.io/fluxcd/image-reflector-controller:v0.19.2
► ghcr.io/fluxcd/kustomize-controller:v0.26.1
► ghcr.io/fluxcd/notification-controller:v0.24.0
► ghcr.io/fluxcd/source-controller:v0.25.8
⇶ k get pod
NAME                                       READY   STATUS    RESTARTS   AGE
charts-7754c6d999-9vwq6                    1/1     Running   0          2d19h
charts-7754c6d999-tkxhv                    1/1     Running   0          22h
helm-controller-6c784449bf-dxg6n           1/1     Running   0          12m
helm-controller-6c784449bf-mcjk4           1/1     Running   0          12m
kustomize-controller-6cdbcbf75f-d5jgq      1/1     Running   0          12m
kustomize-controller-6cdbcbf75f-m5wqx      1/1     Running   0          12m
notification-controller-6b46b479f8-gdgw6   1/1     Running   0          12m
notification-controller-6b46b479f8-rf84v   1/1     Running   0          12m
source-controller-579b5cc8c9-s2wwz         1/1     Running   0          20m

helm-controller initial logs:

dxg6n

{"level":"info","ts":"2022-07-18T17:25:20.136Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":"2022-07-18T17:25:20.137Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2022-07-18T17:25:20.138Z","msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
{"level":"info","ts":"2022-07-18T17:25:20.139Z","msg":"Starting server","kind":"health probe","addr":"[::]:9440"}
I0718 17:25:20.239472       7 leaderelection.go:248] attempting to acquire leader lease flux-system/helm-controller-leader-election...

mcjk4

{"level":"info","ts":"2022-07-18T17:25:16.734Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":"2022-07-18T17:25:16.735Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2022-07-18T17:25:16.735Z","msg":"Starting server","kind":"health probe","addr":"[::]:9440"}
{"level":"info","ts":"2022-07-18T17:25:16.736Z","msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
I0718 17:25:16.836564       7 leaderelection.go:248] attempting to acquire leader lease flux-system/helm-controller-leader-election...
I0718 17:25:22.253167       7 leaderelection.go:258] successfully acquired lease flux-system/helm-controller-leader-election
@gladiatr72
Author

crickets...

@pjbgf
Member

pjbgf commented Aug 25, 2022

Hey @gladiatr72, have you noticed anything else in the logs in terms of errors?

My first guess would be to check the resource limits (CPU and memory) and resource saturation, as I have seen them cause the controller to misbehave without an obvious reason or reasonable error messages.
Are you using the default values?

Would you also be able to share what type of sources (and quantity) you have configured in your setup?

@gladiatr72
Author

gladiatr72 commented Aug 25, 2022

Sure

⇶ k get deployments.apps -n flux-system source-controller -o jsonpath={.spec.template.spec.containers[0].resources} | jq -M .

{
  "limits": {
    "cpu": "1",
    "memory": "1Gi"
  },
  "requests": {
    "cpu": "50m",
    "memory": "64Mi"
  }
}

The controller is not misbehaving. According to the referenced ticket, it is working as intended.

⇶ k get deployments.apps
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
charts                        2/2     2            2           153d
helm-controller               2/2     2            2           154d
kustomize-controller          2/2     2            2           154d
notification-controller       2/2     2            2           154d
source-controller             1/2     2            1           154d

flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused
flux-system -- 0s -- Warning -- Unhealthy -- pod/source-controller-84dcf54c8-xlq4h -- Readiness probe failed: Get "http://10.64.177.148:9090/": dial tcp 10.64.177.148:9090: connect: connection refused

The issue is that the fluxv2 source-controller is the only fluxv2 controller that uses a failed readiness check to manage leadership, whereas the secondary pods of the kustomize, helm and notification controllers come ready, play a quick leadership game, then slip into a passive state until the next election.

I understand why those bits might not have been added to the source-controller yet, but #326 leaves one with the impression that it is not being considered as a thing that needs doing.
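
To make the contrast concrete, here is a minimal Python sketch of the two readiness models described in this comment. It is a toy model, not code from any Flux controller. The names `Replica`, `health_probe`, `artifact_probe` and `ready_count` are hypothetical, and the assumption that the probed port 9090 only listens on the elected leader is inferred from the "connection refused" events, not confirmed in this thread.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    is_leader: bool


def health_probe(replica: Replica) -> bool:
    # Model of a dedicated health endpoint that is served as soon as the
    # process starts, before and regardless of leader election.
    return True


def artifact_probe(replica: Replica) -> bool:
    # ASSUMPTION: the probed endpoint (port 9090 in the events) only
    # listens on the elected leader, so standbys refuse the connection.
    return replica.is_leader


def ready_count(replicas, probe) -> str:
    # Mimic the READY column of `kubectl get deployments`.
    ready = sum(1 for r in replicas if probe(r))
    return f"{ready}/{len(replicas)}"


pods = [Replica("leader", True), Replica("standby", False)]
print(ready_count(pods, health_probe))    # 2/2
print(ready_count(pods, artifact_probe))  # 1/2
```

Under these assumptions the first model reproduces the 2/2 shown for the helm, kustomize and notification controllers, and the second reproduces the 1/2 shown for source-controller.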

@gladiatr72
Author

The other part of what you asked for: current environment has 3 git sources and 1 helm repository with ~2 dozen charts

@stefanprodan
Member

If we allowed standby pods to become ready, consumers like kustomize-controller and helm-controller would not be able to fetch the source artifacts: standby pods don't replicate the storage from the primary, but kube-proxy would randomly route calls to them.
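
The routing argument can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not Flux or Kubernetes code: the function `fetch_success_rate` and the `(ready, has_artifacts)` tuples are hypothetical, and uniform random choice among ready pods is a simplification of how a Service spreads traffic.

```python
import random


def fetch_success_rate(replicas, requests=1000, seed=0):
    """Route each request to a random ready replica.

    replicas: list of (ready, has_artifacts) tuples, one per pod.
    Returns the fraction of requests that reached a pod holding artifacts.
    """
    rng = random.Random(seed)
    # Only ready pods receive traffic.
    ready = [r for r in replicas if r[0]]
    if not ready:
        return 0.0
    hits = sum(1 for _ in range(requests) if rng.choice(ready)[1])
    return hits / requests


leader = (True, True)               # ready, holds the artifact storage
standby_not_ready = (False, False)  # current behaviour: excluded from routing
standby_ready = (True, False)       # proposed behaviour: routed to, no storage

current = fetch_success_rate([leader, standby_not_ready])
proposed = fetch_success_rate([leader, standby_ready])
print(current, proposed)
```

With the standby kept not-ready, every fetch lands on the leader. With the standby marked ready but holding no artifacts, roughly half of the fetches in this model reach a pod that cannot serve them.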

jonasbadstuebner added a commit to jonasbadstuebner/fluxcd-community-helm-charts that referenced this issue Dec 14, 2023