Upgrade from Flux 2.1.x to 2.2.2 leaves most HelmReleases in a broken state #4524
Comments
Let me know if I can provide some more information. I can easily and consistently recreate this behavior on my local cluster. |
I can confirm that this also happened to me even with the 2.2.2 release. |
This might be related: |
This is probably related:
the thing is, my helmrelease has apiVersion
Not sure why it says distribution-2.1.0, because I have:
After reverting to flux 2.1.0, everything works again. |
@razvanphp Your issue isn't related per se. The error you see occurs because the controller in your cluster is still at 2.1.0 while your CLI has been updated to 2.2.2.
|
Can you please post here the |
Indeed, thank you for your answer! Sorry for the noob question... |
@wilmardo can you please provide some detailed instructions to reproduce this issue? Based on your issue description, I tried a few things but couldn't reproduce it. Some detailed steps with example configuration or even a test repository with just the necessary configurations to help reproduce it would be very helpful. |
Yes! I will get back to this; I'm busy with other things at the moment and we have postponed this update for now. @darkowlzz I will try to get something together, but it might be something very specific to our in-house setup that is triggering this. |
This might be related although the issue is a bit vague: |
Hi, we got another report of a similar issue today on Slack, and it revealed some helpful hints. I came up with a theory for what's causing this and some potential solutions. See fluxcd/helm-controller#884 and fluxcd/helm-controller#885 for details. I can briefly explain the observations here too.
The "dependency is not ready" message may not be the actual issue here. It's more likely that the reconciliation failed once with this error, and on a subsequent reconciliation it got past the dependency check, but the old Ready status persisted on the object and reconciliation entered a drift detection and correction loop because some other controller or entity in the cluster reverted or modified the configuration applied by the HelmRelease. fluxcd/helm-controller#855 is an example of this situation and of how it can be handled using drift detection ignore rules. See https://fluxcd.io/flux/components/helm/helmreleases/#drift-detection for detailed docs.
Another way to verify the issue is to look at the events and logs associated with the HelmRelease. They should mention the drift. Debug-level logs must be enabled to see the details of the detected drift, as described in the docs.
I've shared more details about my attempts to reproduce this issue in fluxcd/helm-controller#885 (comment). Based on that, I think the changes in fluxcd/helm-controller#885 should improve the situation and surface the actual issue. It would be great if people who are facing this issue could try the preview image of that PR using
It's an official preview image built using the Flux release infrastructure; see https://github.com/fluxcd/helm-controller/actions/runs/7762775568/job/21173786393. The preview image can help surface the actual underlying issue. Once the drift issue is resolved, the helm-controller can be reverted to the previous version, since that version works fine; only the status reporting made it confusing. |
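For readers unfamiliar with the drift detection ignore rules mentioned above, here is a minimal sketch of what one can look like on a `v2beta2` HelmRelease. The release name, chart, repository, and the ignored path are hypothetical; the right paths depend on which fields another controller in your cluster is modifying.

```yaml
# Sketch only: all names below are placeholders, not taken from this issue.
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: example-app            # hypothetical release name
  namespace: flux-system
spec:
  interval: 10m
  chart:
    spec:
      chart: example-app       # hypothetical chart
      sourceRef:
        kind: HelmRepository
        name: example-repo     # hypothetical repository
  driftDetection:
    mode: enabled              # detect and correct drift
    ignore:
      # Ignore a field that something else in the cluster (for example an
      # autoscaler) is assumed to own, so it is not "corrected" in a loop.
      - paths: ["/spec/replicas"]
        target:
          kind: Deployment
```

Paths are JSON pointers into the rendered manifests, and `target` narrows the rule to matching resources. The linked drift detection docs remain the authoritative description of these fields.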
Hi, Flux v2.2.3 has been released with fluxcd/helm-controller#884 to help with the issue reported here. Instead of the test image I shared in the last comment, please upgrade to Flux v2.2.3 and see if it helps surface the potential drift detection and correction issue described in detail above. The status won't mention drift explicitly yet, but it will show that the HelmRelease is being processed rather than in a failed state. Please check the events of the particular HelmRelease and the logs, as documented in https://fluxcd.io/flux/components/helm/helmreleases/#drift-detection, to see whether a conflict in drift correction is preventing the release from completing successfully. In a future release, we may add an explicit message about drift correction, as described in fluxcd/helm-controller#885. |
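To see the drift details in the logs, the helm-controller has to run at debug log level. One possible way to do that is sketched below, assuming the cluster's Flux components are managed through a bootstrap-style `flux-system/kustomization.yaml` (the reporter's cluster shows `bootstrapped: false`, so the file layout here is an assumption and may need adapting to however the components are installed).

```yaml
# Sketch only: assumes a bootstrap-style layout with gotk-components.yaml
# and gotk-sync.yaml next to this file.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: helm-controller
    patch: |
      # Append the flag to the controller's args. This assumes the later
      # flag takes precedence over the default --log-level=info already
      # present in the args list.
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --log-level=debug
```

Once the drift has been identified, the patch can be removed to return the controller to its default log level.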
Describe the bug
It seems that after the upgrade some HelmReleases are migrated to the newer API object but some aren't. This won't go away unless the HelmRelease is removed or `reconcile --force` is used.
The most obvious thing is that the message in `flux get hr` doesn't show the new information. The more breaking thing is that dependencies aren't considered ready even when the dependency is Ready and the message shows `Helm upgrade succeeded` (see ingress-nginx and cert-manager in the output below for example).
This seems pretty similar to what this PR is trying to solve: fluxcd/helm-controller#850
That PR should be in v2.2.2, where I am still having this issue.
`flux get hr` output right after the upgrade:
All the releases showing `Helm upgrade succeeded` or `dependency 'flux-system/xxx' is not ready` won't go to the new message without a `--force` or deletion.
I tried:
`v2beta2` for all the HelmReleases as described here: controller: reset field managers for `v2beta1` helm-controller#850 (review)
It would be extremely nice if the upgrade could be autonomous and did not require human intervention to run `reconcile --force` on all HelmReleases. The `--force` will also break on some occasions (AWS with an NLB on a Service, for example).
Steps to reproduce
`flux get hr` with different messages and stuck dependencies
Expected behavior
All the HelmReleases show the new message and are accepted as ready.
Screenshots and recordings
No response
OS / Distro
N/A
Flux version
v2.2.2
Flux check
► checking prerequisites
✔ Kubernetes 1.27.5+k3s1 >=1.26.0-0
► checking version in cluster
✔ distribution: flux-2.2.2
✔ bootstrapped: false
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.37.2
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.2.1
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.2.3
► checking crds
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta2
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ all checks passed
Git provider
No response
Container Registry provider
No response
Additional context
Reconcile log of a 'stuck' HelmRelease:
Reconcile log of an updated HelmRelease:
All seems happy in the helm-controller to me :)