summarize: Consider obj status condition in result #703

darkowlzz · 2022-04-29T16:54:07Z

SummarizeAndPatch() should also consider the object's status conditions
when computing and returning the runtime results to avoid any
inconsistency in the runtime result and status condition of the object.
When an object's Ready conditions are False, the reconciler should retry
unless it's in stalled state.

Detailed description

Summarize and patch takes the abstracted result and error, and computes the runtime result from it. Separately, it takes the object with status conditions and calculates the status summary and patches the object status.
This can result in a situation where the (sub)reconcilers return a successful result without any error, resulting in a success runtime result. But a negative status condition on the object set by one of the sub-reconcilers would result in the Ready status condition to be False. As a result of this, an object would have Ready=False status and the reconciler wouldn't be attempting any retry to reconcile it. We expect this behavior to only happen when the status condition Stalled=True. Unless Ready=True, the reconciler should always retry.

To make the model consistent, the summarize and patch process can take into consideration the status condition summary for Ready condition.

Runtime result behavior with respect to Ready condition:

If Ready=True, but the abstracted result is ResultRequeue and nil error, the runtime result should have Requeue: true.
If Ready=True, but the abstracted result is ResultSuccess and nil error, the runtime result should have RequeueAfter: <interval>.
If Ready=True, but the reconciler error is not nil, a runtime error is returned, resulting in runtime requeuing the object for reconciliation. Since an error in itself can't be used to set the Ready=False with reason and message, we can't change the status condition to reflect this failure. The retry will happen in the background without anything visible in the object status.
[New behavior] If Ready=False, but the abstracted result is Success and reconciler error is nil, the reconciler error can be changed to reflect the reason for Ready=False by setting the error to be the Ready condition message. This will result in the object to have Ready=False and the reconciler performing a retry.

darkowlzz · 2022-04-29T17:00:21Z

internal/reconcile/reconcile_test.go

-			afterFunc: func(t *WithT, obj conditions.Setter, patchOpts *patch.HelperOptions) {
-				t.Expect(patchOpts.IncludeStatusObservedGeneration).To(BeFalse())
-			},
-		},


Duplicate test case.

SummarizeAndPatch() should also consider the object's status conditions when computing and returning the runtime results to avoid any inconsistency in the runtime result and status condition of the object. When an object's Ready condition is False, the reconciler should retry unless it's in stalled condition. Signed-off-by: Sunny <[email protected]>

stefanprodan

LGTM

Thanks @darkowlzz 🏅

pjbgf

Really good stuff @darkowlzz

There are several FluxCD issues which were attempted to be fixed by implementing recovery logic into app framework. A new issue was observed that may be hit more often because of the recovery logic. Flux won't attempt to finish updating a HelmRelease resource to Ready=True but only display a message containing this signature: '''the object has been modified...'''. The resource must be correctly updated by Flux. Aside from this the recovery logic was put in place for HelmChart resources that are in Ready=False state, but this should have been handled by Flux. I believe the 2 issues described above map to [1] and [2]. To pull [1] and [2] updated FluxCD release to latest available [3]. Release is composed of manifests and container image used. Update containers used: helm-controller from v0.15.0 to v0.27.0, source-controller from v0.20.1 to v0.32.1. Update flux crds, rbac, deployment. Changelog claims we get k8s 1.25 support part of the upversion. Disclaimer for tests: 1) Recovery logic concerned with flipping spec.suspend was removed. Introduced by ongoing [4] 2) optimization by flipping spec.suspend was removed Introduced by ongoing[4] 3) cert-manager, nginx-ingress-controller, platform-integ-apps had the reconciliation interval decreased to 1m to allow Flux to manage the resources by itself in a reasonable time interval. There will be future commits per app updating reconciliation interval. Tests on AIO-SX: PASS: bootstrap PASS: unlocked enabled available PASS: apps applied PASS: tested on k8s 1.24.4 and 1.21.8, inherit that versions between should be OK PASS: delete pods and see they are recreated and no errors PASS: inspect flux pod logs for errors PASS: re-test known trigger for 1996747 and 1995748 [1]: fluxcd/source-controller#703 [2]: fluxcd/source-controller#202 [3]: https://github.com/fluxcd/flux2/releases/tag/v0.37.0 [4]: https://review.opendev.org/c/starlingx/config/+/866862 Related-Bug: 1995748 Related-Bug: 1996747 Closes-Bug: 1999032 Signed-off-by: Dan Voiculeasa <[email protected]> Change-Id: Id295599a2946f48081e7d27e2ab8e06063c3c88d

darkowlzz added the enhancement New feature or request label Apr 29, 2022

darkowlzz force-pushed the target-condition-in-result branch 2 times, most recently from 80a0261 to f7dd60a Compare April 29, 2022 16:59

darkowlzz commented Apr 29, 2022

View reviewed changes

darkowlzz force-pushed the target-condition-in-result branch from f7dd60a to d64def3 Compare April 30, 2022 04:38

darkowlzz force-pushed the target-condition-in-result branch from d64def3 to 2240106 Compare April 30, 2022 04:42

stefanprodan approved these changes Apr 30, 2022

View reviewed changes

pjbgf approved these changes May 3, 2022

View reviewed changes

stefanprodan merged commit 4e30a4f into main May 3, 2022

stefanprodan deleted the target-condition-in-result branch May 3, 2022 10:26

pjbgf added this to the GA milestone May 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

summarize: Consider obj status condition in result #703

summarize: Consider obj status condition in result #703

darkowlzz commented Apr 29, 2022 •

edited

Loading

darkowlzz Apr 29, 2022 •

edited

Loading

stefanprodan left a comment

pjbgf left a comment

summarize: Consider obj status condition in result #703

summarize: Consider obj status condition in result #703

Conversation

darkowlzz commented Apr 29, 2022 • edited Loading

Detailed description

darkowlzz Apr 29, 2022 • edited Loading

Choose a reason for hiding this comment

stefanprodan left a comment

Choose a reason for hiding this comment

pjbgf left a comment

Choose a reason for hiding this comment

darkowlzz commented Apr 29, 2022 •

edited

Loading

darkowlzz Apr 29, 2022 •

edited

Loading