-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow marking releases stuck in a pending state as failed #16
Conversation
@@ -755,6 +777,35 @@ func (r *Reconciler) doUpgrade(actionClient helmclient.ActionInterface, u *updat | |||
return rel, nil | |||
} | |||
|
|||
func (r *Reconciler) handlePending(actionClient helmclient.ActionInterface, rel *release.Release, u *updater.Updater, log logr.Logger) (ctrl.Result, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a new test case for a reconciliation with a MarkFailedAfter
which tests the transisition from a pending to a failed release (and maybe even the roll-forward)?
As these changes affect the core logic of our reconciliations it is important from my point of view.
} | ||
u.UpdateStatus( | ||
updater.EnsureCondition(conditions.Irreconcilable(corev1.ConditionTrue, conditions.ReasonPendingError, err))) | ||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think adding an interval
to set the RequeueAfter
property is useful here?
My fear is that the reconciliations are really fast in production systems and could create bigger loads on the API server which in turn leads to a slower API server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline: the operator already uses a builtin rate-limiter to prevent overloading the API server. The situation we are handling here should be rare, and we expect that the primary cause are people running manual Helm operations. For these, the pending state will only last for ~10s at most, and we don't want to block the operator for the next 2m if they encounter such a state.
Note: with dependent watches, this might be less of an issue, but we currently don't have those.
@@ -133,6 +140,11 @@ func (c *ActionClient) Upgrade(name, namespace string, chrt *chart.Chart, vals m | |||
return c.HandleUpgrade() | |||
} | |||
|
|||
func (c *ActionClient) MarkFailed(rel *release.Release, reason string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea to mark the release as failed and being reconciled by the existing logic. Nice! 💯
This PR adds an option
WithMarkFailedAfter(duration)
which allows marking a release seemingly stuck in a pending state (pending-install
,pending-upgrade
,pending-rollback
) as "failed" after a given timeout (measured from the "last deployed" timestamp). The "failed" state will allow the next "upgrade" operation to succeed.