(feat) introduce maxConsecutiveFailures knob #1025

Merged
merged 1 commit into from
Feb 17, 2025

gianlucam76
Member

@gianlucam76 gianlucam76 commented Feb 13, 2025

The maxConsecutiveFailures option controls deployment retry behavior. This optional field sets a threshold for consecutive deployment failures: once that many consecutive failures have occurred, Sveltos stops retrying the deployment.
Retries resume only when the profile configuration is updated.

If maxConsecutiveFailures is not configured, Sveltos retries indefinitely.
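
The retry rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `retryTracker`, its fields and methods, and `errDeploy` are hypothetical names, and resetting the counter after a successful deployment is an assumption based on the word "consecutive".

```go
package main

import "errors"

// errDeploy is a placeholder error used only to simulate a failed deployment.
var errDeploy = errors.New("deployment failed")

// retryTracker is a hypothetical helper, not the Sveltos implementation.
type retryTracker struct {
	// maxConsecutiveFailures is nil when the option is not configured.
	maxConsecutiveFailures *uint
	// consecutiveFailures counts failures since the last success or profile update.
	consecutiveFailures uint
}

// recordResult updates the counter after one deployment attempt.
// Assumption: a successful attempt resets the counter.
func (t *retryTracker) recordResult(err error) {
	if err == nil {
		t.consecutiveFailures = 0
		return
	}
	t.consecutiveFailures++
}

// shouldRetry reports whether another deployment attempt is allowed.
func (t *retryTracker) shouldRetry() bool {
	if t.maxConsecutiveFailures == nil {
		// Option not configured: retry indefinitely.
		return true
	}
	return t.consecutiveFailures < *t.maxConsecutiveFailures
}

// profileUpdated models a change to the profile configuration,
// which makes retries resume.
func (t *retryTracker) profileUpdated() {
	t.consecutiveFailures = 0
}
```

With a limit of 2, two failed attempts in a row make `shouldRetry` return false until `profileUpdated` is called. A tracker with no limit set always returns true.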

This PR also fixes two errors that have been seen a few times.

The first error is Helm getting stuck with: "cannot re-use a name that is still in use"
This condition should never occur. A previous check ensures that only one
ClusterProfile/Profile can manage a Helm chart with a given name in a
specific namespace within a managed cluster. So if this error is seen after
an install, Sveltos switches to an upgrade.
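
A minimal sketch of that install-then-upgrade fallback, for illustration only. The `helmOps` struct, `installOrUpgrade`, and `errNameInUse` are hypothetical stand-ins rather than the Helm SDK or the Sveltos code, and detecting the condition by matching the error text is an assumption.

```go
package main

import (
	"errors"
	"strings"
)

// nameInUseMsg is the Helm error text quoted in this PR description.
const nameInUseMsg = "cannot re-use a name that is still in use"

// errNameInUse simulates the Helm error for demonstration purposes.
var errNameInUse = errors.New(nameInUseMsg)

// helmOps is a hypothetical stand-in for the real Helm operations.
type helmOps struct {
	install func(release string) error
	upgrade func(release string) error
}

// installOrUpgrade tries an install and falls back to an upgrade
// when the install fails because the release name is still in use.
// Assumption: the condition is detected by matching the error text.
func installOrUpgrade(ops helmOps, release string) error {
	err := ops.install(release)
	if err != nil && strings.Contains(err.Error(), nameInUseMsg) {
		return ops.upgrade(release)
	}
	// Success or any other error is returned unchanged.
	return err
}
```

Any other install error is returned as-is, so the fallback applies only to this one condition.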

The second error is Helm getting stuck with: "another operation (install/upgrade/rollback) is in progress"
As with the first error, this condition should never happen. Sveltos tries to recover
by first uninstalling and then installing again. While this is not ideal, there is no
other way to recover from this condition without manual intervention.
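
The uninstall-then-install recovery might look roughly like the sketch below. It is not the code from this PR: `releaseOps`, `deployWithRecovery`, and `errOperationInProgress` are hypothetical names, and matching on the error text is an assumption.

```go
package main

import (
	"errors"
	"strings"
)

// operationInProgressMsg is the Helm error text quoted in this PR description.
const operationInProgressMsg = "another operation (install/upgrade/rollback) is in progress"

// errOperationInProgress simulates the Helm error for demonstration purposes.
var errOperationInProgress = errors.New(operationInProgressMsg)

// releaseOps is a hypothetical stand-in for the real Helm operations.
type releaseOps struct {
	deploy    func(release string) error // the original install or upgrade attempt
	uninstall func(release string) error
	install   func(release string) error
}

// deployWithRecovery runs the deployment and, if Helm reports that another
// operation is in progress, uninstalls the release and installs it again.
func deployWithRecovery(ops releaseOps, release string) error {
	err := ops.deploy(release)
	if err == nil || !strings.Contains(err.Error(), operationInProgressMsg) {
		// Success or an unrelated error: nothing to recover from here.
		return err
	}
	if uerr := ops.uninstall(release); uerr != nil {
		return uerr
	}
	return ops.install(release)
}
```

The uninstall step removes the release before it is installed again, which is why the description calls this recovery path not ideal.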

@gianlucam76 gianlucam76 force-pushed the max-retries branch 6 times, most recently from 66f85ed to abb5705 Compare February 13, 2025 18:35
@gianlucam76 gianlucam76 merged commit 6f8fc4d into projectsveltos:main Feb 17, 2025
12 checks passed
@gianlucam76 gianlucam76 deleted the max-retries branch February 17, 2025 14:23