(Bug) recover when token expires #337

Merged
merged 1 commit into projectsveltos:main from client on Jan 23, 2025

Conversation

gianlucam76
Member

When the sveltos-agent runs in the management cluster, it receives the managed cluster's kubeconfig from a Secret. These kubeconfigs can expire (e.g., GKE tokens have a maximum lifespan of 48 hours).
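For readers less familiar with that setup, here is a minimal, hypothetical sketch (not the PR's actual code) of how an agent running in the management cluster might turn such a Secret into a client-go `rest.Config`; the Secret namespace/name and the `kubeconfig` data key are assumptions:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// getManagedClusterConfig (hypothetical helper) reads the managed cluster's
// kubeconfig from a Secret in the management cluster and builds a rest.Config
// from it. Once the embedded token expires, clients built from this config
// start failing with 401 Unauthorized errors.
func getManagedClusterConfig(ctx context.Context, mgmtClient kubernetes.Interface,
	namespace, secretName string) (*rest.Config, error) {

	secret, err := mgmtClient.CoreV1().Secrets(namespace).Get(ctx, secretName, metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("failed to get kubeconfig secret: %w", err)
	}

	data, ok := secret.Data["kubeconfig"] // data key is an assumption
	if !ok {
		return nil, fmt.Errorf("secret %s/%s has no kubeconfig entry", namespace, secretName)
	}

	// Turn the raw kubeconfig bytes into a rest.Config usable by client-go
	// and controller-runtime.
	return clientcmd.RESTConfigFromKubeConfig(data)
}
```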

Sveltos includes a mechanism to proactively renew these tokens. The SveltosCluster controller can be configured to periodically refresh tokens before they expire, preventing disruptions.

However, prior to this change, the sveltos-agent, when deployed in the management cluster, lacked the ability to retrieve an updated kubeconfig. Consequently, upon kubeconfig expiration, the agent encountered numerous authorization errors, effectively ceasing operation.

This pull request addresses this issue by implementing a mechanism to detect kubeconfig expiration. Upon detection, the sveltos-agent retrieves a fresh, valid kubeconfig. This triggers a restart of the controller-runtime manager (and all associated controllers) as well as the evaluation manager, ensuring continued operation.
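To make the approach concrete, here is a hedged sketch of such a recovery loop, reusing the hypothetical `getManagedClusterConfig` helper from the snippet above; the probe interval, the Unauthorized check, and the restart-via-context-cancellation shape are illustrative assumptions, not the PR's actual implementation:

```go
package main

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
)

// runUntilCredentialsExpire (hypothetical) builds a controller-runtime manager
// from the current kubeconfig, runs it, and probes the managed cluster in the
// background. When the token stops working, the manager's context is cancelled
// so the outer loop can fetch a fresh kubeconfig and start over.
func runUntilCredentialsExpire(ctx context.Context, mgmtClient kubernetes.Interface,
	namespace, secretName string) error {

	for ctx.Err() == nil {
		cfg, err := getManagedClusterConfig(ctx, mgmtClient, namespace, secretName)
		if err != nil {
			return err
		}

		mgrCtx, cancel := context.WithCancel(ctx)

		// Cancel the manager's context once the token is rejected.
		go watchForExpiredToken(mgrCtx, cfg, cancel)

		mgr, err := ctrl.NewManager(cfg, ctrl.Options{})
		if err != nil {
			cancel()
			return err
		}

		// In the real agent, controllers would be registered with mgr here.
		// Start blocks until mgrCtx is cancelled, then the loop rebuilds
		// everything against the refreshed kubeconfig.
		if err := mgr.Start(mgrCtx); err != nil {
			cancel()
			return err
		}
		cancel()
	}
	return ctx.Err()
}

// watchForExpiredToken periodically issues a cheap request against the managed
// cluster; a 401 Unauthorized response is taken as a sign the kubeconfig expired.
func watchForExpiredToken(ctx context.Context, cfg *rest.Config, cancel context.CancelFunc) {
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		cancel()
		return
	}
	ticker := time.NewTicker(time.Minute) // probe interval is an assumption
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if _, err := clientset.Discovery().ServerVersion(); err != nil && apierrors.IsUnauthorized(err) {
				cancel()
				return
			}
		}
	}
}
```

Cancelling the manager's context is what lets controller-runtime shut down all registered controllers cleanly before they are rebuilt against the refreshed kubeconfig.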

Fixes #336

gianlucam76 merged commit a6ca4d9 into projectsveltos:main on Jan 23, 2025
3 checks passed
gianlucam76 deleted the client branch on January 23, 2025 at 14:38
Development

Successfully merging this pull request may close these issues.

(bug) when agent is running within the management cluster, after a while agent stops working