After restore, fleet agent does not communicate back to fleet controller and shows error in Continuous Delivery in Rancher 2.5.9 #164
Comments
FYI - it seems that the root cause is related to the missing fleet-agent secret after the restore and the missing service accounts in the namespace cluster-fleet-local-local-1a3d67d0a899. I could get the fleet-agent for the local cluster running again by creating the missing secret manually, but for the downstream clusters this seems to be more complicated, as the service accounts appear to be missing.
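For anyone hitting the same state, a quick way to confirm this diagnosis (a sketch; the registration namespace name is taken from this report and will have a different suffix per cluster):

```sh
# Does the agent bootstrap secret still exist after the restore?
kubectl -n fleet-system get secret fleet-agent-bootstrap

# Are the service accounts and secrets in the cluster registration
# namespace still there? (namespace name taken from this report)
kubectl -n cluster-fleet-local-local-1a3d67d0a899 get serviceaccounts,secrets
```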
This issue should be backported to 2.5: rancher/rancher#33954. PRs to backport:
Available to test with backup-operator v1.2.1-rc1. The expected result is that after backup and restore, fleet clusters should stay connected to the Rancher server and all clusters should be active. Users should be able to use fleet after the restore. A rough way to verify this from the local cluster is sketched below.
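A verification sketch, assuming the default Fleet namespaces fleet-local (local cluster) and fleet-default (downstream clusters):

```sh
# Fleet cluster objects should report as ready/active after the restore
kubectl -n fleet-local get clusters.fleet.cattle.io
kubectl -n fleet-default get clusters.fleet.cattle.io

# GitRepos should reconcile again
kubectl -n fleet-default get gitrepos.fleet.cattle.io
```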
@sowmyav27 Assigning this to you for now. You can delegate it either to red team QA or to someone in QA who has done backup testing before (@anupama2501).
Test Environment:
Rancher version: v2.5-a3b524e9d00408bf8da0e46fe5f9f127d7fddd20-head
Downstream cluster type:
Downstream K8s version: v1.20.12
Testing:
SURE-3497
SURE-3502
We are using fleet with Rancher 2.5.9 and have created several git repos that we deploy via fleet to the local and downstream clusters.
We then created a rancher-backup with the backup operator, destroyed the cluster, redeployed it, and followed the recovery procedure described at https://rancher.com/docs/rancher/v2.5/en/backups/migrating-rancher/
After this recovery we realized that the gitrepo and helmchartrepo secrets were not restored (see issue #163), and we re-created these secrets manually with kubectl apply -f .yaml.
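For illustration, re-creating such a secret looks roughly like this (a hypothetical sketch; the name and credentials are placeholders, not values from our setup, and fleet-default is assumed as the GitRepo namespace):

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: gitrepo-auth            # placeholder name
  namespace: fleet-default      # assumed namespace for downstream GitRepos
type: kubernetes.io/basic-auth
stringData:
  username: <git-username>      # placeholder credentials
  password: <git-token>
EOF
```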
We then realized that the fleet agent does not work / communicate back, complaining about a missing secret, fleet-system/fleet-agent-bootstrap.
We tried to create this bootstrap secret ourselves and were somewhat successful, but even with the agent partly working we see it as "Cluster: local" with the status "Modified" in red, telling us that it does not trust the CA of the Kubernetes cluster. That may well be the correct message, since the cluster was re-deployed and the underlying RKE2 cluster really does have a new CA.
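To see the missing-secret error and the resulting cluster condition directly (a sketch, assuming the standard fleet-agent deployment in fleet-system and the local cluster object in fleet-local):

```sh
# The agent logs show the complaint about the missing bootstrap secret
kubectl -n fleet-system logs deployment/fleet-agent

# The local Fleet cluster's status shows the "Modified"/non-ready condition
kubectl -n fleet-local get clusters.fleet.cattle.io local -o yaml
```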
Long story short: how can we restore Rancher properly, including all the fleet agents and their required secrets, so that the agents are happy?