0.8.7 pre-start script fails after rollback from 0.8.7 to 0.8.6 #87
By default, k3s uses the hostname as the node name (aka node identifier). However, BOSH assigns the agent_id as the hostname (see cloudfoundry/bosh-dns-release#30), and the agent_id changes each time a new VM is created. Therefore, to support BOSH recreation of instances, the k3s wrapper release sets a stable, predictable k8s node name in k3s-wrapper-boshrelease/jobs/k3s-agent/templates/bin/ctl.erb (lines 25 to 31 at commit 2f23e55).
This leverages the K3S_NODE_NAME environment variable supported by k3s.
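The effect of pinning the node name can be sketched in Python. This is a minimal illustration, not the content of ctl.erb: the `<instance-group>-<index>` naming scheme is an assumption inferred from node names such as edge-agents-r1-z2-3 in this issue, and both helper names are hypothetical.

```python
# Minimal sketch (hypothetical helpers): why pinning K3S_NODE_NAME keeps a
# stable k8s node identity across BOSH VM recreation.


def stable_node_name(instance_group: str, index: int) -> str:
    """Build a node name from stable BOSH instance attributes.

    ASSUMPTION: the "<instance-group>-<index>" scheme is inferred from node
    names seen in this issue; the real ctl.erb template may differ.
    """
    return f"{instance_group}-{index}"


def effective_node_name(env: dict, hostname: str) -> str:
    """Node name k3s ends up using: K3S_NODE_NAME if set, else the hostname."""
    return env.get("K3S_NODE_NAME") or hostname


if __name__ == "__main__":
    env = {"K3S_NODE_NAME": stable_node_name("edge-agents-r1-z2", 3)}
    # BOSH sets the hostname to the agent_id, which changes on each recreation.
    for agent_id in ("agent-id-before-recreate", "agent-id-after-recreate"):
        print(agent_id, "->", effective_node_name(env, agent_id))
```

Without K3S_NODE_NAME, the same loop would yield two different node names, so k3s would see the recreated VM as a new node.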
The hash of the node password stored in the k8s API can be inspected with:
kubectl get -n kube-system secrets edge-agents-r1-z2-3.node-password.k3s -o "jsonpath={.data['hash']}" | base64 -d
#> $1:9cd6c33604e7df7c:15:8:1:PRTNHSo8kopiYCFpHb0wClsK47rnDiH2mnRqnWU6yx0ym3PTdLKuX8vJJB0Y+J1Swg0DFgQcNf0xkj1J5NeA8A
# compare with the content of /etc/rancher/node/password, fetched via bosh ssh
93c9d53f362229674c341014fdee4b5b
The function creating the secret hash from the node password is available at https://github.com/k3s-io/k3s/blob/f345697c0a65c5c427817ea60b712a27c6c159d9/pkg/nodepassword/nodepassword.go#L70-L88
The function verifying a node password against the hash is available in the following Go sources, but there is no documented corresponding CLI command for external verification: https://github.com/k3s-io/k3s/blob/f345697c0a65c5c427817ea60b712a27c6c159d9/pkg/nodepassword/nodepassword.go#L54-L67
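Since no CLI is documented for this check, a rough equivalent can be written with Python's standard `hashlib.scrypt`. This is a sketch under an assumption, not a port of nodepassword.go: it assumes the hash layout is `$1:<hex salt>:<log2 N>:<r>:<p>:<base64 key>`, as the sample above suggests (15:8:1 read as N = 2^15, r = 8, p = 1), with the key in unpadded standard base64. The helper name `verify_node_password` is hypothetical.

```python
import base64
import hashlib
import hmac


def verify_node_password(password: str, stored_hash: str) -> bool:
    """Check a node password against a k3s node-password hash (sketch).

    ASSUMPTION: stored_hash looks like "$1:<hex salt>:<log2 N>:<r>:<p>:<key>",
    with <key> the scrypt-derived key in unpadded standard base64. This layout
    is inferred from the sample hash, not from a documented interface.
    """
    version, salt_hex, log2_n, r_str, p_str, key_b64 = stored_hash.split(":")
    if version != "$1":
        raise ValueError(f"unsupported hash version: {version!r}")
    # Restore any stripped base64 padding before decoding the stored key.
    expected = base64.b64decode(key_b64 + "=" * (-len(key_b64) % 4))
    n, r, p = 1 << int(log2_n), int(r_str), int(p_str)
    derived = hashlib.scrypt(
        password.encode(),
        salt=bytes.fromhex(salt_hex),
        n=n,
        r=r,
        p=p,
        dklen=len(expected),
        # OpenSSL's default 32 MiB cap is slightly too small for N=2^15, r=8,
        # so allow its memory estimate plus 1 MiB of headroom.
        maxmem=128 * r * (n + p + 2) + (1 << 20),
    )
    # Constant-time comparison of the derived and stored keys.
    return hmac.compare_digest(derived, expected)
```

For example, pass the stripped content of /etc/rancher/node/password (assuming surrounding whitespace is not part of the password) together with the hash printed by the kubectl command. If the layout assumption does not hold, a False result is not conclusive.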
Given a bosh deployment running a k3s-wrapper 0.8.6 with K3S_DATA_DIR set to /var/vcap/store/k3s-xx
And a bosh deploy operation upgrading to 0.8.7, which fails
And a bosh cck, which restores the last known version 0.8.6 (*)
And a new bosh deploy upgrading to 0.8.7
Then we observe the 0.8.7 pre-start script failing with the following message:
cat pre-start.stderr.log
#> mv: cannot move '/var/vcap/store/k3s-agent' to '/var/vcap/store/k3s-datadir': Directory not empty
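This is because starting version 0.8.6 (*) created a new '/var/vcap/store/k3s-agent' directory, while the '/var/vcap/store/k3s-datadir' target already exists and is not empty, so mv refuses to overwrite it.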
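One way to make the data-directory move survive this rollback scenario is to check both directories before moving. The Python sketch below illustrates such an idempotent migration; it is not the release's pre-start script, and the policy of keeping the existing k3s-datadir while setting the recreated k3s-agent aside under a `.stale` suffix is a hypothetical choice.

```python
import os
import shutil


def migrate_data_dir(old_dir: str, new_dir: str) -> str:
    """Move old_dir to new_dir, tolerating re-runs and rollbacks (sketch).

    HYPOTHETICAL policy, not the release's actual pre-start logic:
    - only old_dir exists: rename it to new_dir ("moved");
    - both exist, e.g. because a rolled-back 0.8.6 start recreated old_dir:
      keep new_dir and rename old_dir to "<old_dir>.stale" ("set-aside");
    - otherwise there is nothing to migrate ("noop").
    """
    old_exists = os.path.isdir(old_dir)
    new_exists = os.path.isdir(new_dir)
    if old_exists and not new_exists:
        os.rename(old_dir, new_dir)
        return "moved"
    if old_exists and new_exists:
        stale = old_dir + ".stale"
        shutil.rmtree(stale, ignore_errors=True)  # discard an earlier set-aside copy
        os.rename(old_dir, stale)
        return "set-aside"
    return "noop"
```

Keeping k3s-datadir favours the state from before the rollback; the set-aside copy remains available for inspection instead of making pre-start fail.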
Note: during this process, k3s treats the node as a fresh install and overwrites '/etc/rancher/node/password'.
As a result, even after fixing the pre-start issue, the new node password prevents the agent from joining the existing server nodes, with the error:
time="2025-01-06T11:53:24Z" level=info msg="Waiting to retrieve agent configuration; server is not ready: /var/vcap/store/k3s-datadir/agent/serving-kubelet.crt: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag"
The workaround is then to delete the k8s node, e.g.:
kubectl delete node/edge-agents-r1-z2-0
See https://docs.k3s.io/architecture#how-agent-node-registration-works
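The registration rule behind this error, and why deleting the node helps, can be modelled as a toy in Python, following the behaviour described on the linked k3s architecture page: the first registration of a node name stores its password, later registrations must present the same one, and deleting the node clears the stored entry. The class and method names are hypothetical, and the model keeps plain strings where k3s stores an scrypt hash in a kube-system secret.

```python
class NodePasswordRegistry:
    """Toy model (hypothetical) of the server-side node-password check."""

    def __init__(self) -> None:
        # node name -> password; k3s instead keeps a hash in a
        # "<node>.node-password.k3s" secret in the kube-system namespace.
        self._passwords: dict = {}

    def register(self, node_name: str, password: str) -> bool:
        """Return True if the agent may join under this node name."""
        stored = self._passwords.setdefault(node_name, password)
        return stored == password

    def delete_node(self, node_name: str) -> None:
        """Model `kubectl delete node/<name>`: the stored entry goes away."""
        self._passwords.pop(node_name, None)


if __name__ == "__main__":
    registry = NodePasswordRegistry()
    node = "edge-agents-r1-z2-0"
    print(registry.register(node, "original"))  # first join: accepted
    print(registry.register(node, "fresh"))     # overwritten password: rejected
    registry.delete_node(node)                  # the workaround
    print(registry.register(node, "fresh"))     # re-registration: accepted
```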