[Release-1.31] - rke2 start fails after rke2-killall.sh execution #7156

Closed
brandond opened this issue Oct 30, 2024 · 1 comment

@brandond
Member

Backport fix for "rke2 start fails after rke2-killall.sh execution"

@endawkins

endawkins commented Dec 4, 2024

Validated on release-1.31 with f1db1f8 / v1.31.3

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

Linux ip-172-31-7-110 6.8.0-1012-aws #13-Ubuntu SMP Mon Jul 15 13:40:27 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

Cluster Configuration:

1 server

Config.yaml:

write-kubeconfig-mode: 644
token: test
node-external-ip: 

Additional files

N/A

Testing Steps

  1. Copy config.yaml
$ sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2
  2. Install RKE2
  3. Mount server and agent dirs
$ sudo mount --bind /var/lib/rancher/rke2/server /var/lib/rancher/rke2/server; sudo mount --bind /var/lib/rancher/rke2/agent /var/lib/rancher/rke2/agent
  4. Run killall
  5. Verify server and agent directories are still present after killall
  6. Restart rke2
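
The verification step above (the bind mounts should still be present after killall) can be scripted. The following is only a rough sketch, not part of RKE2: `mount_present` and `check_rke2_mounts` are hypothetical helper names, and the only inputs assumed are the two directories from the mount step and a file in /proc/mounts format. Unlike a plain grep, it compares the mount point field exactly, so sub-mounts below the agent directory are not counted.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with RKE2): succeeds if MOUNTPOINT is
# exactly the second field of some line in a /proc/mounts-style file.
mount_present() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# Hypothetical wrapper: reports the state of the two directories that were
# bind-mounted in the testing steps; returns non-zero if either is missing.
check_rke2_mounts() {
    mounts_file="${1:-/proc/mounts}"
    status=0
    for dir in /var/lib/rancher/rke2/server /var/lib/rancher/rke2/agent; do
        if mount_present "$dir" "$mounts_file"; then
            echo "mounted: $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}
```

After running killall, calling `check_rke2_mounts` with no arguments reads /proc/mounts on the node; passing a file path instead makes it possible to check a saved copy.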

Replication Results:
Unable to reproduce the issue; validation results are provided below.
Ran the same steps on a released version:

/usr/local/bin/rke2 -v
rke2 version v1.31.2+rke2r1 (dc4219f5755bb1deb91619550b3565892b57ecdb)
go version go1.22.8 X:boringcrypto

Validation Results:

  • rke2 version used for validation:
/usr/local/bin/rke2 -v
rke2 version v1.31.3-rc3+rke2r1 (f1db1f8266ab7315ff447c8acdaefa2ba16b87c0)
go version go1.22.8 X:boringcrypto
cat /proc/mounts | grep /var/lib/rancher/rke2/agent; cat /proc/mounts | grep /var/lib/rancher/rke2/server
/dev/root /var/lib/rancher/rke2/agent ext4 rw,relatime,discard,errors=remount-ro 0 0
/dev/root /var/lib/rancher/rke2/server ext4 rw,relatime,discard,errors=remount-ro 0 0
$ kubectl get nodes,pods -A -o wide

NAME                                              STATUS   ROLES                       AGE   VERSION          INTERNAL-IP    EXTERNAL-IP      OS-IMAGE                              KERNEL-VERSION                 CONTAINER-RUNTIME
node/ip-172-31-10-24.us-east-2.compute.internal   Ready    control-plane,etcd,master   24m   v1.31.3+rke2r1   172.31.10.24   [REDACTED]       SUSE Linux Enterprise Server 15 SP5   5.14.21-150500.55.44-default   containerd://1.7.23-k3s2

NAMESPACE     NAME                                                                      READY   STATUS      RESTARTS        AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
kube-system   pod/cloud-controller-manager-ip-172-31-10-24.us-east-2.compute.internal   1/1     Running     1 (10m ago)     24m     172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/etcd-ip-172-31-10-24.us-east-2.compute.internal                       1/1     Running     0               9m40s   172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/helm-install-rke2-canal-vlwjj                                         0/1     Completed   0               24m     172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/helm-install-rke2-coredns-9tlzh                                       0/1     Completed   0               24m     172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/helm-install-rke2-ingress-nginx-6c7bj                                 0/1     Completed   0               24m     <none>         ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/helm-install-rke2-metrics-server-gpdnc                                0/1     Completed   0               24m     <none>         ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/helm-install-rke2-snapshot-controller-crd-xsdsk                       0/1     Completed   0               24m     <none>         ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/helm-install-rke2-snapshot-controller-rnb5h                           0/1     Completed   0               24m     <none>         ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/helm-install-rke2-snapshot-validation-webhook-w79tn                   0/1     Completed   0               24m     <none>         ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/kube-apiserver-ip-172-31-10-24.us-east-2.compute.internal             1/1     Running     1               24m     172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/kube-controller-manager-ip-172-31-10-24.us-east-2.compute.internal    1/1     Running     1 (10m ago)     24m     172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/kube-proxy-ip-172-31-10-24.us-east-2.compute.internal                 1/1     Running     1 (10m ago)     24m     172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/kube-scheduler-ip-172-31-10-24.us-east-2.compute.internal             1/1     Running     1 (10m ago)     24m     172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/rke2-canal-gphl7                                                      2/2     Running     2 (10m ago)     24m     172.31.10.24   ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/rke2-coredns-rke2-coredns-9579797d8-6fdbx                             1/1     Running     1 (10m ago)     24m     10.42.0.3      ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/rke2-coredns-rke2-coredns-autoscaler-78db5d674-hjxn9                  1/1     Running     1 (10m ago)     24m     10.42.0.2      ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/rke2-ingress-nginx-controller-w62bm                                   1/1     Running     1 (10m ago)     22m     10.42.0.4      ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/rke2-metrics-server-7c85d458bd-7ltj2                                  1/1     Running     2 (9m4s ago)    23m     10.42.0.6      ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/rke2-snapshot-controller-65bc6fbd57-h7f4r                             1/1     Running     3 (8m40s ago)   23m     10.42.0.5      ip-172-31-10-24.us-east-2.compute.internal   <none>           <none>
kube-system   pod/rke2-snapshot-validation-webhook-859c7896df-lmmv2                     1/1     Running     3 (8m40s ago)   23m     10.42.0.7      ip-172-31-10-24.us-east-2.compute.internal   <none>
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
     Active: active (running) since Tue 2024-12-03 22:51:59 UTC; 3min 9s ago
       Docs: https://github.com/rancher/rke2#readme
    Process: 28257 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
    Process: 28259 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 28260 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 28261 (rke2)
      Tasks: 200
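
The pod listing above can also be checked mechanically instead of by eye. This is a minimal sketch under stated assumptions: `unhealthy_pods` is a hypothetical helper name, it expects header-less `kubectl get pods -A` output on stdin (NAMESPACE, NAME, READY, STATUS as the first four columns), and it treats only `Running` and `Completed` as healthy, which matches the statuses seen in this validation run.

```shell
#!/bin/sh
# Hypothetical helper: reads header-less `kubectl get pods -A` output on stdin
# and prints "namespace/name status" for every pod whose STATUS column is
# neither Running nor Completed. Prints nothing when all pods look healthy.
unhealthy_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

A possible use on the node is `kubectl get pods -A --no-headers | unhealthy_pods`; empty output means every pod was Running or Completed at that moment.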

Additional context / logs:
N/A
