
Calico-node crashes with Calico v3.21.1 #8249

Closed
DANic-git opened this issue Nov 29, 2021 · 3 comments · Fixed by #8250
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@DANic-git
Contributor

Environment:

  • Cloud provider or hardware configuration:
    VM based on box generic/debian10

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 4.19.0-18-amd64 x86_64
    PRETTY_NAME="Debian GNU/Linux 10 (buster)"
    NAME="Debian GNU/Linux"
    VERSION_ID="10"
    VERSION="10 (buster)"
    VERSION_CODENAME=buster
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"

  • Version of Ansible (ansible --version):
    ansible 2.10.15

  • Version of Python (python --version):
    Python 3.9.6

Kubespray version (commit) (git rev-parse --short HEAD):
2015725

Network plugin used:
Calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

https://termbin.com/20bv

calico_version: 'v3.21.1'
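
(For reference, this pin normally lives in the Calico group vars of the inventory; the path below is the sample Kubespray layout and may differ in a custom inventory:)

# inventory/sample/group_vars/k8s_cluster/k8s-net-calico.yml (assumed path)
calico_version: 'v3.21.1'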

Command used to invoke ansible:
ansible-playbook -i inventory/hosts.yaml -b cluster.yml

Output of ansible run:

Anything else we need to know:

root@node1:~# kubectl get po -A
NAMESPACE     NAME                                       READY   STATUS              RESTARTS        AGE
kube-system   calico-kube-controllers-5bc9554869-w5z54   1/1     Running             0               9m40s
kube-system   calico-node-2lkqk                          0/1     CrashLoopBackOff    6 (3m37s ago)   10m
kube-system   calico-node-qwds8                          0/1     CrashLoopBackOff    6 (3m42s ago)   10m
kube-system   calico-node-vgk6w                          0/1     CrashLoopBackOff    6 (3m13s ago)   10m
kube-system   coredns-8474476ff8-pszdn                   0/1     ContainerCreating   0               9m20s
kube-system   dns-autoscaler-5ffdc7f89d-589fx            0/1     ContainerCreating   0               9m15s
Name:                 calico-node-2lkqk
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 node1/192.168.56.201
Start Time:           Mon, 29 Nov 2021 12:51:27 +0000
Labels:               controller-revision-hash=544b498658
                      k8s-app=calico-node
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   192.168.56.201
IPs:
  IP:           192.168.56.201
Controlled By:  DaemonSet/calico-node
Init Containers:
  upgrade-ipam:
    Container ID:  containerd://f58ea1672ff297a8092fade20e81812e89ada063beef28d065cbbbc2edb26e87
    Image:         quay.io/calico/cni:v3.21.1
    Image ID:      quay.io/calico/cni@sha256:eb518dae10e9969596f9fda4f68af32f9bd6a66f95aad810054540ebd955a09c
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 29 Nov 2021 12:51:29 +0000
      Finished:     Mon, 29 Nov 2021 12:51:30 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7pqrh (ro)
  install-cni:
    Container ID:  containerd://6904e7613fd7cd63665d3ff855acc26644373a22808dc8126c2bdcd3ce151567
    Image:         quay.io/calico/cni:v3.21.1
    Image ID:      quay.io/calico/cni@sha256:eb518dae10e9969596f9fda4f68af32f9bd6a66f95aad810054540ebd955a09c
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 29 Nov 2021 12:51:31 +0000
      Finished:     Mon, 29 Nov 2021 12:51:35 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      CNI_CONF_NAME:            10-calico.conflist
      UPDATE_CNI_BINARIES:      true
      CNI_NETWORK_CONFIG_FILE:  /host/etc/cni/net.d/calico.conflist.template
      SLEEP:                    false
      KUBERNETES_NODE_NAME:      (v1:spec.nodeName)
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7pqrh (ro)
  flexvol-driver:
    Container ID:   containerd://0b79b2ac84b3fc1644e0635d14cd664081717985c5b26e1145a27161e4072254
    Image:          quay.io/calico/pod2daemon-flexvol:v3.21.1
    Image ID:       quay.io/calico/pod2daemon-flexvol@sha256:190260f2ca2a6f3b8f928f34d12cf01fcfa430f4ad5a942fe500d8301e4296db
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 29 Nov 2021 12:51:47 +0000
      Finished:     Mon, 29 Nov 2021 12:51:47 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/driver from flexvol-driver-host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7pqrh (ro)
Containers:
  calico-node:
    Container ID:   containerd://c067ad63727b430641093aaa380200527024a863e86583b80b65498e803994ca
    Image:          quay.io/calico/node:v3.21.1
    Image ID:       quay.io/calico/node@sha256:5317d029ff39d88fdce8aadf732f6c6073cfe2c2f9d1ddc634d8cfe56444b600
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 29 Nov 2021 13:03:13 +0000
      Finished:     Mon, 29 Nov 2021 13:03:18 +0000
    Ready:          False
    Restart Count:  7
    Limits:
      cpu:     300m
      memory:  500M
    Requests:
      cpu:      150m
      memory:   64M
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -bird-ready -felix-ready] delay=0s timeout=10s period=10s #success=1 #failure=6
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                         kubernetes
      WAIT_FOR_DATASTORE:                     true
      CALICO_NETWORKING_BACKEND:              <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                           <set to the key 'cluster_type' of config map 'calico-config'>    Optional: false
      CALICO_K8S_NODE_REF:                     (v1:spec.nodeName)
      CALICO_DISABLE_FILE_LOGGING:            true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:      RETURN
      FELIX_HEALTHHOST:                       localhost
      FELIX_IPTABLESBACKEND:                  Legacy
      FELIX_IPTABLESLOCKTIMEOUTSECS:          10
      CALICO_IPV4POOL_IPIP:                   Off
      FELIX_IPV6SUPPORT:                      False
      FELIX_LOGSEVERITYSCREEN:                info
      CALICO_STARTUP_LOGLEVEL:                error
      FELIX_USAGEREPORTINGENABLED:            False
      FELIX_CHAININSERTMODE:                  Insert
      FELIX_PROMETHEUSMETRICSENABLED:         False
      FELIX_PROMETHEUSMETRICSPORT:            9091
      FELIX_PROMETHEUSGOMETRICSENABLED:       True
      FELIX_PROMETHEUSPROCESSMETRICSENABLED:  True
      NODEIP:                                  (v1:status.hostIP)
      IP_AUTODETECTION_METHOD:                can-reach=$(NODEIP)
      IP:                                     autodetect
      NODENAME:                                (v1:spec.nodeName)
      FELIX_HEALTHENABLED:                    true
      FELIX_IGNORELOOSERPF:                   False
      CALICO_MANAGE_CNI:                      true
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7pqrh (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:  
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:  
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  flexvol-driver-host:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    HostPathType:  DirectoryOrCreate
  kube-api-access-7pqrh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  16m                 default-scheduler  Successfully assigned kube-system/calico-node-2lkqk to node1
  Normal   Pulled     16m                 kubelet            Container image "quay.io/calico/cni:v3.21.1" already present on machine
  Normal   Created    16m                 kubelet            Created container upgrade-ipam
  Normal   Started    16m                 kubelet            Started container upgrade-ipam
  Normal   Pulled     16m                 kubelet            Container image "quay.io/calico/cni:v3.21.1" already present on machine
  Normal   Created    16m                 kubelet            Created container install-cni
  Normal   Started    16m                 kubelet            Started container install-cni
  Normal   Pulling    16m                 kubelet            Pulling image "quay.io/calico/pod2daemon-flexvol:v3.21.1"
  Normal   Pulled     16m                 kubelet            Successfully pulled image "quay.io/calico/pod2daemon-flexvol:v3.21.1" in 9.854140061s
  Normal   Created    16m                 kubelet            Created container flexvol-driver
  Normal   Started    16m                 kubelet            Started container flexvol-driver
  Warning  Unhealthy  16m                 kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
  Normal   Started    16m (x2 over 16m)   kubelet            Started container calico-node
  Warning  Unhealthy  15m (x4 over 16m)   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
  Normal   Pulled     15m (x3 over 16m)   kubelet            Container image "quay.io/calico/node:v3.21.1" already present on machine
  Normal   Created    15m (x3 over 16m)   kubelet            Created container calico-node
  Warning  BackOff    82s (x68 over 15m)  kubelet            Back-off restarting failed container
DANic-git added the kind/bug label on Nov 29, 2021.
@cristicalin
Contributor

@germetist could you share the logs of the crashed pods? Given that we just added the hashes for 3.21.1 and the default version tested in CI is still 3.20.3, this looks like we need to adjust some of our manifests for the 3.21.x version.
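
(For reference, the logs of the previous, crashed instance of a crash-looping container can be pulled with a standard kubectl invocation; the pod name here is taken from the describe output above:)

kubectl -n kube-system logs calico-node-2lkqk -c calico-node --previous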

@DANic-git
Contributor Author

> @germetist could you share the logs of the crashed pods? Given that we just added the hashes for 3.21.1 and the default version tested in CI is still 3.20.3, this looks like we need to adjust some of our manifests for the 3.21.x version.

2021-11-30 11:45:26.850 [INFO][36] tunnel-ip-allocator/allocateip.go 279: Assign a new tunnel address type="ipipTunnelAddress"
2021-11-30 11:45:26.850 [INFO][36] tunnel-ip-allocator/allocateip.go 355: Release any old tunnel addresses IP="" type="ipipTunnelAddress"
2021-11-30 11:45:26.868 [INFO][36] tunnel-ip-allocator/allocateip.go 366: Assign new tunnel address IP="" type="ipipTunnelAddress"
2021-11-30 11:45:26.869 [INFO][36] tunnel-ip-allocator/ipam.go 103: Auto-assign 1 ipv4, 0 ipv6 addrs for host 'node3'
2021-11-30 11:45:26.872 [ERROR][36] tunnel-ip-allocator/ipam.go 117: Error assigning IPV4 addresses: failed to look up reserved IPs: connection is unauthorized: ipreservations.crd.projectcalico.org is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot list resource "ipreservations" in API group "crd.projectcalico.org" at the cluster scope
2021-11-30 11:45:26.875 [FATAL][36] tunnel-ip-allocator/allocateip.go 435: Unable to autoassign an address error=failed to look up reserved IPs: connection is unauthorized: ipreservations.crd.projectcalico.org is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot list resource "ipreservations" in API group "crd.projectcalico.org" at the cluster scope type="ipipTunnelAddress"
Calico node failed to start
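
(The log points at an RBAC gap: Calico v3.21 introduced the IPReservation CRD, and the calico-node service account needs list access to it. A minimal sketch of the missing ClusterRole rule, mirroring the shape of the upstream v3.21 manifest; the actual Kubespray change landed in #8250:)

# Sketch: add to the calico-node ClusterRole, next to the other
# crd.projectcalico.org rules, so tunnel IP allocation can list reservations.
- apiGroups: ["crd.projectcalico.org"]
  resources:
    - ipreservations
  verbs:
    - list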

@Janson-Wang

@germetist Hey Dannil, I'm Janson, I'm new here. I have the same problem:
"Error assigning IPV4 addresses: failed to look up reserved IPs: connection is unauthorized: ipreservations.crd.projectcalico.org is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot list resource "ipreservations" in API group "crd.projectcalico.org" at the cluster scope"
How did you finally solve this problem? Looking forward to your reply. Thanks.
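
(A quick way to check whether a cluster already carries the RBAC fix is to impersonate the calico-node service account with kubectl auth can-i, a standard kubectl subcommand; it should print "yes" once the ClusterRole grants the ipreservations rule:)

kubectl auth can-i list ipreservations.crd.projectcalico.org \
  --as=system:serviceaccount:kube-system:calico-node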
