Can not start ebs-plugin container in ebs-csi-node daemonset properly #494

nomatterz · 2020-05-04T18:30:11Z

/kind bug

What happened?

I tried to install the driver following the official EKS docs: I created an EKS cluster and worker nodes, attached the IAM role to the nodes, and ran kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"

Here is what I get:

Events:
  Type     Reason     Age                    From                                               Message
  ----     ------     ----                   ----                                               -------
  Normal   Scheduled  10m                    default-scheduler                                  Successfully assigned kube-system/ebs-csi-node-kgk5z to ip-10-1-3-226.eu-west-1.compute.internal
  Normal   Started    10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Started container liveness-probe
  Normal   Created    10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Created container liveness-probe
  Normal   Pulled     10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Container image "quay.io/k8scsi/livenessprobe:v1.1.0" already present on machine
  Normal   Pulled     10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Container image "quay.io/k8scsi/csi-node-driver-registrar:v1.1.0" already present on machine
  Normal   Created    10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Created container node-driver-registrar
  Normal   Started    10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Started container node-driver-registrar
  Normal   Started    9m47s (x2 over 10m)    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Started container ebs-plugin
  Normal   Created    9m47s (x2 over 10m)    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Created container ebs-plugin
  Normal   Killing    8m48s (x2 over 9m48s)  kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Container ebs-plugin failed liveness probe, will be restarted
  Normal   Pulled     8m47s (x3 over 10m)    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Container image "amazon/aws-ebs-csi-driver:v0.4.0" already present on machine
  Warning  Unhealthy  5m28s (x26 over 10m)   kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Liveness probe failed: Get http://10.1.3.226:9808/healthz: dial tcp 10.1.3.226:9808: connect: connection refused
  Warning  BackOff    25s (x19 over 4m48s)   kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Back-off restarting failed container

I disabled the liveness probe so I could get into the container and troubleshoot. Here is what I found:
For some reason /bin/aws-ebs-csi-driver listens on the socket /tmp/csi.sock instead of /csi/csi.sock.
I noticed that the node argument affects this.
With node:

bash-4.2# aws-ebs-csi-driver node --endpoint unix:/csi/csi.sock
I0504 17:56:17.806082     152 driver.go:43] Driver: ebs.csi.aws.com Version: 0.4.0
I0504 17:56:17.818019     152 driver.go:79] Listening for connections on address: &net.UnixAddr{Name:"/tmp/csi.sock", Net:"unix"}
---
bash-4.2# curl localhost:9808/healtz 
curl: (7) Failed to connect to localhost port 9808: Connection refused

Without node:

bash-4.2# aws-ebs-csi-driver --endpoint unix:/csi/csi.sock
I0504 17:56:34.741502     159 driver.go:43] Driver: ebs.csi.aws.com Version: 0.4.0
I0504 17:56:34.746955     159 driver.go:79] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
---
bash-4.2# curl http://localhost:9808/healthz
ok

I don't know what this argument does or how to fix this behaviour properly.

What you expected to happen?

Just run kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master" and have everything work.

How to reproduce it (as minimally and precisely as possible)?
Described above

Anything else we need to know?:
Nevertheless, the container seems to start with the proper arguments:

bash-4.2# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 18:03 ?        00:00:00 /bin/aws-ebs-csi-driver node --endpoint=unix:/csi/csi.sock --logtostderr --v=5
root        12     0  0 18:19 pts/0    00:00:00 bash
root        29    12  0 18:22 pts/0    00:00:00 ps -ef
bash-4.2# ls -la /csi/
total 0
drwxr-xr-x 2 root root  6 May  4 18:03 .
drwxr-xr-x 1 root root 72 May  4 18:03 ..
bash-4.2# ls -la /tmp/
total 0
drwxrwxrwt 1 root root 22 May  4 18:22 .
drwxr-xr-x 1 root root 72 May  4 18:03 ..
drwxrwxrwt 2 root root  6 Aug 24  2019 .ICE-unix
drwxrwxrwt 2 root root  6 Aug 24  2019 .Test-unix
drwxrwxrwt 2 root root  6 Aug 24  2019 .X11-unix
drwxrwxrwt 2 root root  6 Aug 24  2019 .XIM-unix
drwxrwxrwt 2 root root  6 Aug 24  2019 .font-unix
srwxr-xr-x 1 root root  0 May  4 18:03 csi.sock

Environment

kubectl version
Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.10-eks-bac369", GitCommit:"bac3690554985327ae4d13e42169e8b1c2f37226", GitTreeState:"clean", BuildDate:"2020-02-21T23:37:18Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-af3caf", GitCommit:"af3caf6136cd355f467083651cc1010a499f59b1", GitTreeState:"clean", BuildDate:"2020-03-27T21:51:36Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

Driver version:
0.4.0

wongma7 · 2020-05-04T19:31:52Z

I am pretty sure this is because this PR was just merged: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/475/files#diff-0376647074b271462315f3158fefb07dR44 . It adds the node argument to the aws-ebs-csi-driver entrypoint, but version 0.4.0 of aws-ebs-csi-driver doesn't recognize this argument.

Quick fix should be to remove the node argument and add it back after 0.5.0 is released. Then the template will work whether stable/0.5.0 or alpha/latest is chosen.
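
For anyone wondering why an unrecognized node argument would move the socket to /tmp/csi.sock, here is a minimal sketch of one plausible mechanism. It assumes the 0.4.0 binary parses its options with Go's standard flag package and has a built-in default endpoint; neither is confirmed in this thread, and the function name parseEndpoint and the default value below are hypothetical, chosen only to match the log output above. Go's flag package stops parsing at the first non-flag argument, so anything after node would be ignored and the default would win.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseEndpoint mimics a CLI that defines only --endpoint and has no "node"
// subcommand. The default value is an assumption picked to match the
// /tmp/csi.sock address seen in the logs, not taken from the driver source.
func parseEndpoint(args []string) string {
	fs := flag.NewFlagSet("aws-ebs-csi-driver", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage/error text out of the demo output
	endpoint := fs.String("endpoint", "unix://tmp/csi.sock", "CSI endpoint")
	// Parse stops just before the first non-flag argument, so flags that
	// follow a bare word such as "node" are never read.
	_ = fs.Parse(args)
	return *endpoint
}

func main() {
	// With the leading "node" word: --endpoint is skipped, default is kept.
	fmt.Println(parseEndpoint([]string{"node", "--endpoint", "unix:/csi/csi.sock"}))
	// Without it: --endpoint is parsed as expected.
	fmt.Println(parseEndpoint([]string{"--endpoint", "unix:/csi/csi.sock"}))
}
```

If that assumption holds, it would explain both observations in the report: the process list shows the expected arguments, yet the driver listens on its default socket.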

wongma7 · 2020-05-04T19:40:29Z

Actually, 0.5.0 was already released and pushed, so we should bump the stable overlay.

nomatterz · 2020-05-05T09:25:55Z

@wongma7
Thank you for the prompt reply!
I confirm that the issue is gone with the current state of master on both EKS 1.15 and 1.16.

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 4, 2020

wongma7 mentioned this issue May 4, 2020

Update stable overlay to 0.5.0 #495

Merged

k8s-ci-robot closed this as completed in #495 May 4, 2020

rlabrecque mentioned this issue Dec 8, 2020

Dynamic EBS storage "no volume plugin found" k3s-io/k3s#1037

Closed
