Can not start ebs-plugin container in ebs-csi-node daemonset properly #494

Closed
nomatterz opened this issue May 4, 2020 · 3 comments · Fixed by #495
Labels
kind/bug Categorizes issue or PR as related to a bug.

nomatterz commented May 4, 2020

/kind bug

What happened?

I tried to install the driver following the official EKS docs. I just created an EKS cluster and worker nodes, attached the IAM role to the node, and ran kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"

Here is what I get:

Events:
  Type     Reason     Age                    From                                               Message
  ----     ------     ----                   ----                                               -------
  Normal   Scheduled  10m                    default-scheduler                                  Successfully assigned kube-system/ebs-csi-node-kgk5z to ip-10-1-3-226.eu-west-1.compute.internal
  Normal   Started    10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Started container liveness-probe
  Normal   Created    10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Created container liveness-probe
  Normal   Pulled     10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Container image "quay.io/k8scsi/livenessprobe:v1.1.0" already present on machine
  Normal   Pulled     10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Container image "quay.io/k8scsi/csi-node-driver-registrar:v1.1.0" already present on machine
  Normal   Created    10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Created container node-driver-registrar
  Normal   Started    10m                    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Started container node-driver-registrar
  Normal   Started    9m47s (x2 over 10m)    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Started container ebs-plugin
  Normal   Created    9m47s (x2 over 10m)    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Created container ebs-plugin
  Normal   Killing    8m48s (x2 over 9m48s)  kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Container ebs-plugin failed liveness probe, will be restarted
  Normal   Pulled     8m47s (x3 over 10m)    kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Container image "amazon/aws-ebs-csi-driver:v0.4.0" already present on machine
  Warning  Unhealthy  5m28s (x26 over 10m)   kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Liveness probe failed: Get http://10.1.3.226:9808/healthz: dial tcp 10.1.3.226:9808: connect: connection refused
  Warning  BackOff    25s (x19 over 4m48s)   kubelet, ip-10-1-3-226.eu-west-1.compute.internal  Back-off restarting failed container
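The Unhealthy, Killing and BackOff events above come from the kubelet's HTTP liveness probe. As a rough model (a sketch, not kubelet code): the probe sends GET /healthz with a short timeout (timeoutSeconds defaults to 1), treats any status from 200 to 399 as success, and counts transport errors such as "connection refused" as failure; after failureThreshold consecutive failures (3 by default) the container is restarted. The helper names probeHealthz, startOKServer and unusedAddr are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeHealthz is a simplified model of an HTTP liveness probe: a GET with a
// 1-second timeout, where any status in [200, 400) counts as healthy and any
// transport error (e.g. "connect: connection refused") counts as unhealthy.
func probeHealthz(url string) (bool, error) {
	client := &http.Client{Timeout: time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 400, nil
}

// startOKServer starts a local stand-in for a healthy /healthz endpoint that
// always answers "ok". Call stop when done.
func startOKServer() (url string, stop func()) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	return srv.URL, srv.Close
}

// unusedAddr returns a loopback address with nothing listening on it,
// mimicking port 9808 when nothing is serving the health endpoint.
func unusedAddr() string {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	addr := ln.Addr().String()
	ln.Close() // free the port so connections to it are refused
	return addr
}

func main() {
	url, stop := startOKServer()
	defer stop()
	ok, err := probeHealthz(url + "/healthz")
	fmt.Println("healthy server:", ok, err)

	ok, err = probeHealthz("http://" + unusedAddr() + "/healthz")
	fmt.Println("nothing listening:", ok, err)
}
```

The second call fails the same way as the event log: the dial is refused, so the probe reports unhealthy regardless of the path requested.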

I disabled the liveness probe so I could get into the container and troubleshoot. Here is what I found:
For some unknown reason, /bin/aws-ebs-csi-driver listens on the socket /tmp/csi.sock instead of /csi/csi.sock.
I've noticed that the node argument affects this.
With node:

bash-4.2# aws-ebs-csi-driver node --endpoint unix:/csi/csi.sock
I0504 17:56:17.806082     152 driver.go:43] Driver: ebs.csi.aws.com Version: 0.4.0
I0504 17:56:17.818019     152 driver.go:79] Listening for connections on address: &net.UnixAddr{Name:"/tmp/csi.sock", Net:"unix"}
---
bash-4.2# curl localhost:9808/healtz 
curl: (7) Failed to connect to localhost port 9808: Connection refused

Without node:

bash-4.2# aws-ebs-csi-driver --endpoint unix:/csi/csi.sock
I0504 17:56:34.741502     159 driver.go:43] Driver: ebs.csi.aws.com Version: 0.4.0
I0504 17:56:34.746955     159 driver.go:79] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
---
bash-4.2# curl http://localhost:9808/healthz
ok

I don't know what this argument does or how to fix this behaviour properly.
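The driver.go:79 log lines show the --endpoint value being turned into a net.UnixAddr. A minimal sketch of that mapping, using a hypothetical endpointToUnixAddr helper built on Go's net/url (the driver's own parsing code may differ):

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"path"
)

// endpointToUnixAddr is a hypothetical helper (not the driver's code) that
// turns a CSI endpoint such as "unix:/csi/csi.sock" into a net.UnixAddr.
func endpointToUnixAddr(endpoint string) (*net.UnixAddr, error) {
	u, err := url.Parse(endpoint) // url.Parse lowercases the scheme
	if err != nil {
		return nil, fmt.Errorf("could not parse endpoint %q: %w", endpoint, err)
	}
	if u.Scheme != "unix" {
		return nil, fmt.Errorf("unsupported scheme %q in endpoint %q", u.Scheme, endpoint)
	}
	// "unix:/csi/csi.sock"  -> Host "",    Path "/csi/csi.sock"
	// "unix://tmp/csi.sock" -> Host "tmp", Path "/csi.sock"
	// Joining under "/" gives an absolute socket path for both spellings.
	return &net.UnixAddr{Name: path.Join("/", u.Host, u.Path), Net: "unix"}, nil
}

func main() {
	for _, ep := range []string{"unix:/csi/csi.sock", "unix://tmp/csi.sock"} {
		addr, err := endpointToUnixAddr(ep)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// %#v prints the same Go-syntax shape as the driver log lines above.
		fmt.Printf("%s -> %#v\n", ep, addr)
	}
}
```

With this mapping, the endpoint passed on the command line fully determines the socket path, so a listener on /tmp/csi.sock suggests the --endpoint value never reached the parser.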

What you expected to happen?

Running kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master" should be enough to get everything working.

How to reproduce it (as minimally and precisely as possible)?
Described above.

Anything else we need to know?:
Nevertheless, the container does seem to start with the proper arguments:

bash-4.2# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 18:03 ?        00:00:00 /bin/aws-ebs-csi-driver node --endpoint=unix:/csi/csi.sock --logtostderr --v=5
root        12     0  0 18:19 pts/0    00:00:00 bash
root        29    12  0 18:22 pts/0    00:00:00 ps -ef
bash-4.2# ls -la /csi/
total 0
drwxr-xr-x 2 root root  6 May  4 18:03 .
drwxr-xr-x 1 root root 72 May  4 18:03 ..
bash-4.2# ls -la /tmp/
total 0
drwxrwxrwt 1 root root 22 May  4 18:22 .
drwxr-xr-x 1 root root 72 May  4 18:03 ..
drwxrwxrwt 2 root root  6 Aug 24  2019 .ICE-unix
drwxrwxrwt 2 root root  6 Aug 24  2019 .Test-unix
drwxrwxrwt 2 root root  6 Aug 24  2019 .X11-unix
drwxrwxrwt 2 root root  6 Aug 24  2019 .XIM-unix
drwxrwxrwt 2 root root  6 Aug 24  2019 .font-unix
srwxr-xr-x 1 root root  0 May  4 18:03 csi.sock
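The listings above show the socket was created under /tmp rather than in the mounted /csi directory (the leading "s" in srwxr-xr-x marks a Unix domain socket). The same check can be scripted; a minimal Go sketch, where isUnixSocket and listenUnix are hypothetical helper names:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// isUnixSocket reports whether path exists and is a Unix domain socket,
// the programmatic equivalent of the leading "s" in `ls -la` output.
func isUnixSocket(path string) bool {
	fi, err := os.Stat(path)
	if err != nil {
		return false // missing (like /csi/csi.sock above) or unreadable
	}
	return fi.Mode()&os.ModeSocket != 0
}

// listenUnix creates a socket at dir/name, standing in for a CSI driver
// listening on its endpoint. The caller must close the listener.
func listenUnix(dir, name string) (net.Listener, string, error) {
	p := filepath.Join(dir, name)
	ln, err := net.Listen("unix", p)
	return ln, p, err
}

func main() {
	// Check the two locations seen in this issue.
	for _, p := range []string{"/csi/csi.sock", "/tmp/csi.sock"} {
		fmt.Printf("%s is a socket: %v\n", p, isUnixSocket(p))
	}

	// Self-contained demo in a temporary directory.
	dir, err := os.MkdirTemp("", "csi-demo")
	if err != nil {
		fmt.Println("mkdir:", err)
		return
	}
	defer os.RemoveAll(dir)
	ln, p, err := listenUnix(dir, "csi.sock")
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer ln.Close()
	fmt.Printf("%s is a socket: %v\n", p, isUnixSocket(p))
}
```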

Environment

kubectl version
Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.10-eks-bac369", GitCommit:"bac3690554985327ae4d13e42169e8b1c2f37226", GitTreeState:"clean", BuildDate:"2020-02-21T23:37:18Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-af3caf", GitCommit:"af3caf6136cd355f467083651cc1010a499f59b1", GitTreeState:"clean", BuildDate:"2020-03-27T21:51:36Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version:
    0.4.0
k8s-ci-robot added the kind/bug label May 4, 2020
wongma7 (Contributor) commented May 4, 2020

I am pretty sure this is caused by a PR that was just merged:
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/475/files#diff-0376647074b271462315f3158fefb07dR44
It adds the node argument to the aws-ebs-csi-driver entrypoint, but version 0.4.0 of aws-ebs-csi-driver doesn't recognize this argument.
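One plausible mechanism, assuming (not confirmed in this thread) that the 0.4.0 entrypoint parses its options with Go's standard flag package and has no subcommands: flag parsing stops at the first non-flag argument, so with node first, --endpoint and everything after it is left unparsed and the default endpoint is used. The parseArgs function and its default value below are hypothetical, chosen to match the /tmp/csi.sock seen in the logs.

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs is a hypothetical stand-in for an entrypoint that only defines
// flags. It returns the endpoint it would use plus leftover positional args.
func parseArgs(args []string) (endpoint string, rest []string, err error) {
	fs := flag.NewFlagSet("aws-ebs-csi-driver", flag.ContinueOnError)
	// Hypothetical default, picked to match the /tmp/csi.sock in the logs.
	ep := fs.String("endpoint", "unix:/tmp/csi.sock", "CSI endpoint")
	fs.Bool("logtostderr", false, "log to stderr")
	fs.Int("v", 0, "log verbosity")
	// Parse stops just before the first non-flag argument.
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	return *ep, fs.Args(), nil
}

func main() {
	// Flags only: --endpoint is honoured.
	ep, rest, _ := parseArgs([]string{"--endpoint", "unix:/csi/csi.sock"})
	fmt.Println(ep, rest)

	// "node" first: parsing stops there, --endpoint is never seen,
	// and the default endpoint is returned.
	ep, rest, _ = parseArgs([]string{"node", "--endpoint=unix:/csi/csi.sock", "--logtostderr", "--v=5"})
	fmt.Println(ep, rest)
}
```

In the second call every argument, including node, ends up in the leftover list, which would also explain why the process looks correct in ps -ef while listening on the wrong socket.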

The quick fix should be to remove the node argument and add it back after 0.5.0 is released. Then the template will work whether stable/0.5.0 or alpha/latest is chosen.

wongma7 (Contributor) commented May 4, 2020

Actually, 0.5.0 has already been released and pushed, so we should bump the stable overlay.

nomatterz (Author) commented

@wongma7 Thank you for the prompt reply!
I can confirm that the issue is gone with the current state of master on both EKS 1.15 and 1.16.
