Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e-image-puller doesn't work anymore. #63355

Closed
Random-Liu opened this issue May 2, 2018 · 5 comments · Fixed by #67950
Closed

e2e-image-puller doesn't work anymore. #63355

Random-Liu opened this issue May 2, 2018 · 5 comments · Fixed by #67950
Labels
area/test kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@Random-Liu
Copy link
Member

Random-Liu commented May 2, 2018

The docker binary we use in COS and most images is dynamically linked.

$ file docker 
docker: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=45a7a62f39ddabd51f5eaad2f561df08fe048371, with debug_info, not stripped

It won't work if we directly mount it into busybox, which is what we are doing in e2e-image-puller https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/manifests/e2e-image-puller.manifest#L105.

Here is the e2e image puller log:

/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
06:33:34 pulling k8s.gcr.io/alpine-with-bash:1.0
06:33:34 pulling k8s.gcr.io/apparmor-loader:0.1
06:33:34 pulling k8s.gcr.io/busybox:1.24
06:33:34 pulling k8s.gcr.io/dnsutils:e2e
/bin/sh: docker: not found
/bin/sh: docker: not found
06:33:34 pulling k8s.gcr.io/e2e-net-amd64:1.0
06:33:34 pulling k8s.gcr.io/echoserver:1.6
/bin/sh: docker: not found
/bin/sh: docker: not found
06:33:34 pulling k8s.gcr.io/eptest:0.1
06:33:34 pulling k8s.gcr.io/fakegitserver:0.1
/bin/sh: docker: not found
06:33:34 pulling k8s.gcr.io/galera-install:0.1
06:33:34 pulling k8s.gcr.io/invalid-image:invalid-tag
/bin/sh: docker: not found
/bin/sh: docker: not found
06:33:34 pulling k8s.gcr.io/iperf:e2e
/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
06:33:34 pulling k8s.gcr.io/jessie-dnsutils:e2e
06:33:34 pulling k8s.gcr.io/k8s-dns-dnsmasq-amd64:1.14.5
06:33:34 pulling k8s.gcr.io/liveness:e2e
06:33:34 pulling k8s.gcr.io/logs-generator:v0.1.0
06:33:34 pulling k8s.gcr.io/mounttest:0.8
/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
/bin/sh: docker: not found
...

I believe this has been broken for a long time. Given that the e2e test works fine till now without it, should we just remove it?

@yujuhong @kubernetes/sig-node-bugs @kubernetes/sig-testing-bugs

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/bug Categorizes issue or PR as related to a bug. labels May 2, 2018
@Random-Liu
Copy link
Member Author

Random-Liu commented May 2, 2018

Another option is to make crictl statically linked, and use crictl instead.

k8s-github-robot pushed a commit that referenced this issue May 16, 2018
Automatic merge from submit-queue (batch tested with PRs 63167, 63357). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Install and use crictl in gce kube-up.sh

Download and use crictl in gce kube-up.sh.

This PR:
1. Downloads crictl `v1.0.0-beta.0` onto the node, which supports CRI v1alpha2. We'll upgrade it to `v1.0.0-beta.1` soon after the release is cut.
2. Change `kube-docker-monitor` to `kube-container-runtime-monitor`, and let it use `crictl` to do health monitoring.
3. Change `e2e-image-puller` to use `crictl`. Because of #63355, it doesn't work now. But in `crictl v1.0.0-beta.1`, we are going to statically link it, and the `e2e-image-puller` should work again.
4. Use `systemctl kill --kill-who=main` instead of `pkill`, the reason is that:
  a. `pkill docker` will send `SIGTERM` to all processes including `dockerd`, `docker-containerd`, `docker-containerd-shim`. This is not a problem for Docker 17.03 CE, because `containerd-shim` in containerd 0.2.x doesn't exit with SIGERM (see [code](https://github.com/containerd/containerd/blob/v0.2.x/containerd-shim/main.go#L123)). However, `containerd-shim` in containerd 1.0+ does exit with SIGTERM (see [code](https://github.com/containerd/containerd/blob/master/cmd/containerd-shim/main_unix.go#L200)). This means that `pkill docker` and `pkill containerd` will kill all shim processes for Docker 17.11+ and containerd 1.0+.
  b. We can use `pkill -x` instead. However, docker systemd service name is `docker`, but daemon process name is `dockerd`. We have to introduce another environment variable to specify "daemon process name". Given so, it seems easier to just use `systemctl kill` which only requires systemd service name. `systemctl kill --kill-who=main` will make sure only main process receives SIGTERM.

Signed-off-by: Lantao Liu <[email protected]>

/cc @filbranden @yujuhong @feiskyer @mrunalp @kubernetes/sig-node-pr-reviews @kubernetes/sig-cluster-lifecycle-pr-reviews 

**Release note**:

```release-note
Kubernetes cluster on GCE have crictl installed now. Users can use it to help debug their node. The documentation of crictl can be found https://github.com/kubernetes-incubator/cri-tools/blob/master/docs/crictl.md.
```
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 31, 2018
@BenTheElder
Copy link
Member

Any update here?

@nikhita
Copy link
Member

nikhita commented Aug 10, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2018
@yujuhong
Copy link
Contributor

Any update here?

Nope. We can just crictl to pre-pull the images as @Random-Liu said.
We added the image prepulling as a workaround due to the overwhelming amount of flake caused by pulling during the tests. Pulling seems to be more reliable than before.

Will bump the priority if this becomes an issue again. In the meantime, if anyone wants to help with the change, feel free to go ahead.

k8s-github-robot pushed a commit that referenced this issue Sep 7, 2018
Automatic merge from submit-queue (batch tested with PRs 67950, 68195). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Remove e2e-image-puller

**What this PR does / why we need it**:

A long time ago, We added the image prepulling as a workaround due to
the overwhelming amount of flake caused by pulling during the tests.
This functionality has been broken for a while now when we switched to a
COS image where mounting `docker` binary into `busybox` stopped working.
So we just have dead code we should clean up.

Change-Id: I538171a5c1d9361eee7f9e0a99655b88b1721e3e

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #63355

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants