
Could not start container: com.github.dockerjava.api.exception.InternalServerErrorException: Status 500: {"cause":"OCI runtime attempted to invoke a command that was not found"} #352

Closed
cmoulliard opened this issue Jan 30, 2025 · 7 comments


@cmoulliard
Contributor

cmoulliard commented Jan 30, 2025

Issue

The kind container fails to start when using rootless podman (version 5.3.1).

Steps to reproduce

  • Start a Fedora VM and ssh into it
  • Git clone this Java test project: git clone https://github.com/ch007m/quarkus-kind-testcontainer.git
  • Expose the podman socket, set the following environment variables for podman, and run the tests (a sketch of the test class follows the commands):
podman system service --time=0 unix:///tmp/podman.sock &
export DOCKER_HOST=unix:///tmp/podman.sock
export TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE=unix:///tmp/podman.sock
export TESTCONTAINERS_RYUK_DISABLED=true
mvn test
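
For reference, the failing test is roughly of the following shape. This is a minimal sketch assuming kindcontainer's com.dajudge.kindcontainer.KindContainer API; the class and method names are illustrative, not the project's exact code:

import com.dajudge.kindcontainer.KindContainer;
import org.junit.jupiter.api.Test;

public class KindClusterTest {

    @Test
    public void kindClusterStarts() {
        // KindContainer pulls kindest/node and boots a single-node cluster.
        try (KindContainer<?> kind = new KindContainer<>()) {
            // Under rootless podman this is where the Status 500 error surfaces.
            kind.start();
            // The kubeconfig can be handed to any Kubernetes client for assertions.
            System.out.println(kind.getKubeconfig());
        }
    }
}

The run produces: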

2025-01-30 10:14:55,174 INFO  [io.quarkus] (main) Installed features: [cdi, rest, smallrye-context-propagation, vertx]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.599 s -- in org.acme.GreetingResourceTest
[INFO] Running org.acme.KindClusterTest
2025-01-30 10:14:55,848 INFO  [org.tes.ima.PullPolicy] (main) Image pull policy will be performed by: DefaultPullPolicy()
2025-01-30 10:14:55,851 INFO  [org.tes.uti.ImageNameSubstitutor] (main) Image name substitution will be performed by: DefaultImageNameSubstitutor (composite of 'ConfigurationFileImageNameSubstitutor' and 'PrefixingImageNameSubstitutor')
2025-01-30 10:14:56,218 INFO  [org.tes.DockerClientFactory] (main) Testcontainers version: 1.20.4
2025-01-30 10:14:56,445 INFO  [org.tes.doc.DockerClientProviderStrategy] (main) Found Docker environment with Environment variables, system properties and defaults. Resolved dockerHost=unix:///tmp/podman.sock
2025-01-30 10:14:56,445 INFO  [org.tes.DockerClientFactory] (main) Docker host IP address is localhost
2025-01-30 10:14:56,476 INFO  [org.tes.DockerClientFactory] (main) Connected to docker: 
  Server Version: 5.3.1
  API Version: 1.41
  Operating System: fedora
  Total Memory: 3894 MB
2025-01-30 10:14:56,478 WARN  [org.tes.uti.ResourceReaper] (main) 
********************************************************************************
Ryuk has been disabled. This can cause unexpected behavior in your environment.
********************************************************************************
2025-01-30 10:14:56,479 INFO  [org.tes.DockerClientFactory] (main) Checking the system...
2025-01-30 10:14:56,480 INFO  [org.tes.DockerClientFactory] (main) ✔︎ Docker server version should be at least 1.6.0
2025-01-30 10:14:56,514 INFO  [tc.kin.31.0] (main) Creating container for image: kindest/node:v1.31.0
2025-01-30 10:14:56,516 INFO  [org.tes.uti.RegistryAuthLocator] (main) Failure when attempting to lookup auth config. Please ignore if you don't have images in an authenticated registry. Details: (dockerImageName: kindest/node:v1.31.0, configFile: /home/user1/.docker/config.json, configEnv: DOCKER_AUTH_CONFIG). Falling back to docker-java default behaviour. Exception message: Status 404: No config supplied. Checked in order: /home/user1/.docker/config.json (file not found), DOCKER_AUTH_CONFIG (not set)
2025-01-30 10:14:56,545 INFO  [tc.kin.31.0] (main) Container kindest/node:v1.31.0 is starting: a29a01b12f93ecd0e61b0dd5bdbd46b00a9a20c18f09426f154040118e5cdcfa
2025-01-30 10:14:56,698 ERROR [tc.kin.31.0] (main) Could not start container: com.github.dockerjava.api.exception.InternalServerErrorException: Status 500: {"cause":"OCI runtime attempted to invoke a command that was not found","message":"crun: cannot stat `kindcontainer-2e8050fe-2dba-433c-b4f3-8e12f363712b/_data`: No such file or directory: OCI runtime attempted to invoke a command that was not found","response":500}

        at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:247)
        at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder.post(DefaultInvocationBuilder.java:102)
        at org.testcontainers.shaded.com.github.dockerjava.core.exec.StartContainerCmdExec.execute(StartContainerCmdExec.java:31)
        at org.testcontainers.shaded.com.github.dockerjava.core.exec.StartContainerCmdExec.execute(StartContainerCmdExec.java:13)
        at org.testcontainers.shaded.com.github.dockerjava.core.exec.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:21)
        at org.testcontainers.shaded.com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:33)
        at org.testcontainers.shaded.com.github.dockerjava.core.command.StartContainerCmdImpl.exec(StartContainerCmdImpl.java:42)
        at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:444)
        at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:346)
        at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
        at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336)
        at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:322)
        at com.dajudge.kindcontainer.KindContainer.start(KindContainer.java:361)

For comparison, I can create a kind cluster on the same VM using podman:

[user1@cloud-vm quarkus-kind-testcontainer]$ export KIND_EXPERIMENTAL_PROVIDER=podman
[user1@cloud-vm quarkus-kind-testcontainer]$ kind create cluster
using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
Creating cluster "kind" ...
⢎⡰ Ensuring node image (kindest/node:v1.32.0) 🖼
 ✓ Ensuring node image (kindest/node:v1.32.0) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
⢆⡱ Starting control-plane 🕹️
@dajudge
Owner

dajudge commented Jan 30, 2025

Hi @cmoulliard,

Thank you for your input!

I'm not really certain when/if I'll be able to investigate this issue, since I'm not running podman myself.

Maybe you or somebody from the community running podman has time to investigate?

@cmoulliard
Contributor Author

Since the command to create a kind cluster works with rootless podman on macOS (where a Fedora CoreOS VM is created using AppleHV), how could we compare the command Testcontainers executes to create the container with what kind create cluster does?

@cmoulliard
Contributor Author

I created a gist comparing a kind container created successfully with the command kind create cluster --name zzz (see "Kind container") and the one that is failing (see "Testcontainer - kind"):

https://gist.github.com/cmoulliard/37e881b5b30177b90eb1e745d9097963#kind-container


@cmoulliard
Contributor Author

cmoulliard commented Jan 30, 2025

Can we set the KubeletInUserNamespace feature gate, as documented here: https://kind.sigs.k8s.io/docs/user/known-issues/#chrome-os? @dajudge

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  KubeletInUserNamespace: true

as I found this error in the journal:

// https://gist.github.com/cmoulliard/a375b7ab57fb53ca05d6f009835a054b#file-gistfile1-txt-L133-L141
...
Jan 30 13:22:04 3ac6e579b21a kubelet[612]: I0130 13:22:04.833364     612 topology_manager.go:138] "Creating topology manager with none policy"
Jan 30 13:22:04 3ac6e579b21a kubelet[612]: I0130 13:22:04.833369     612 container_manager_linux.go:300] "Creating device plugin manager"
Jan 30 13:22:04 3ac6e579b21a kubelet[612]: I0130 13:22:04.833383     612 state_mem.go:36] "Initialized new in-memory state store"
Jan 30 13:22:05 3ac6e579b21a kubelet[612]: I0130 13:22:05.034717     612 server.go:873] "Failed to ApplyOOMScoreAdj" err="write /proc/self/oom_score_adj: permission denied"
Jan 30 13:22:05 3ac6e579b21a kubelet[612]: I0130 13:22:05.034866     612 kubelet.go:408] "Attempting to sync node with API server"
Jan 30 13:22:05 3ac6e579b21a kubelet[612]: I0130 13:22:05.034889     612 kubelet.go:303] "Adding static pod path" path="/etc/kubernetes/manifests"
Jan 30 13:22:05 3ac6e579b21a kubelet[612]: I0130 13:22:05.034928     612 kubelet.go:314] "Adding apiserver pod source"
Jan 30 13:22:05 3ac6e579b21a kubelet[612]: I0130 13:22:05.034944     612 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Jan 30 13:22:05 3ac6e579b21a kubelet[612]: E0130 13:22:05.035046     612 kubelet.go:498] "Failed to create an oomWatcher (running in UserNS, Hint: enable KubeletInUserNamespace feature flag to ignore the error)" err="open /dev/kmsg: operation not permitted"
Jan 30 13:22:05 3ac6e579b21a kubelet[612]: E0130 13:22:05.035094     612 run.go:72] "command failed" err="failed to run Kubelet: failed to create kubelet: open /dev/kmsg: operation not permitted"
Jan 30 13:22:05 3ac6e579b21a systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE

The reason this works with kind create cluster is that kind sets this feature gate in the code where it generates the kubeadm configuration: https://github.com/kubernetes-sigs/kind/blob/f96632a3c84686f5cf322e1d5f0201b02ca324e9/pkg/cluster/internal/kubeadm/config.go#L502

and generates YAML files like the following:

# config generated by kind
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
metadata:
  name: config
kubernetesVersion: v1.32.1
clusterName: "toto"

controlPlaneEndpoint: "toto-control-plane:6443"
# on docker for mac we have to expose the api server via port forward,
# so we need to ensure the cert is valid for localhost so we can talk
# to the cluster after rewriting the kubeconfig to point to localhost
apiServer:
  certSANs: [localhost, "127.0.0.1"]
  extraArgs:
    "runtime-config": ""
    "feature-gates": "KubeletInUserNamespace=true"
controllerManager:
  extraArgs:
    "feature-gates": "KubeletInUserNamespace=true"
    enable-hostpath-provisioner: "true"
    # configure ipv6 default addresses for IPv6 clusters
    
scheduler:
  extraArgs:
    "feature-gates": "KubeletInUserNamespace=true"
    # configure ipv6 default addresses for IPv6 clusters
   
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/16"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
metadata:
  name: config
# we use a well know token for TLS bootstrap
bootstrapTokens:
- token: "abcdef.0123456789abcdef"
# we use a well know port for making the API server discoverable inside docker network. 
# from the host machine such port will be accessible via a random local port instead.
localAPIEndpoint:
  advertiseAddress: "10.89.0.8"
  bindPort: 6443
nodeRegistration:
  criSocket: "unix:///run/containerd/containerd.sock"
  kubeletExtraArgs:
    node-ip: "10.89.0.8"
    provider-id: "kind://podman/toto/toto-control-plane"
    node-labels: ""
skipPhases:
  - "preflight"
---
# no-op entry that exists solely so it can be patched
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
metadata:
  name: config
controlPlane:
  localAPIEndpoint:
    advertiseAddress: "10.89.0.8"
    bindPort: 6443
nodeRegistration:
  criSocket: "unix:///run/containerd/containerd.sock"
  kubeletExtraArgs:
    node-ip: "10.89.0.8"
    provider-id: "kind://podman/toto/toto-control-plane"
    node-labels: ""
discovery:
  bootstrapToken:
    apiServerEndpoint: "toto-control-plane:6443"
    token: "abcdef.0123456789abcdef"
    unsafeSkipCAVerification: true
skipPhases:
  - "preflight"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
metadata:
  name: config
cgroupDriver: systemd
cgroupRoot: /kubelet
failSwapOn: false
# configure ipv6 addresses in IPv6 mode

# disable disk resource management by default
# kubelet will see the host disk that the inner container runtime
# is ultimately backed by and attempt to recover disk space. we don't want that.
imageGCHighThresholdPercent: 100
evictionHard:
  nodefs.available: "0%"
  nodefs.inodesFree: "0%"
  imagefs.available: "0%"
featureGates:
  "KubeletInUserNamespace": true
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
metadata:
  name: config
mode: "iptables"
featureGates:
  "KubeletInUserNamespace": true

iptables:
  minSyncPeriod: 1s
conntrack:
# Skip setting sysctl value "net.netfilter.nf_conntrack_max"
# It is a global variable that affects other namespaces
  maxPerCore: 0
# Set sysctl value "net.netfilter.nf_conntrack_tcp_be_liberal"
# for nftables proxy (theoretically for kernels older than 6.1)
# xref: https://github.com/kubernetes/kubernetes/issues/117924


# Skip setting "net.netfilter.nf_conntrack_tcp_timeout_established"
  tcpEstablishedTimeout: 0s
# Skip setting "net.netfilter.nf_conntrack_tcp_timeout_close"
  tcpCloseWaitTimeout: 0s

@cmoulliard
Contributor Author

cmoulliard commented Jan 30, 2025

I resolved the issue. Here are the steps:

@dajudge

  1. Code change
Patch the file kindcontainer/src/main/resources/kubeadm-1.24.0.yaml to add the feature gate:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletInUserNamespace: true   # <== ADDED
cgroupDriver: systemd
cgroupRoot: /kubelet
evictionHard:
  imagefs.available: 0%
  nodefs.available: 0%
  nodefs.inodesFree: 0%
failSwapOn: false
imageGCHighThresholdPercent: 100

Recompile and publish to the local Maven repository:
gradle build -x test
gradle publishToMavenLocal
  2. Re-run the test
❯ podman info | grep "rootless:"
    rootless: true

Change the dependency version in the test project to point at the locally published kindcontainer snapshot.
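
A pom.xml sketch of that change, assuming kindcontainer's published Maven coordinates (the snapshot version number below is hypothetical):

<dependency>
  <groupId>com.dajudge.kindcontainer</groupId>
  <artifactId>kindcontainer</artifactId>
  <!-- hypothetical version of the locally built snapshot -->
  <version>1.4.9-SNAPSHOT</version>
</dependency>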

export DOCKER_HOST="unix:///var/run/docker.sock"
export TESTCONTAINERS_RYUK_DISABLED=true
mvn clean test
...
2025-01-30 15:47:31,620 INFO  [org.tes.doc.DockerClientProviderStrategy] (main) Found Docker environment with Environment variables, system properties and defaults. Resolved dockerHost=unix:///var/run/docker.sock
2025-01-30 15:47:31,620 INFO  [org.tes.DockerClientFactory] (main) Docker host IP address is localhost
2025-01-30 15:47:31,656 INFO  [org.tes.DockerClientFactory] (main) Connected to docker: 
  Server Version: 5.2.5
  API Version: 1.41
  Operating System: fedora
  Total Memory: 9219 MB
2025-01-30 15:47:31,657 WARN  [org.tes.uti.ResourceReaper] (main) 
********************************************************************************
Ryuk has been disabled. This can cause unexpected behavior in your environment.
********************************************************************************
2025-01-30 15:47:31,659 INFO  [org.tes.DockerClientFactory] (main) Checking the system...
2025-01-30 15:47:31,659 INFO  [org.tes.DockerClientFactory] (main) ✔︎ Docker server version should be at least 1.6.0
2025-01-30 15:47:31,782 INFO  [tc.kin.31.0] (main) Creating container for image: kindest/node:v1.31.0
2025-01-30 15:47:31,826 INFO  [tc.kin.31.0] (main) Container kindest/node:v1.31.0 is starting: f44af5968794593d1166ff8dc4e11701b502e92b0d92c0deefe9607e763d4364
2025-01-30 15:47:32,312 INFO  [com.daj.kin.KindContainer] (main) Container internal IP address: 10.88.0.55
2025-01-30 15:47:32,312 INFO  [com.daj.kin.KindContainer] (main) Container external IP address: localhost
2025-01-30 15:47:32,312 INFO  [com.daj.kin.KindContainer] (main) Executing command: mkdir -p /kindcontainer
2025-01-30 15:47:32,378 INFO  [com.daj.kin.TemplateHelpers] (main) Writing container file: /kindcontainer/kubeadm-1.24.0.yaml
2025-01-30 15:47:32,398 INFO  [com.daj.kin.KindContainer] (main) Executing command: kubeadm init --skip-phases=preflight --config=/kindcontainer/kubeadm-1.24.0.yaml --skip-token-print --node-name=kind --v=6
2025-01-30 15:47:59,052 INFO  [com.daj.kin.TemplateHelpers] (main) Writing container file: /kindcontainer/cni.yaml
2025-01-30 15:47:59,054 INFO  [com.daj.kin.KindContainer] (main) Executing command: kubectl apply -f /kindcontainer/cni.yaml
2025-01-30 15:47:59,221 INFO  [com.daj.kin.KindContainer] (main) Executing command: kubectl apply -f /kind/manifests/default-storage.yaml
2025-01-30 15:47:59,695 INFO  [com.daj.kin.KindContainer] (main) Executing command: kubectl taint node kind node-role.kubernetes.io/control-plane:NoSchedule-
2025-01-30 15:47:59,764 INFO  [com.daj.kin.KubernetesWithKubeletContainer] (main) Waiting for a node to become ready...
2025-01-30 15:48:23,368 INFO  [com.daj.kin.KubernetesWithKubeletContainer] (main) Node ready: kind
2025-01-30 15:48:23,371 INFO  [tc.kin.31.0] (main) Container kindest/node:v1.31.0 started in PT51.589586S
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.61 s -- in org.acme.KindClusterTest
2025-01-30 15:48:24,651 INFO  [io.quarkus] (main) code-with-quarkus stopped in 0.013s
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  58.889 s
[INFO] Finished at: 2025-01-30T15:48:24+01:00
[INFO] ------------------------------------------------------------------------

cmoulliard added a commit to ch007m/fork-kindcontainer that referenced this issue Feb 4, 2025
…rootless to work when we create a kind cluster. dajudge#352

Signed-off-by: cmoulliard <[email protected]>
dajudge pushed a commit that referenced this issue Feb 10, 2025
* Enable kubelet feature gates: KubeletInUserNamespace to allow podman rootless to work when we create a kind cluster. #352

Signed-off-by: cmoulliard <[email protected]>

* Move the kind declaration under the apiVersion for kubelet

Signed-off-by: cmoulliard <[email protected]>

---------

Signed-off-by: cmoulliard <[email protected]>
@cmoulliard
Contributor Author

The error reported in this issue is not related to the PR that was merged; it was caused by podman using a wrong version of crun. That happens when you use the Ubuntu runner on GitHub, which ships podman 4 by default, while you have also installed podman 5 with the help of Homebrew. Remove the crun that came with the distribution's podman (apt-get remove crun) and the issue will be fixed.

The PR, on the other hand, addresses the permission-denied error reported by kubelet (running as root) in a rootless environment.
