Are GPU-enabled container runnable with containerd runtime? #1239
Modifying the relevant files in /var/snap/microk8s/current/args/ (ctr, kubelet, containerd and containerd.toml) to use containerd from the host (for the sake of testing, I switched containerd to 1.3.4) does not help; the error keeps appearing.
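For context, wiring an NVIDIA runtime into containerd's CRI plugin usually looks something like the following. This is a sketch of containerd 1.3-era config, not microk8s' shipped containerd.toml; section names and the runtime binary path are assumptions that differ between containerd 1.2 and 1.3:

```toml
# Sketch: registering nvidia-container-runtime as a CRI runtime
# (containerd 1.3 syntax; 1.2 used the shorter [plugins.cri] sections).
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v1"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    # Path is illustrative; it must point at the nvidia-container-runtime
    # binary that the kubelet's containerd can actually see.
    BinaryName = "/usr/bin/nvidia-container-runtime"
```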
@Bamfax you can enable gpu in microk8s by simply enabling the gpu addon.
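The addon is enabled with the standard microk8s command; a quick way to check whether the device plugin registered the GPU is to look at the node's advertised resources (the grep check below is just one way to inspect this):

```shell
# Enable the NVIDIA GPU addon (deploys the nvidia-device-plugin daemonset).
microk8s enable gpu

# The node should now advertise nvidia.com/gpu under Capacity/Allocatable;
# "insufficient gpu" at scheduling time means this registration failed.
microk8s kubectl describe node | grep -i nvidia
```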
@balchua sorry if I did not point that out at the beginning: I was not able to get gpu-enabled containers running at all so far (using containerd). Enabling the gpu addon seems to work fine, and the kmods are detected. But when trying to run the cuda-vector-add test container, it remains in pending state with "insufficient gpu". I am adding more detail to the first post in this issue.
Could you share the manifest of the pod? We use the manifest in [1]; is it possible the "limits" are not the ones we have? [1] https://github.com/ubuntu/microk8s/blob/feature/ha-dqlite/tests/templates/cuda-add.yaml
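The usual cuda-vector-add test pod looks roughly like this (a sketch based on the standard Kubernetes GPU example; the image tag is illustrative and may not match [1] exactly). The part that matters for the "insufficient gpu" symptom is the `nvidia.com/gpu` limit, which the scheduler matches against what the device plugin advertises:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # Illustrative image; the upstream example uses a prebuilt
      # CUDA vector-add test image.
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          # Pod stays Pending with "insufficient gpu" if no node
          # advertises this resource.
          nvidia.com/gpu: 1
```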
Thanks. I added the missing details and the manifest to the first post. The manifest should have been identical; I retried using [1] to be sure, and it gives the same result.
I tested this when I was upgrading containerd to 1.3, and it was able to spin up the pod in [1].
As far as I know (there's a good chance I am wrong), microk8s packages the nvidia libs as shown in the snapcraft.yaml. Could it be conflicting with the libs that are installed on the system?
@balchua the library conflict was spot on, many thanks. I purged the four packages mentioned and reinstalled microk8s, enabling just the gpu addon. Checking the /var/snap/microk8s/current/args/kubelet config, it is set to use the remote/containerd sock. Happy to confirm that containerd is indeed working fine. Many thanks for the help. [...default install...]
Also when trying to use "microk8s ctr" again as above, it does not find containerd in the PATH. So somehow my native-OS containerd seems to have been picked up before. Afterwards to reproduce, I reinstalled each package one by one on the host and reran the cuda container after each install.
The cuda container starts working again as soon as nvidia-container-toolkit is uninstalled from the base OS. Again, many thanks for the help!
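The purge-and-reinstall cycle described above can be sketched as follows. The exact package set is an assumption (the thread names only nvidia-container-toolkit explicitly); the `dpkg -l` check is the safe way to find what is actually installed from nvidia.github.io:

```shell
# List host-level NVIDIA container packages that may shadow the libs
# bundled in the microk8s snap.
dpkg -l | grep -E 'nvidia-(container|docker)'

# Illustrative package names; purge whatever the listing above shows.
sudo apt-get purge nvidia-container-toolkit nvidia-container-runtime \
    libnvidia-container-tools libnvidia-container1

# Reinstall microk8s cleanly and re-enable only the gpu addon.
sudo snap remove --purge microk8s
sudo snap install microk8s --classic
sudo microk8s enable gpu
```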
Does it mean you are not using the containerd that comes with microk8s?
Microk8s is in its default install now, all standard. It was installed as described in the previous post, with the standard install command. My /var/snap/microk8s/current/args/kubelet then has this config; that is what I meant with "set to use the remote/containerd sock".
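The relevant kubelet flags in /var/snap/microk8s/current/args/kubelet look roughly like this (a sketch of microk8s' defaults at the time, not a verbatim copy of the file):

```
--container-runtime=remote
--container-runtime-endpoint=${SNAP_COMMON}/run/containerd.sock
```

With `--container-runtime=remote`, the kubelet talks CRI to the snap's own containerd socket rather than to a dockershim, which is why a host-level docker/containerd install should not be in the picture at all.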
GPU-enabled pods are failing to start when using the gpu addon while staying with containerd (i.e. not using docker as the default runtime). The cuda-vector-add testing pod remains in pending state and does not start. The logs of nvidia-device-plugin-daemonset also show errors and mention that the default runtime needs to be changed from containerd to docker.
I would prefer to stay with the containerd runtime and avoid using docker-ce as the runtime (nvidia-docker2 depends upon docker-ce), due to other issues: docker changes iptables rules, which requires further workarounds (kubernetes/kubernetes#39823 (comment) and #267).
Seeing reports that k3s is able to run gpu-enabled pods using containerd (https://dev.to/mweibel/add-nvidia-gpu-support-to-k3s-with-containerd-4j17), and that my OS-level containerd is able to run a pod with nvidia-smi, I would prefer to stay with the containerd runtime. Is that somehow possible?
The system on which microk8s runs is Debian Buster 10.4 with Nvidia drivers from Debian Backports and Nvidia docker libraries from nvidia.github.io. Microk8s was installed via snap.
Looking further into where this may be rooted, I used microk8s.ctr to try to start the pod directly and compare with another ctr/containerd. "microk8s.ctr" using the containerd runtime "nvidia-container-runtime" throws a libnvidia-container.so.1 error. By contrast, everything works fine when doing the same using "ctr" directly (a different ctr/containerd outside microk8s; the docker 1.2.5 deb was used here).
Is the above error related to the issue mentioned on https://github.com/NVIDIA/k8s-device-plugin#prerequisites, or would gpu-enabled pods be runnable in the given setup?
The pod starts fine using the non-microk8s ctr/containerd:
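The direct ctr invocation was along these lines. This is a sketch: the image tag and container ID are illustrative, and ctr's GPU flags changed between containerd 1.2 and 1.3 (`--gpus` follows the 1.3-era syntax; with 1.2 and nvidia-container-runtime the runtime is selected differently):

```shell
# Pull a CUDA base image and run nvidia-smi inside it via ctr.
ctr image pull docker.io/nvidia/cuda:10.2-base
ctr run --rm --gpus 0 \
    docker.io/nvidia/cuda:10.2-base gpu-test nvidia-smi
```

If this works with the host containerd but fails through microk8s.ctr with a libnvidia-container.so.1 error, the two containerds are resolving the NVIDIA libraries from different places, which matches the library-conflict diagnosis above.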