Starting container directly with runsc: GPU access blocked by operating system #11069

Closed
sfc-gh-lshi opened this issue Oct 21, 2024 · 2 comments

sfc-gh-lshi commented Oct 21, 2024

Description

I would like to use runsc to start a container with GPU access: sudo runsc --nvproxy --strace --debug --debug-log=/tmp/logs/runsc.log --network=host --host-uds=all run "container". The container appears to start, but running nvidia-smi returns the following error: Failed to initialize NVML: GPU access blocked by the operating system.

Note that GPU access works fine through Docker + gVisor:

$ sudo runsc install -- --nvproxy=true --nvproxy-docker=true
$ sudo systemctl restart docker
$ sudo docker run --rm --runtime=runsc --gpus=all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-0eb3d939-60c7-cee1-e4b2-f311928f0d73)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-a052ca4c-16d7-c4a9-edc4-699fe60a0163)
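
For reference, the runsc install command above just writes a runsc runtime entry into /etc/docker/daemon.json; the binary path may differ on your machine, but it ends up roughly like this:

{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": [
        "--nvproxy=true",
        "--nvproxy-docker=true"
      ]
    }
  }
}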

How can I resolve this issue?

Bonus question

How can I make it work with --rootless? I'm happy to open up a different issue, but if there's an easy/quick pointer then we can tackle it here.

With --rootless, the command fails during the nvidia-container-cli configure call that the gofer runs (same config.json as below).

$ unshare -Ur runsc --nvproxy --rootless --strace --debug --debug-log=/home/lshi/runsc.log --network=host --host-uds=all run "container"
running container: creating container: cannot create gofer process: nvproxy setup: nvidia-container-cli configure failed, err: exit status 1
stdout:
stderr: nvidia-container-cli: initialization error: privilege change failed: operation not permitted

This error seems to originate from libnvidia-container either here or here. I already tried adding all capabilities to the gofer and making them inheritable right before the nvidia-container-cli configure call.

Steps to reproduce

  1. Create a GCP compute instance with GPU.
gcloud compute instances create $vmName \
    --machine-type=a2-highgpu-2g \
    --zone=us-central1-f \
    --boot-disk-size=200GB \
    --image=common-cu122-v20240922-ubuntu-2204-py310 \
    --image-project=deeplearning-platform-release \
    --maintenance-policy=TERMINATE
  2. ssh into the GCP instance and say yes to installing GPU drivers: gcloud compute ssh --zone=us-central1-f $vmName.
  3. Run nvidia-smi as a sanity check - this should work.
  4. Install gVisor 20240807.
(
  set -e
  ARCH=$(uname -m)
  URL=https://storage.googleapis.com/gvisor/releases/release/20240807/${ARCH}
  wget ${URL}/runsc ${URL}/runsc.sha512 \
    ${URL}/containerd-shim-runsc-v1 ${URL}/containerd-shim-runsc-v1.sha512
  sha512sum -c runsc.sha512 \
    -c containerd-shim-runsc-v1.sha512
  rm -f *.sha512
  chmod a+rx runsc containerd-shim-runsc-v1
  sudo mv runsc containerd-shim-runsc-v1 /usr/local/bin
)
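To double-check the install:
$ runsc --version
runsc version release-20240807.0
spec: 1.1.0-rc.1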
  5. Set up directories.
mkdir ~/container
mkdir ~/nvproxy && cd ~/nvproxy
  6. Create config.json.
{
  "ociVersion": "1.0.0",
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": [
          "prestart"
        ],
        "env": [
          "NVIDIA_VISIBLE_DEVICES=all",
          "NVIDIA_DRIVER_CAPABILITIES=all"
        ]
      }
    ]
  },
  "process": {
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": [
      "nvidia-smi"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm",
      "LD_LIBRARY_PATH=/usr/lib:/usr/lib64:/usr/local",
      "NVIDIA_VISIBLE_DEVICES=all"
    ],
    "cwd": "/",
    "noNewPrivileges": false,
    "capabilities": {
      "bounding": [
        "CAP_CHOWN",
        "CAP_DAC_OVERRIDE",
        "CAP_DAC_READ_SEARCH",
        "CAP_FOWNER",
        "CAP_FSETID",
        "CAP_KILL",
        "CAP_SETGID",
        "CAP_SETUID",
        "CAP_SETPCAP",
        "CAP_LINUX_IMMUTABLE",
        "CAP_NET_BIND_SERVICE",
        "CAP_NET_BROADCAST",
        "CAP_NET_ADMIN",
        "CAP_NET_RAW",
        "CAP_IPC_LOCK",
        "CAP_IPC_OWNER",
        "CAP_SYS_MODULE",
        "CAP_SYS_RAWIO",
        "CAP_SYS_CHROOT",
        "CAP_SYS_PTRACE",
        "CAP_SYS_PACCT",
        "CAP_SYS_ADMIN",
        "CAP_SYS_BOOT",
        "CAP_SYS_NICE",
        "CAP_SYS_RESOURCE",
        "CAP_SYS_TIME",
        "CAP_SYS_TTY_CONFIG",
        "CAP_MKNOD",
        "CAP_LEASE",
        "CAP_AUDIT_WRITE",
        "CAP_AUDIT_CONTROL",
        "CAP_SETFCAP",
        "CAP_MAC_OVERRIDE",
        "CAP_MAC_ADMIN",
        "CAP_SYSLOG",
        "CAP_WAKE_ALARM",
        "CAP_BLOCK_SUSPEND",
        "CAP_AUDIT_READ",
        "CAP_PERFMON",
        "CAP_BPF",
        "CAP_CHECKPOINT_RESTORE"
      ],
      "effective": [
        "CAP_CHOWN",
        "CAP_DAC_OVERRIDE",
        "CAP_DAC_READ_SEARCH",
        "CAP_FOWNER",
        "CAP_FSETID",
        "CAP_KILL",
        "CAP_SETGID",
        "CAP_SETUID",
        "CAP_SETPCAP",
        "CAP_LINUX_IMMUTABLE",
        "CAP_NET_BIND_SERVICE",
        "CAP_NET_BROADCAST",
        "CAP_NET_ADMIN",
        "CAP_NET_RAW",
        "CAP_IPC_LOCK",
        "CAP_IPC_OWNER",
        "CAP_SYS_MODULE",
        "CAP_SYS_RAWIO",
        "CAP_SYS_CHROOT",
        "CAP_SYS_PTRACE",
        "CAP_SYS_PACCT",
        "CAP_SYS_ADMIN",
        "CAP_SYS_BOOT",
        "CAP_SYS_NICE",
        "CAP_SYS_RESOURCE",
        "CAP_SYS_TIME",
        "CAP_SYS_TTY_CONFIG",
        "CAP_MKNOD",
        "CAP_LEASE",
        "CAP_AUDIT_WRITE",
        "CAP_AUDIT_CONTROL",
        "CAP_SETFCAP",
        "CAP_MAC_OVERRIDE",
        "CAP_MAC_ADMIN",
        "CAP_SYSLOG",
        "CAP_WAKE_ALARM",
        "CAP_BLOCK_SUSPEND",
        "CAP_AUDIT_READ",
        "CAP_PERFMON",
        "CAP_BPF",
        "CAP_CHECKPOINT_RESTORE"
      ],
      "inheritable": [
        "CAP_CHOWN",
        "CAP_DAC_OVERRIDE",
        "CAP_DAC_READ_SEARCH",
        "CAP_FOWNER",
        "CAP_FSETID",
        "CAP_KILL",
        "CAP_SETGID",
        "CAP_SETUID",
        "CAP_SETPCAP",
        "CAP_LINUX_IMMUTABLE",
        "CAP_NET_BIND_SERVICE",
        "CAP_NET_BROADCAST",
        "CAP_NET_ADMIN",
        "CAP_NET_RAW",
        "CAP_IPC_LOCK",
        "CAP_IPC_OWNER",
        "CAP_SYS_MODULE",
        "CAP_SYS_RAWIO",
        "CAP_SYS_CHROOT",
        "CAP_SYS_PTRACE",
        "CAP_SYS_PACCT",
        "CAP_SYS_ADMIN",
        "CAP_SYS_BOOT",
        "CAP_SYS_NICE",
        "CAP_SYS_RESOURCE",
        "CAP_SYS_TIME",
        "CAP_SYS_TTY_CONFIG",
        "CAP_MKNOD",
        "CAP_LEASE",
        "CAP_AUDIT_WRITE",
        "CAP_AUDIT_CONTROL",
        "CAP_SETFCAP",
        "CAP_MAC_OVERRIDE",
        "CAP_MAC_ADMIN",
        "CAP_SYSLOG",
        "CAP_WAKE_ALARM",
        "CAP_BLOCK_SUSPEND",
        "CAP_AUDIT_READ",
        "CAP_PERFMON",
        "CAP_BPF",
        "CAP_CHECKPOINT_RESTORE"
      ],
      "permitted": [
        "CAP_CHOWN",
        "CAP_DAC_OVERRIDE",
        "CAP_DAC_READ_SEARCH",
        "CAP_FOWNER",
        "CAP_FSETID",
        "CAP_KILL",
        "CAP_SETGID",
        "CAP_SETUID",
        "CAP_SETPCAP",
        "CAP_LINUX_IMMUTABLE",
        "CAP_NET_BIND_SERVICE",
        "CAP_NET_BROADCAST",
        "CAP_NET_ADMIN",
        "CAP_NET_RAW",
        "CAP_IPC_LOCK",
        "CAP_IPC_OWNER",
        "CAP_SYS_MODULE",
        "CAP_SYS_RAWIO",
        "CAP_SYS_CHROOT",
        "CAP_SYS_PTRACE",
        "CAP_SYS_PACCT",
        "CAP_SYS_ADMIN",
        "CAP_SYS_BOOT",
        "CAP_SYS_NICE",
        "CAP_SYS_RESOURCE",
        "CAP_SYS_TIME",
        "CAP_SYS_TTY_CONFIG",
        "CAP_MKNOD",
        "CAP_LEASE",
        "CAP_AUDIT_WRITE",
        "CAP_AUDIT_CONTROL",
        "CAP_SETFCAP",
        "CAP_MAC_OVERRIDE",
        "CAP_MAC_ADMIN",
        "CAP_SYSLOG",
        "CAP_WAKE_ALARM",
        "CAP_BLOCK_SUSPEND",
        "CAP_AUDIT_READ",
        "CAP_PERFMON",
        "CAP_BPF",
        "CAP_CHECKPOINT_RESTORE"
      ]
    }
  },
  "root": {
    "path": "/home/lshi/container"
  },
  "hostname": "runsc",
  "mounts": [
    {
      "destination": "/usr/bin",
      "type": "tmpfs",
      "source": "/usr/bin",
      "options": [
        "rprivate",
        "rbind"
      ]
    },
    {
      "destination": "/usr/local",
      "type": "tmpfs",
      "source": "/usr/local",
      "options": [
        "rprivate",
        "rbind"
      ]
    },
    {
      "destination": "/lib64/ld-linux-x86-64.so.2",
      "type": "tmpfs",
      "source": "/usr/lib64/ld-linux-x86-64.so.2",
      "options": [
        "rprivate",
        "rbind"
      ]
    },
    {
      "destination": "/usr/lib64",
      "type": "tmpfs",
      "source": "/usr/lib64",
      "options": [
        "rprivate",
        "rbind"
      ]
    },
    {
      "destination": "/usr/lib",
      "type": "tmpfs",
      "source": "/usr/lib",
      "options": [
        "rprivate",
        "rbind"
      ]
    },
    {
      "destination": "/sbin",
      "type": "bind",
      "source": "/sbin",
      "options": [
        "ro",
        "rprivate",
        "rbind"
      ]
    },
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/tmp",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": [
        "rw",
        "noexec"
      ]
    },
    {
      "destination": "/etc",
      "type": "tmpfs",
      "source": "/etc",
      "options": [
        "rw",
        "noexec"
      ]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    },
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "noexec",
        "ro"
      ]
    }
  ],
  "linux": {
    "namespaces": [
      {
        "type": "cgroup"
      },
      {
        "type": "pid"
      },
      {
        "type": "uts"
      },
      {
        "type": "ipc"
      },
      {
        "type": "mnt"
      }
    ]
  }
}
  • For now, this container is given every capability, though eventually I'll try to whittle that down.
  • The root.path needs to be updated with your home directory.
  • The lib mounts are present to provide the libraries required by nvidia-smi, and those paths are added to LD_LIBRARY_PATH (see the ldd check below).
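A quick way to see which host libraries nvidia-smi actually needs (and therefore what to mount and add to LD_LIBRARY_PATH) is to run ldd against the host binary:
$ ldd $(which nvidia-smi)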
  7. Start the container.
$ sudo runsc --nvproxy --strace --debug --debug-log=/tmp/logs/runsc.log --network=host --host-uds=all run "container"
Failed to initialize NVML: GPU access blocked by the operating system

runsc version

runsc version release-20240807.0
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux lshi-gvisor-gpu 6.5.0-1025-gcp #27~22.04.1-Ubuntu SMP Tue Jul 16 23:03:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

I1021 20:39:11.873793 33296 strace.go:564] [ 1: 1] nvidia-smi E stat(0x7ef93efab480 /usr/bin/nvidia-modprobe, 0x7ebbe460cc00)
I1021 20:39:11.873816 33296 strace.go:602] [ 1: 1] nvidia-smi X stat(0x7ef93efab480 /usr/bin/nvidia-modprobe, 0x7ebbe460cc00 {dev=29, ino=3, mode=S_IFREG|S_ISUID|0o755, nlink=1, uid=0, gid=0, rdev=0, size=43344, blksize=4096, blocks=85, atime=2024-10-21 19:58:30.573566562 +0000 UTC, mtime=2024-10-21 18:53:12.461679304 +0000 UTC, ctime=2024-10-21 18:53:12.461679304 +0000 UTC}) = 0 (0x0) (10.499µs)
I1021 20:39:11.873833 33296 strace.go:559] [ 1: 1] nvidia-smi E geteuid()
I1021 20:39:11.873839 33296 strace.go:596] [ 1: 1] nvidia-smi X geteuid() = 0 (0x0) (341ns)
I1021 20:39:11.873848 33296 strace.go:570] [ 1: 1] nvidia-smi E openat(AT_FDCWD /, 0x7ef93efab32b /proc/devices, O_RDONLY|0x0, 0o0)
I1021 20:39:11.873859 33296 strace.go:608] [ 1: 1] nvidia-smi X openat(AT_FDCWD /, 0x7ef93efab32b /proc/devices, O_RDONLY|0x0, 0o0) = 0 (0x0) errno=2 (no such file or directory) (2.456µs)
I1021 20:39:11.873868 33296 strace.go:570] [ 1: 1] nvidia-smi E openat(AT_FDCWD /, 0x7ebbe460ccf0 /proc/driver/nvidia/capabilities/mig/config, O_RDONLY|0x0, 0o0)
I1021 20:39:11.873876 33296 strace.go:608] [ 1: 1] nvidia-smi X openat(AT_FDCWD /, 0x7ebbe460ccf0 /proc/driver/nvidia/capabilities/mig/config, O_RDONLY|0x0, 0o0) = 0 (0x0) errno=2 (no such file or directory) (1.745µs)
I1021 20:39:11.873885 33296 strace.go:564] [ 1: 1] nvidia-smi E stat(0x7ebbe460cc10 , 0x7ebbe460cb20)
I1021 20:39:11.873910 33296 strace.go:602] [ 1: 1] nvidia-smi X stat(0x7ebbe460cc10 , 0x7ebbe460cb20) = 0 (0x0) errno=2 (no such file or directory) (16.376µs)
I1021 20:39:11.873927 33296 strace.go:570] [ 1: 1] nvidia-smi E openat(AT_FDCWD /, 0x7ef93efab32b /proc/devices, O_RDONLY|0x0, 0o0)
I1021 20:39:11.873937 33296 strace.go:608] [ 1: 1] nvidia-smi X openat(AT_FDCWD /, 0x7ef93efab32b /proc/devices, O_RDONLY|0x0, 0o0) = 0 (0x0) errno=2 (no such file or directory) (3.263µs)
I1021 20:39:11.873946 33296 strace.go:564] [ 1: 1] nvidia-smi E stat(0x7ef93efab480 /usr/bin/nvidia-modprobe, 0x7ebbe460cc00)
I1021 20:39:11.873975 33296 strace.go:602] [ 1: 1] nvidia-smi X stat(0x7ef93efab480 /usr/bin/nvidia-modprobe, 0x7ebbe460cc00 {dev=29, ino=3, mode=S_IFREG|S_ISUID|0o755, nlink=1, uid=0, gid=0, rdev=0, size=43344, blksize=4096, blocks=85, atime=2024-10-21 19:58:30.573566562 +0000 UTC, mtime=2024-10-21 18:53:12.461679304 +0000 UTC, ctime=2024-10-21 18:53:12.461679304 +0000 UTC}) = 0 (0x0) (10.458µs)
I1021 20:39:11.873990 33296 strace.go:559] [ 1: 1] nvidia-smi E geteuid()
I1021 20:39:11.873996 33296 strace.go:596] [ 1: 1] nvidia-smi X geteuid() = 0 (0x0) (248ns)
I1021 20:39:11.874005 33296 strace.go:570] [ 1: 1] nvidia-smi E openat(AT_FDCWD /, 0x7ef93efab32b /proc/devices, O_RDONLY|0x0, 0o0)
I1021 20:39:11.874016 33296 strace.go:608] [ 1: 1] nvidia-smi X openat(AT_FDCWD /, 0x7ef93efab32b /proc/devices, O_RDONLY|0x0, 0o0) = 0 (0x0) errno=2 (no such file or directory) (2.198µs)
I1021 20:39:11.874025 33296 strace.go:570] [ 1: 1] nvidia-smi E openat(AT_FDCWD /, 0x7ebbe460ccf0 /proc/driver/nvidia/capabilities/mig/monitor, O_RDONLY|0x0, 0o0)
I1021 20:39:11.874033 33296 strace.go:608] [ 1: 1] nvidia-smi X openat(AT_FDCWD /, 0x7ebbe460ccf0 /proc/driver/nvidia/capabilities/mig/monitor, O_RDONLY|0x0, 0o0) = 0 (0x0) errno=2 (no such file or directory) (1.696µs)
I1021 20:39:11.874042 33296 strace.go:564] [ 1: 1] nvidia-smi E stat(0x7ebbe460cc10 , 0x7ebbe460cb20)
I1021 20:39:11.874049 33296 strace.go:602] [ 1: 1] nvidia-smi X stat(0x7ebbe460cc10 , 0x7ebbe460cb20) = 0 (0x0) errno=2 (no such file or directory) (640ns)
I1021 20:39:11.874069 33296 strace.go:570] [ 1: 1] nvidia-smi E newfstatat(0x1 host:[2], 0x7ef9403d844f , 0x7ebbe460d260, 0x1000)
I1021 20:39:11.874086 33296 strace.go:608] [ 1: 1] nvidia-smi X newfstatat(0x1 host:[2], 0x7ef9403d844f , 0x7ebbe460d260 {dev=8, ino=2, mode=S_IFCHR|0o620, nlink=1, uid=0, gid=0, rdev=0, size=0, blksize=1024, blocks=0, atime=2024-10-21 20:39:07.411607567 +0000 UTC, mtime=2024-10-21 20:39:07.411607567 +0000 UTC, ctime=2024-10-21 20:39:11.659911063 +0000 UTC}, 0x1000) = 0 (0x0) (5.233µs)
D1021 20:39:11.874098 33296 usertrap_amd64.go:210] [ 1: 1] Found the pattern at ip 7ef940319f43:sysno 16
D1021 20:39:11.874105 33296 usertrap_amd64.go:122] [ 1: 1] Allocate a new trap: 0xc000140090 26
D1021 20:39:11.874112 33296 usertrap_amd64.go:223] [ 1: 1] Apply the binary patch addr 7ef940319f43 trap addr 62820 ([184 16 0 0 0 15 5] -> [255 36 37 32 40 6 0])
I1021 20:39:11.874122 33296 strace.go:567] [ 1: 1] nvidia-smi E ioctl(0x1 host:[2], 0x5401, 0x7ebbe460d1c0)
I1021 20:39:11.874130 33296 strace.go:605] [ 1: 1] nvidia-smi X ioctl(0x1 host:[2], 0x5401, 0x7ebbe460d1c0) = 0 (0x0) errno=25 (not a typewriter) (1.128µs)
D1021 20:39:11.874178 33296 usertrap_amd64.go:210] [ 1: 1] Found the pattern at ip 7ef940314880:sysno 1
D1021 20:39:11.874185 33296 usertrap_amd64.go:122] [ 1: 1] Allocate a new trap: 0xc000140090 27
D1021 20:39:11.874192 33296 usertrap_amd64.go:223] [ 1: 1] Apply the binary patch addr 7ef940314880 trap addr 62870 ([184 1 0 0 0 15 5] -> [255 36 37 112 40 6 0])
I1021 20:39:11.874203 33296 strace.go:567] [ 1: 1] nvidia-smi E write(0x1 host:[2], 0x708f20 "Failed to initialize NVML: GPU access blocked by the operating system\n", 0x46)
I1021 20:39:11.874241 33296 strace.go:605] [ 1: 1] nvidia-smi X write(0x1 host:[2], ..., 0x46) = 70 (0x46) (30.963µs)
I1021 20:39:11.874260 33296 strace.go:561] [ 1: 1] nvidia-smi E exit_group(0x11)
I1021 20:39:11.874275 33296 strace.go:599] [ 1: 1] nvidia-smi X exit_group(0x11) = 0 (0x0) (8.346µs)
D1021 20:39:11.874282 33296 task_exit.go:204] [ 1: 1] Transitioning from exit state TaskExitNone to TaskExitInitiated
D1021 20:39:11.875153 1 connection.go:127] sock read failed, closing connection: EOF
D1021 20:39:11.875201 1 connection.go:127] sock read failed, closing connection: EOF
D1021 20:39:11.875355 1 connection.go:127] sock read failed, closing connection: EOF
D1021 20:39:11.875442 1 connection.go:127] sock read failed, closing connection: EOF
D1021 20:39:11.875491 1 connection.go:127] sock read failed, closing connection: EOF
D1021 20:39:11.875568 1 connection.go:127] sock read failed, closing connection: EOF
I1021 20:39:11.875592 33296 loader.go:1215] Gofer socket disconnected, killing container "container"
D1021 20:39:11.875619 1 connection.go:127] sock read failed, closing connection: EOF
D1021 20:39:11.875670 33296 task_exit.go:361] [ 1: 1] Init process terminating, killing namespace
D1021 20:39:11.875708 33296 task_signals.go:481] [ 1: 1] No task notified of signal 9
D1021 20:39:11.875733 33296 task_exit.go:204] [ 1: 1] Transitioning from exit state TaskExitInitiated to TaskExitZombie
D1021 20:39:11.875745 33296 task_exit.go:204] [ 1: 1] Transitioning from exit state TaskExitZombie to TaskExitDead
D1021 20:39:11.875775 33296 controller.go:681] containerManager.Wait returned, cid: container, waitStatus: 0x1100, err:
I1021 20:39:11.875781 33296 boot.go:534] application exiting with exit status 17
D1021 20:39:11.875816 33296 urpc.go:571] urpc: successfully marshalled 39 bytes.
I1021 20:39:11.875844 33296 watchdog.go:221] Stopping watchdog
I1021 20:39:11.875858 33296 watchdog.go:225] Watchdog stopped
D1021 20:39:11.875884 33249 urpc.go:614] urpc: unmarshal success.
D1021 20:39:11.876068 1 connection.go:127] sock read failed, closing connection: EOF
I1021 20:39:11.876071 33296 main.go:222] Exiting with status: 4352
I1021 20:39:11.876133 1 gofer.go:341] All lisafs servers exited.
I1021 20:39:11.876198 1 main.go:222] Exiting with status: 0
D1021 20:39:11.878341 33249 container.go:790] Destroy container, cid: container
D1021 20:39:11.878395 33249 container.go:1087] Destroying container, cid: container
D1021 20:39:11.878404 33249 sandbox.go:1602] Destroying root container by destroying sandbox, cid: container
D1021 20:39:11.878412 33249 sandbox.go:1299] Destroying sandbox "container"
D1021 20:39:11.878425 33249 container.go:1101] Killing gofer for container, cid: container, PID: 33257
D1021 20:39:11.878453 33249 cgroup_v2.go:177] Deleting cgroup "/sys/fs/cgroup/container"
D1021 20:39:11.878474 33249 cgroup_v2.go:188] Removing cgroup for path="/sys/fs/cgroup/container"
I1021 20:39:11.878623 33249 main.go:222] Exiting with status: 4352

sfc-gh-lshi added the type: bug label Oct 21, 2024
ayushr2 added the area: gpu label Oct 22, 2024

ayushr2 (Collaborator) commented Oct 22, 2024

The provided OCI spec doesn't run with runc either. Here are a few problems with it:

  • The /sys mount is specified twice.
  • The prestart hook has insufficient arguments. The args should be ["nvidia-container-runtime-hook", "prestart"] (see the corrected snippet after this list).
  • The mount namespace should have type mount, not mnt.
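
Concretely, the hook entry should look something like this (and the mnt namespace entry should become {"type": "mount"}):

"hooks": {
  "prestart": [
    {
      "path": "/usr/bin/nvidia-container-runtime-hook",
      "args": [
        "nvidia-container-runtime-hook",
        "prestart"
      ],
      "env": [
        "NVIDIA_VISIBLE_DEVICES=all",
        "NVIDIA_DRIVER_CAPABILITIES=all"
      ]
    }
  ]
}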

I would recommend you run gVisor with Docker (which you confirmed works), copy the OCI spec generated by Docker, and work backwards from it to the desired OCI spec.

For the rootfs, instead of bind mounting the host /usr and /lib directories, maybe create the rootfs using docker export? https://gvisor.dev/docs/user_guide/quick_start/oci/
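
Something along these lines should work (image name only as an example, matching what you used above):

mkdir rootfs
sudo docker export $(sudo docker create nvidia/cuda:11.6.2-base-ubuntu20.04) | sudo tar -xf - -C rootfs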

In trying to run this OCI spec, I ended up completely bricking my test VM to the point that I need to delete it now 😭
After running this container with runc, /usr/bin sorta disappeared from my host.

sfc-gh-lshi (Author) commented

Awesome, thanks a lot, that seems to work! 🎉

I'll open up a separate issue for the rootless case, since that still persists.

For completeness, here's what I did:

To get the OCI runtime spec

  1. Update the Docker configuration for logging, as described in the docs (a rough example of the resulting daemon.json follows this list).
  2. Start a named container with Docker: sudo docker run --name gpu --runtime=runsc --gpus=all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi -L
  3. The OCI runtime spec will be printed in the resulting log runsc.log.XXX.create.txt; search for the string ociVersion.
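
For reference, the daemon.json change from step 1 ends up roughly like this (binary path and log directory will differ):

{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": [
        "--nvproxy=true",
        "--nvproxy-docker=true",
        "--debug-log=/tmp/runsc/",
        "--debug",
        "--strace"
      ]
    }
  }
}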

I made some edits to trim the configuration, ending up with:

{
  "ociVersion": "1.0.0",
  "process": {
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": [
      "nvidia-smi",
      "-L"
    ],
    "env": [
      "PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "HOSTNAME=1da1c41ed033",
      "NVARCH=x86_64",
      "NVIDIA_REQUIRE_CUDA=cuda\u003e=11.6 brand=tesla,driver\u003e=470,driver\u003c471 brand=unknown,driver\u003e=470,driver\u003c471 brand=nvidia,driver\u003e=470,driver\u003c471 brand=nvidiartx,driver\u003e=470,driver\u003c471 brand=geforce,driver\u003e=470,driver\u003c471 brand=geforcertx,driver\u003e=470,driver\u003c471 brand=quadro,driver\u003e=470,driver\u003c471 brand=quadrortx,driver\u003e=470,driver\u003c471 brand=titan,driver\u003e=470,driver\u003c471 brand=titanrtx,driver\u003e=470,driver\u003c471",
      "NV_CUDA_CUDART_VERSION=11.6.55-1",
      "NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6",
      "CUDA_VERSION=11.6.2",
      "LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64",
      "NVIDIA_VISIBLE_DEVICES=all",
      "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
      "NVIDIA_VISIBLE_DEVICES=all"
    ],
    "cwd": "/"
  },
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": [
        "nosuid",
        "strictatime",
        "mode=755",
        "size=65536k"
      ]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    }
  ],
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": [
          "nvidia-container-runtime-hook",
          "prestart"
        ],
        "env": [
          "LANG=C.UTF-8",
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"
        ]
      }
    ]
  },
  "linux": {
    "namespaces": [
      {
        "type": "mount"
      },
      {
        "type": "network"
      },
      {
        "type": "uts"
      },
      {
        "type": "pid"
      },
      {
        "type": "ipc"
      },
      {
        "type": "cgroup"
      }
    ]
  }
}

To start a container

  1. Create a directory to work out of, e.g. mkdir ~/test && cd ~/test.
  2. Create config.json and copy in the configuration above.
  3. Export the filesystem of the gpu container.
mkdir rootfs
docker export gpu | sudo tar -xf - -C rootfs --same-owner --same-permissions
  4. Start the container.
$ sudo runsc --nvproxy --strace --debug --debug-log=/tmp/logs/runsc.log --network=host --host-uds=all run "container"
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-c69b8a79-30ff-a0b3-37f1-9f34c9f1feb2)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-dbd5315f-225c-bfc6-d7f2-e76beac8a6ce)
