nvproxy: add ioctl NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE
Hey,

This adds a missing ioctl that is required to run workloads on H100 GPUs with Confidential Computing (CC) mode enabled.
I couldn't find this ioctl in any supported driver version prior to 550.90.07, so I added it only to that version's ABI.
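
To illustrate what "added it only to that version's ABI" means, here is a minimal Go sketch of per-version ABI tables in which a newer driver version is built from the previous one and then extended. The `driverABI` and `handler` types and the `supports` helper are simplified, hypothetical stand-ins, not nvproxy's real API; only the constructor names and the command values come from the diff.

```go
package main

import "fmt"

// handler is a hypothetical, simplified stand-in for nvproxy's
// control-command handlers.
type handler func(cmd uint32) error

// driverABI is a hypothetical, reduced version of a per-driver-version ABI.
type driverABI struct {
	controlCmd map[uint32]handler
}

const (
	// Values taken from pkg/abi/nvgpu/ctrl.go.
	getNumSecureChannels = 0xcb33010b // NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_NUM_SECURE_CHANNELS
	getKeyRotationState  = 0xcb33010c // NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE
)

// rmControlSimple is a placeholder for a pass-through handler.
func rmControlSimple(cmd uint32) error { return nil }

// v550_54_15 builds the older ABI, which has no key-rotation command.
func v550_54_15() *driverABI {
	return &driverABI{controlCmd: map[uint32]handler{
		getNumSecureChannels: rmControlSimple,
	}}
}

// v550_90_07 starts from a fresh copy of the previous ABI and adds one
// command, so the 550.54.15 table is left untouched.
func v550_90_07() *driverABI {
	abi := v550_54_15()
	abi.controlCmd[getKeyRotationState] = rmControlSimple
	return abi
}

// supports reports whether the ABI has a handler registered for cmd.
func supports(abi *driverABI, cmd uint32) bool {
	_, ok := abi.controlCmd[cmd]
	return ok
}

func main() {
	fmt.Println("550.54.15:", supports(v550_54_15(), getKeyRotationState))
	fmt.Println("550.90.07:", supports(v550_90_07(), getKeyRotationState))
}
```

Because every constructor returns a fresh map, extending one version cannot leak the new command into older versions.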

Without this patch, the following example fails:
```bash
$ docker run --runtime=runsc --gpus=all pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime python -c "import torch; torch.cuda.init()"
```
The error is:
```
Traceback (most recent call last):
  File "/test.py", line 3, in <module>
    torch.cuda.init()
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 260, in init
    _lazy_init()
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
```

At the same time, gVisor's debug logs show `nvproxy: unknown control command 0xcb33010c`, which is `NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE`.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10824 from derpsteb:ob/key-rotation 960c2d0
PiperOrigin-RevId: 668003601
derpsteb authored and gvisor-bot committed Aug 27, 2024
1 parent 945b418 commit d063345
Showing 2 changed files with 17 additions and 1 deletion.

`pkg/abi/nvgpu/ctrl.go`: 1 addition, 0 deletions
```diff
@@ -561,4 +561,5 @@ const (
 	NV_CONF_COMPUTE_CTRL_CMD_SYSTEM_GET_CAPABILITIES = 0xcb330101
 	NV_CONF_COMPUTE_CTRL_CMD_SYSTEM_GET_GPUS_STATE = 0xcb330104
 	NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_NUM_SECURE_CHANNELS = 0xcb33010b
+	NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE = 0xcb33010c
 )
```
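
The new value sits right next to the existing confidential-compute commands. Assuming the usual RM control-command layout from NVIDIA's headers, `(class << 16) | (category << 8) | index`, all four constants share class `0xcb33` and category `0x01` and differ only in the index. The `decode` helper and `rmCtrlCmd` type below are hypothetical illustrations, not part of `pkg/abi/nvgpu`.

```go
package main

import "fmt"

// rmCtrlCmd holds the fields of an RM control command, assuming the layout
// (class << 16) | (category << 8) | index.
type rmCtrlCmd struct {
	Class    uint16
	Category uint8
	Index    uint8
}

// decode is a hypothetical helper that splits cmd into those fields.
func decode(cmd uint32) rmCtrlCmd {
	return rmCtrlCmd{
		Class:    uint16(cmd >> 16),
		Category: uint8(cmd >> 8), // conversion keeps only bits 8..15
		Index:    uint8(cmd),      // conversion keeps only bits 0..7
	}
}

func main() {
	cmds := []uint32{0xcb330101, 0xcb330104, 0xcb33010b, 0xcb33010c}
	for _, cmd := range cmds {
		d := decode(cmd)
		fmt.Printf("%#x: class=%#x category=%#x index=%#x\n", cmd, d.Class, d.Category, d.Index)
	}
}
```
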
`pkg/sentry/devices/nvproxy/version.go`: 16 additions, 1 deletion
```diff
@@ -673,7 +673,22 @@ func Init() {
 		_ = addDriverABI(550, 54, 14, "8c497ff1cfc7c310fb875149bc30faa4fd26d2237b2cba6cd2e8b0780157cfe3", v550_54_14)
 
 		v550_54_15 := addDriverABI(550, 54, 15, "2e859ae5f912a9a47aaa9b2d40a94a14f6f486b5d3b67c0ddf8b72c1c9650385", v550_54_14)
-		_ = addDriverABI(550, 90, 07, "51acf579d5a9884f573a1d3f522e7fafa5e7841e22a9cec0b4bbeae31b0b9733", v550_54_15)
+
+		v550_90_07 := func() *driverABI {
+			abi := v550_54_15()
+			abi.controlCmd[nvgpu.NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE] = rmControlSimple
+
+			prevNames := abi.getStructNames
+			abi.getStructNames = func() *driverStructNames {
+				names := prevNames()
+				names.controlNames[nvgpu.NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE] = simpleIoctl("NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE_PARAMS")
+
+				return names
+			}
+
+			return abi
+		}
+		_ = addDriverABI(550, 90, 07, "51acf579d5a9884f573a1d3f522e7fafa5e7841e22a9cec0b4bbeae31b0b9733", v550_90_07)
 	})
 }
 
```
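
`v550_90_07` extends the inherited name table by wrapping `getStructNames` instead of editing it in place. The sketch below reproduces that pattern with simplified, hypothetical `driverABI` and `driverStructNames` types and hypothetical `base`/`extend` helpers (only the field names are taken from the diff), and shows why `prevNames` has to be captured before the field is reassigned.

```go
package main

import "fmt"

// Hypothetical simplified types mirroring the field names in the diff.
type driverStructNames struct {
	controlNames map[uint32]string
}

type driverABI struct {
	getStructNames func() *driverStructNames
}

// base plays the role of the inherited ABI; it builds a fresh table per call.
func base() *driverABI {
	return &driverABI{getStructNames: func() *driverStructNames {
		return &driverStructNames{controlNames: map[uint32]string{
			0xcb33010b: "NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_NUM_SECURE_CHANNELS_PARAMS",
		}}
	}}
}

// extend wraps the inherited getStructNames. prevNames must be captured
// before the field is reassigned: calling abi.getStructNames inside the new
// closure would call the closure itself and recurse forever.
func extend(abi *driverABI) *driverABI {
	prevNames := abi.getStructNames
	abi.getStructNames = func() *driverStructNames {
		names := prevNames()
		names.controlNames[0xcb33010c] = "NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE_PARAMS"
		return names
	}
	return abi
}

func main() {
	names := extend(base()).getStructNames()
	fmt.Println(len(names.controlNames), names.controlNames[0xcb33010c])
}
```

Because the wrapper calls `prevNames()` on every invocation, each version only adds its own entries and picks up everything the older version registers.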
