nvproxy: add ioctl NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE
#10824
Merged
Signed-off-by: Otto Bittner <[email protected]>
derpsteb force-pushed the ob/key-rotation branch from 4533462 to 960c2d0 on August 27, 2024 09:44
So adding it in |
ayushr2 approved these changes on Aug 27, 2024
copybara-service bot pushed a commit that referenced this pull request on Aug 27, 2024:

Hey, this adds a missing ioctl required to run workloads on H100s with CC mode on.
I couldn't find the respective ioctl in any supported driver version prior to 550.90.07, hence I added it only to that version's ABI.

Without this patch the following example crashes:

```bash
$ docker run --runtime=runsc --gpus=all pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime python -c "import torch; torch.cuda.init()"
```

The error is:

```
Traceback (most recent call last):
  File "/test.py", line 3, in <module>
    torch.cuda.init()
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 260, in init
    _lazy_init()
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
```

At the same time gvisor's debug logs show `nvproxy: unknown control command 0xcb33010c`.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10824 from derpsteb:ob/key-rotation 960c2d0
PiperOrigin-RevId: 668003601
Hey, this adds a missing ioctl required to run workloads on H100s with CC (confidential computing) mode on.
I couldn't find the respective ioctl in any supported driver version prior to 550.90.07, so I added it only to that version's ABI.
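To illustrate what "only to that version's ABI" means in practice, here is a minimal Python sketch of a version-gated allowlist of control commands. It is not nvproxy's actual code (which is written in Go): the names `ADDED_IN`, `parse_version`, `allowed_commands`, and `is_allowed` are hypothetical, the older version `535.104.05` is only an example input, and the rule that later versions inherit earlier entries is an assumption of this sketch. Only the command ID and the version 550.90.07 come from this PR.

```python
# Hypothetical sketch of a per-driver-version allowlist of control commands.
# Only the command ID and the version 550.90.07 are taken from the PR;
# every name and the inheritance rule below are assumptions.

# NV_CONF_COMPUTE_CTRL_CMD_GPU_GET_KEY_ROTATION_STATE, as logged by nvproxy.
KEY_ROTATION_STATE_CMD = 0xCB33010C


def parse_version(text):
    """Turn a driver version such as '550.90.07' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Maps the driver version that first allows a command to the commands it adds.
ADDED_IN = {
    parse_version("550.90.07"): {KEY_ROTATION_STATE_CMD},
}


def allowed_commands(driver_version):
    """Collect every command introduced at or before driver_version."""
    version = parse_version(driver_version)
    allowed = set()
    for introduced, commands in ADDED_IN.items():
        if introduced <= version:
            allowed |= commands
    return allowed


def is_allowed(driver_version, cmd):
    """Report whether cmd would be forwarded for the given driver version."""
    return cmd in allowed_commands(driver_version)
```

With this table, `is_allowed("550.90.07", KEY_ROTATION_STATE_CMD)` is true, while an older driver version is rejected, which mirrors the "unknown control command" behavior described in this PR.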
Without this patch the following example crashes:
$ docker run --runtime=runsc --gpus=all pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime python -c "import torch; torch.cuda.init()"
The error is `RuntimeError: No CUDA GPUs are available`, raised from `torch._C._cuda_init()`.
At the same time gVisor's debug logs show `nvproxy: unknown control command 0xcb33010c`.
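For readers mapping the logged value back to the ioctl in the PR title, the following Python sketch splits a control command ID into its fields. It assumes the bit layout commonly used in NVIDIA's control command headers (class in bits 31:16, category in bits 15:8, index in bits 7:0); that layout and the helper name `decode_ctrl_cmd` are assumptions of this sketch, not something stated in the PR.

```python
def decode_ctrl_cmd(cmd):
    """Split a 32-bit control command ID into class, category, and index.

    Assumed layout: class in bits 31:16, category in bits 15:8,
    index in bits 7:0.
    """
    return {
        "class": (cmd >> 16) & 0xFFFF,
        "category": (cmd >> 8) & 0xFF,
        "index": cmd & 0xFF,
    }


# The command ID from the gVisor debug log in this PR.
fields = decode_ctrl_cmd(0xCB33010C)
print({name: hex(value) for name, value in fields.items()})
# prints {'class': '0xcb33', 'category': '0x1', 'index': '0xc'}
```

Under that assumed layout, the upper half `0xcb33` would identify the class the command belongs to, and the lower half selects the specific command within it.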