[Request] Support for Nvidia vGPU drivers #461

Open
insunaa opened this issue Sep 2, 2024 · 4 comments

Comments

@insunaa

insunaa commented Sep 2, 2024

Would it be possible to add support for the vGPU Guest Drivers? This would enable running Talos in a VM with an Nvidia vGPU that has been passed through by the hypervisor.

Nvidia themselves don't publish the vGPU drivers for download without a license, but the guest drivers at least can be downloaded freely from Google Cloud Platform: https://cloud.google.com/compute/docs/gpus/grid-drivers-table
I'm not sure whether Google would be OK with adding their CDN to a CI/CD pipeline for Talos images, though.

Please note that Nvidia vGPU drivers are not the same as Nvidia Enterprise GPU drivers. They are not interchangeable and have separate purposes. The currently included Nvidia Drivers do not work for this purpose.

@frezbo
Member

frezbo commented Sep 2, 2024

[screenshot]
It seems Google is not OK with that. The preferable solution would be a custom extension built and maintained by an organization that holds a license.

@insunaa
Author

insunaa commented Sep 2, 2024

I took that statement to mean that these drivers are provided by Google for use with Compute Engine, as opposed to the other types of VMs Google Cloud offers: basically a compatibility warning rather than a usage restriction. But I'm not a lawyer.

@rothgar
Member

rothgar commented Feb 5, 2025

I think the vGPU guest drivers are not gated on NVIDIA's site (I'm not sure whether vendors have specific flavors of the driver), and this appears to be a generic version of the driver we could use: https://www.nvidia.com/en-us/drivers/details/156511/

We'd also need to install and run the nvidia-gridd daemon, plus an API that lets users configure the /etc/nvidia/gridd.conf file and place a client configuration token into the /etc/nvidia/ClientConfigToken directory: https://docs.nvidia.com/vgpu/13.0/grid-licensing-user-guide/index.html#configuring-nls-licensed-client-on-linux
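As a rough illustration of what that configuration step would have to produce on disk, here is a minimal Python sketch. The function name `configure_nls_client`, its parameters, the default token file name, and the use of FeatureType=1 to request an NVIDIA vGPU license are assumptions based on my reading of the NVIDIA licensing guide linked above, not an existing Talos or NVIDIA API. It writes a minimal gridd.conf (replacing the commented template the driver ships) and copies the token into the ClientConfigToken directory; nvidia-gridd would still have to be (re)started to pick up the change.

```python
import tempfile
from pathlib import Path


def configure_nls_client(
    root: Path,
    token: bytes,
    feature_type: int = 1,
    token_name: str = "client_configuration_token.tok",
) -> Path:
    """Hypothetical helper: stage NLS client licensing files under `root`.

    `root` would be "/" on a real guest; a staging directory keeps the
    sketch safe to run anywhere.
    """
    nvidia_dir = root / "etc" / "nvidia"
    token_dir = nvidia_dir / "ClientConfigToken"
    # Creates etc/nvidia/ and etc/nvidia/ClientConfigToken/ if missing.
    token_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: FeatureType=1 requests an NVIDIA vGPU license. This
    # replaces the whole file, including the commented template the
    # driver package installs.
    gridd_conf = nvidia_dir / "gridd.conf"
    gridd_conf.write_text(f"FeatureType={feature_type}\n")

    # The client configuration token is copied byte-for-byte.
    (token_dir / token_name).write_bytes(token)

    # nvidia-gridd is not restarted here; on a real guest it must be
    # (re)started afterwards to read the new configuration.
    return gridd_conf


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as staging:
        conf = configure_nls_client(Path(staging), b"example-token")
        print(conf.read_text(), end="")  # FeatureType=1
```

Writing to a staging root keeps the sketch free of side effects; how a Talos extension would expose these two inputs (feature type and token) to users is the open question in this comment.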

@jfroy
Contributor

jfroy commented Feb 5, 2025

I can ask internally what distribution vendors are expected or encouraged to do.

GPU-Operator does support managing vGPU components, but I haven't looked at the details (what the chart does vs. what the operator does). I should try to re-engage with my prototype to make more of the operator work on Talos out of the box.
