feat: build nvidia open source kernel module #220
I copied this Containerfile because I didn't want this PR to rewrite the workflow logic too. If we wanted to use the same Containerfile, we would need to rework the GHA jobs to supply additional build args only for Nvidia.
"For cutting-edge platforms such as NVIDIA Grace Hopper or NVIDIA Blackwell, you must use the open-source GPU kernel modules. The proprietary drivers are unsupported on these platforms. For newer GPUs from the Turing, Ampere, Ada Lovelace, or Hopper architectures, NVIDIA recommends switching to the open-source GPU kernel modules." I think we should invert this: default to nvidia-open, with nvidia-closed as the edge case for older GPUs, since that is what Nvidia is pushing for. This wouldn't affect this PR per se, as the default choice would be set in the downstream image builds.
akmods --force --kernels "${KERNEL_VERSION}" --kmod "nvidia"

modinfo /usr/lib/modules/${KERNEL_VERSION}/extra/nvidia/nvidia{,-drm,-modeset,-peermem,-uvm}.ko.xz > /dev/null || \
(cat /var/cache/akmods/nvidia/${NVIDIA_AKMOD_VERSION}-for-${KERNEL_VERSION}.failed.log && exit 1)

# View license information
modinfo -l /usr/lib/modules/${KERNEL_VERSION}/extra/nvidia/nvidia{,-drm,-modeset,-peermem,-uvm}.ko.xz
While this lists it out, can we add a conditional check to make sure that the correctly licensed kmod was built for the given input?
Yes please
It looks like a check was requested here, but it was never added?
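A minimal sketch of what the requested check could look like, not something taken from this PR. It assumes a hypothetical `KMOD_FLAVOR` build arg (`open` or `closed`) and assumes the core `nvidia` module reports `Dual MIT/GPL` when built from the open sources and `NVIDIA` when built from the proprietary ones; both strings should be confirmed against real `modinfo -l` output before relying on this. Only the core module is checked, on the assumption that the companion modules may carry other license strings regardless of flavor.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map the requested flavor to the license string we expect.
# ASSUMPTION: open builds report "Dual MIT/GPL", proprietary builds report "NVIDIA".
expected_license() {
    case "$1" in
        open)   echo "Dual MIT/GPL" ;;
        closed) echo "NVIDIA" ;;
        *)      return 1 ;;
    esac
}

# Succeeds only if the actual license string matches the one expected for the flavor.
# Returns 2 for an unknown flavor, 1 for a mismatch.
license_matches() {
    local flavor="$1" actual="$2" expected
    expected="$(expected_license "${flavor}")" || return 2
    [[ "${actual}" == "${expected}" ]]
}

# Check the core nvidia module that akmods just built.
# KERNEL_VERSION is the same variable the build script above already uses.
verify_built_kmod() {
    local flavor="$1" actual
    local kmod="/usr/lib/modules/${KERNEL_VERSION}/extra/nvidia/nvidia.ko.xz"
    actual="$(modinfo -l "${kmod}")" || return 1
    if ! license_matches "${flavor}" "${actual}"; then
        echo "license mismatch for ${kmod}: got '${actual}' for flavor '${flavor}'" >&2
        return 1
    fi
}
```

In the build script this would be called right after the existing `modinfo -l` line, for example `verify_built_kmod "${KMOD_FLAVOR}" || exit 1`, so a build that silently produced the wrong flavor fails instead of shipping.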
@@ -0,0 +1,63 @@
###
### Containerfile.nvidia - used to build ONLY NVIDIA kmods
I think we can just symlink this instead, if it's going to be an exact copy of the Nvidia Containerfile.
Right now it's separate, but it could be an `if` in the Containerfile.
LGTM! (Can't approve my own PR)
Enable the open kernel module builds.
This should not affect anything downstream. It just gets us ready for the switch at the next Nvidia driver release.
Please note: I have not yet tried booting into an image with this driver.