DRA driver does not pick up all GPUs on the node #32
Comments
Hmm, this is unexpected. We just pull the MIG state directly from NVML (the underlying library that …). Is it possible that the plugin came online when only one was enabled and the other wasn't? The plugin doesn't do any real-time reconciliation of the GPU state; the only way to get it to update is to restart the plugin. So can you try restarting the plugin? And if that doesn't work, can you try deleting the NAS object and then restarting the plugin? This shouldn't be necessary, but I'm curious whether it resolves the issue.
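For anyone who wants to check what NVML itself reports, here is a minimal Python sketch. It is not the driver's code; it assumes the `nvidia-ml-py` package (imported as `pynvml`) is installed on the GPU node, and the helper names `mig_mode_label`, `read_mig_modes`, and `stale_entries` are hypothetical. The last helper illustrates the point about missing reconciliation: a snapshot taken at plugin startup can disagree with the live state until the plugin is restarted.

```python
from typing import Dict, List


def mig_mode_label(mode: int) -> str:
    """Map an NVML MIG mode value to a label (NVML uses 0 = disabled, 1 = enabled)."""
    return "enabled" if mode == 1 else "disabled"


def read_mig_modes() -> Dict[str, str]:
    """Return {GPU UUID: MIG state} as reported by NVML right now.

    Needs a node with the NVIDIA driver and the nvidia-ml-py package.
    """
    import pynvml  # imported lazily so the pure helpers work without NVML

    pynvml.nvmlInit()
    try:
        modes: Dict[str, str] = {}
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            uuid = pynvml.nvmlDeviceGetUUID(handle)
            if isinstance(uuid, bytes):  # older bindings return bytes
                uuid = uuid.decode()
            try:
                # Returns the current mode and the mode pending after a reset.
                current, _pending = pynvml.nvmlDeviceGetMigMode(handle)
            except pynvml.NVMLError:
                # Raised, for example, on GPUs that do not support MIG.
                modes[uuid] = "unsupported"
                continue
            modes[uuid] = mig_mode_label(current)
        return modes
    finally:
        pynvml.nvmlShutdown()


def stale_entries(snapshot: Dict[str, str], live: Dict[str, str]) -> List[str]:
    """List GPU UUIDs whose live state differs from an earlier snapshot."""
    return sorted(uuid for uuid in live if snapshot.get(uuid) != live[uuid])
```

Calling `stale_entries(snapshot, read_mig_modes())`, where `snapshot` is the state recorded earlier (for example, transcribed by hand from the NAS object), would list the GPUs whose recorded MIG state is out of date.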
Thanks, I will delete the cluster, recreate it, and reinstall the DRA driver.
Update: I recreated the KinD cluster and redeployed the previously built driver image, but still no luck:
Could you confirm that running
Thanks, could you please recommend the container image that I should use to run the command?
Running:
shows the kind nodes created by the demo. Running:
is equivalent to running
Thank you for sharing the command. Below is the command output:
As seen, nvidia-smi and the NAS object do not agree. One thing to note is that
Can this be closed?
I have enabled MIG mode on both GPUs on a single node, but the NAS object shows that one of the GPUs is not MIG-enabled:
Output of nvidia-smi:
Can you please share how the NAS object can be updated correctly?