[DOC] RMM hypervisor incompatibility advisory for managed pools? #652
Comments
@lmeyerov this is documented here: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#cuda-open-cl-support-vgpu #656 adds a note to the RMM readme that |
Thank you for confirming and documenting so quickly; this gives us time over the weekend to get ahead of it for a difficult deployment group!
@harrism FYI we pushed a patch and tried |
From the thread, it seems the above is not an RMM issue.
Yes, thanks @harrism. The docs fix seems fine for guiding folks toward writing portable RMM code. We hit other failure modes around vGPUs. For example, CUDA is disabled when the NVIDIA license is type = 0 (dev / unlicensed mode), which threw our users for 1-2 weeks. In theory, Python CUDA libraries could try to give better error messages, but that is probably not worth it and is more of a job for lower levels.
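For anyone landing here later, a rough sketch of what "portable" could look like on the Python side: try the managed pool first, fall back to a plain device pool, and keep the error text so users see why. The helper names `first_working` and `rmm_candidates` are made up for illustration. It is also an assumption that the vGPU failure surfaces when the pool is created or on the first probe allocation; we have not verified this on a VMware setup.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = Tuple[str, Callable[[], None]]


def first_working(candidates: Sequence[Candidate]) -> Tuple[Optional[str], List[str]]:
    """Run each setup callable in order and return the first name that succeeds.

    Also returns the error messages collected from the candidates that failed,
    so the caller can log why a fallback was taken.
    """
    errors: List[str] = []
    for name, setup in candidates:
        try:
            setup()
            return name, errors
        except Exception as exc:  # broad on purpose: exact error type is not known
            errors.append(f"{name}: {exc}")
    return None, errors


def rmm_candidates() -> List[Candidate]:
    """Hypothetical candidate list for RMM; requires rmm and a GPU to call."""
    import rmm  # imported lazily so the helper above works without rmm installed

    def setup(managed: bool) -> None:
        # Pool creation allocates up front, so an unsupported mode is assumed
        # to fail here or on the small probe allocation below.
        rmm.reinitialize(pool_allocator=True, managed_memory=managed)
        rmm.DeviceBuffer(size=1)

    return [
        ("managed pool", lambda: setup(True)),
        ("device pool", lambda: setup(False)),
    ]
```

Usage would be something like `mode, errors = first_working(rmm_candidates())`, then logging `errors` whenever `mode` is not the managed pool. This does not help with the unlicensed case, where CUDA itself is disabled and every candidate would fail.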
Report incorrect documentation
Location of incorrect documentation
Potentially anywhere above/below RMM: rapids.ai setup pages, rmm modules, nvidia hypervisor advisories, ...
Describe the problems or issues found in the documentation
In a discussion with @kkraus14, it sounded like RMM managed memory is expected to fail on hypervisor / VMware setups. I didn't see public docs on this on the rapids/cudf/rmm side, nor in the hypervisor advisories. I'm simultaneously trying to figure out what the issue is, including scope and workarounds, and helping a big enterprise that is trying to adopt the RAPIDS stack in a tough environment plan around it. Hopefully the issue rings a bell and we can help save stress for other devs and users as well.
Steps taken to verify documentation is incorrect
Filed this issue. We don't have a vmware testlab, so this is a surprise and currently tricky to understand on our end.
Suggested fix for documentation