NVLink support #174
Comments
@klueska do you have any thoughts around this? e.g. run
I'm also curious about if/how DRA would support this scenario or similar use cases, e.g.
Partitioning an IB network seems in some ways similar to TPU slices, which were discussed briefly at KubeCon along with partitionable devices (kubernetes/enhancements#4874). I'm curious whether that kind of approach might make sense for GPUs, but as a soft rather than a hard constraint. It's possible to do this today with soft affinities, but there's a fragmentation issue that it would be nice for the scheduler to try to solve. Best-effort allocations never necessarily align with the network topology 100%. I'm not sure DRA can do much better, but figured I'd throw it out there.
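For anyone unfamiliar with the soft-affinity approach mentioned above, here is a rough sketch of what it could look like, written as a small Python helper that builds a Pod manifest. It only uses the standard Kubernetes `podAffinity` / `preferredDuringSchedulingIgnoredDuringExecution` fields. The node label key `example.com/ib-partition`, the `job` pod label, and the container image are hypothetical placeholders, not anything this project defines.

```python
import json


def soft_affinity_pod(name, job_label, topology_key, weight=100):
    """Build a Pod manifest (as a dict) that prefers, but does not require,
    landing in the same topology domain as other pods of the same job."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"job": job_label}},
        "spec": {
            "affinity": {
                "podAffinity": {
                    # "preferred..." makes this a soft constraint: the
                    # scheduler scores matching nodes higher but may still
                    # place the pod elsewhere.
                    "preferredDuringSchedulingIgnoredDuringExecution": [
                        {
                            "weight": weight,  # valid range is 1-100
                            "podAffinityTerm": {
                                "labelSelector": {
                                    "matchLabels": {"job": job_label}
                                },
                                # Hypothetical node label identifying the
                                # network partition a node belongs to.
                                "topologyKey": topology_key,
                            },
                        }
                    ]
                }
            },
            "containers": [
                # Placeholder image, for illustration only.
                {"name": "worker", "image": "example.com/worker:latest"}
            ],
        },
    }


pod = soft_affinity_pod("worker-0", "train-a", "example.com/ib-partition")
print(json.dumps(pod, indent=2))
```

Because the preference is evaluated one pod at a time, nothing here guarantees that a whole job ends up in a single partition, which is the fragmentation concern raised above.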
This might be helpful to you: #97 (comment)
Are there any plans to add support for NVLink, e.g. GB200 NVL72?
If so, can you share a rough example for what a typical device class and ResourceClaimTemplate might look like? Thanks!