Design-proposal: KubeVirt DRA design proposal #293
@@ -0,0 +1,161 @@
# Overview
This proposal adds support for DRA (Dynamic Resource Allocation) to KubeVirt.
DRA gives vendors fine-grained control over their devices. The device-plugin model will continue
to exist in Kubernetes, but DRA offers vendors more control over device topology.

## Motivation
Adopting DRA is important for KubeVirt so that vendors get the same level of
control over their devices whether they are consumed by virtual machines or by containers.

## Goals
- Align on how KubeVirt will consume external and in-tree DRA drivers
- Align on which drivers KubeVirt will support in-tree

## Non Goals
- Replace device-plugin support in KubeVirt

## Definition of Users
A user is a person who wants to attach a device to a VM.

## User Stories
- As a user, I would like to use my GPU DRA driver with KubeVirt
- As a user, I would like to use KubeVirt's default driver

## Repos
kubevirt/kubevirt

# Design

## API Examples

### VM API with PassThrough GPU

```
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu.resource.nvidia.com
driverName: gpu.resource.nvidia.com
---
apiVersion: rtx4090.gpu.resource.nvidia.com/v1
kind: ClaimParameters
metadata:
  name: rtx4090-claim-parameters
spec:
  driver: vfio
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: rtx4090-claim-template
spec:
  spec:
    resourceClassName: gpu.resource.nvidia.com
    parametersRef:
      apiGroup: rtx4090.gpu.resource.nvidia.com
      kind: ClaimParameters
      name: rtx4090-claim-parameters
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros
  name: vm-cirros
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros
    spec:
      resourceClaims:
      - name: rtx4090
        source:
          resourceClaimTemplateName: rtx4090-claim-template
---
# virt-launcher pod that carries the same claim
apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-cirros
spec:
  containers:
  - name: virt-launcher
    image: virt-launcher
    resources:
      claims:
      - name: rtx4090
  resourceClaims:
  - name: rtx4090
    source:
      resourceClaimTemplateName: rtx4090-claim-template
```

Review comment (on `resourceClaimTemplateName: rtx4090-claim-template` in the VirtualMachine spec): Do we want to reference a claim here or the claim template?

Reply: We have to have both.
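The review discussion above raises referencing an existing claim as well as a claim template. A minimal sketch of the claim variant follows. It assumes the VM-level `resourceClaims` entry mirrors the Pod API's claim source from `resource.k8s.io/v1alpha2`, which accepts `resourceClaimName` as an alternative to `resourceClaimTemplateName`. The claim name `rtx4090-claim` is hypothetical, and the KubeVirt field shape is not something this proposal has settled.

```yaml
# Pre-created claim (hypothetical name), reusing the class and parameters above.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: rtx4090-claim
spec:
  resourceClassName: gpu.resource.nvidia.com
  parametersRef:
    # apiGroup takes the API group only, without a version suffix.
    apiGroup: rtx4090.gpu.resource.nvidia.com
    kind: ClaimParameters
    name: rtx4090-claim-parameters
---
# Assumed VM shape: same list as in the example above, but pointing at the claim.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-cirros
spec:
  running: false
  template:
    spec:
      resourceClaims:
      - name: rtx4090
        source:
          resourceClaimName: rtx4090-claim
```

With a template, Kubernetes creates a fresh claim for each pod. With a named claim, every pod that references it shares the same claim, so the two forms suit different lifecycles.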

### VM API with vGPU

```
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu.resource.nvidia.com
driverName: gpu.resource.nvidia.com
---
apiVersion: a100.gpu.resource.nvidia.com/v1
kind: ClaimParameters
metadata:
  name: a100-40c-claim-parameters
spec:
  driver: nvidia
  profile: A100DX-40C # Maximum 2 40C vGPUs per GPU
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: a100-40c-claim-template
spec:
  spec:
    resourceClassName: gpu.resource.nvidia.com
    parametersRef:
      apiGroup: a100.gpu.resource.nvidia.com
      kind: ClaimParameters
      name: a100-40c-claim-parameters
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros
  name: vm-cirros
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros
    spec:
      resourceClaims:
      - name: a100-40c
        source:
          resourceClaimTemplateName: a100-40c-claim-template
---
# virt-launcher pod that carries the same claim
apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-cirros
spec:
  containers:
  - name: virt-launcher
    image: virt-launcher
    resources:
      claims:
      - name: a100-40c
  resourceClaims:
  - name: a100-40c
    source:
      resourceClaimTemplateName: a100-40c-claim-template
```

Review comment (on lines +131 to +135): Specifically for GPUs, this is the spec I had in mind to keep in sync with the existing architecture and to address a use case such as: [...]

The idea is that, for a [...] The same rationale can be applied to the other device types from https://pkg.go.dev/kubevirt.io/api/core/v1#DomainSpec that would need DRA integration.

# References

- Structured parameters
  https://github.com/kubernetes/kubernetes/pull/123516
- Structured parameters KEP
  https://github.com/kubernetes/enhancements/issues/4381
- DRA
  https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/
Review comment: Currently, DRA vendors design their drivers using CDI on the assumption that the requesting pod will consume the device (say, a GPU) directly. In the existing KubeVirt architecture, the virt-launcher pod keeps a minimal security profile: even though it requests devices from device plugins, it gets only partial access to them, just enough to pass them along to the libvirt domain.
I'm wondering whether introducing DRA drivers that allow full access to the devices from the virt-launcher pod would break KubeVirt's security posture.
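To make the concern concrete, here is a rough sketch of a CDI spec that a DRA driver could emit if it wanted to keep virt-launcher's access narrow. Everything in it is an assumption for illustration: the `example.com/gpu` kind, the device name, and the IOMMU group number `42` are hypothetical, and no existing vendor driver is known to produce this file. It exposes only the VFIO nodes needed for passthrough, with no vendor libraries, mounts, or hooks.

```yaml
# Hypothetical CDI spec for a VFIO-bound GPU; not taken from any real driver.
cdiVersion: "0.6.0"
kind: example.com/gpu
devices:
- name: gpu-0
  containerEdits:
    deviceNodes:
    # Only the IOMMU group node for this device (group number is illustrative).
    - path: /dev/vfio/42
      permissions: rw
# Edits applied for every device of this kind.
containerEdits:
  deviceNodes:
  # The shared VFIO container node is needed alongside any group node.
  - path: /dev/vfio/vfio
    permissions: rw
```

A driver written for containers would typically add more edits than this, such as vendor device nodes and library mounts, which is the gap the comment above points at.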