Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize virtualization stack to different libvirt hypervisor-drivers #259

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

harshitgupta1337
Copy link

@harshitgupta1337 harshitgupta1337 commented Feb 12, 2024

This KubeVirt design proposal discusses how KubeVirt can be used to create libvirt virtual machines that are backed by diverse hypervisor drivers, such as QEMU/KVM, Xen, VirtualBox, etc. The aim of this proposal is to enumerate the design and implementation choices for enabling this multi-hypervisor support in KubeVirt.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:


@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Feb 12, 2024
@kubevirt-bot
Copy link

Hi @harshitgupta1337. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubevirt-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fabiand for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alicefr
Copy link
Member

alicefr commented Feb 13, 2024

/cc

@kubevirt-bot kubevirt-bot requested a review from alicefr February 13, 2024 07:43
Copy link
Member

@fabiand fabiand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this proposal.

To me the main areas that need to look at:

  1. What is the complexity being added to the code
  2. What functionality will be impacted? And how can we set the user expectations right?
  3. Why do we need this and how will we organize ownership (we are doubling the test matrix) - who is "paying" for it?


Although KubeVirt currently relies on libvirt to create and manage virtual machine instances (VMIs), it relies specifically on the QEMU/KVM virtualization stack (VMM and hypervisor) to host the VMI. This limits KubeVirt from being used in settings where a different VMM or hypervisor is used.

In fact, libvirt itself is flexible enough to support a diverse set of VMMs and hypervisors. The libvirt API delegates its implementation to one or more internal drivers, dependending on the [connection URI](https://libvirt.org/uri.html) passed when initializing the library. The list of currently supported hypervisor drivers in Libvirt are:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct that libvirt is supporting many hypervisors.

However, the supported featureset accross all hypervisors (speak the subset of features) is actually much smaller.

This is why KubeVirt inteintionally had focused on KVM only in order to not consider the special cases of different hypevisors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in touch with the cloud-hypervisor community and they are actively working on achieving parity with qemu-kvm in terms of the VMI features offered by KubeVirt.


## Goals

KubeVirt should be able to offer a choice to its users over which libvirt hypervisor-driver they want to use to create their VMI.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

This section is impotrant: Please provide a justification of how this will help KubeVirt users, or KubeVirt.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Microsoft is a vendor of KubeVirt as it leverages KubeVirt in its Azure Operator Nexus product as a VM orchestrator. The hypervisor currently used in the Nexus product is qemu-kvm, however, in the future MSFT is looking at alternative hypervisors such as cloud-hypervisor.
To continue using KubeVirt for this product it would make sense to make it hypervisor-agnostic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also another project called Virtink which was created to add K8s-based orchestration support for cloud-hypervisor based VMs.
https://github.com/smartxworks/virtink
This shows that there is a need for K8s-based orchestration for cloud-hypervisor VMs, and KubeVirt already interfaces with libvirt - which supports cloud-hypervisor driver.


## API Changes

Addition of a `vmi.hypervisor` field. Example of how a user could request a specific hypervisor as the underlying libvirt hypervisor-driver through the VMI spec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VMI seems to be quite fine granular.

if,t hen shouldnÄt it be a cluster level setting?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am considering a scenario in which different cluster nodes could have a different virtualization-stack. In KubeVirt virt-handlers on different cluster nodes are independent, so IMO there is no reason to not set hypervisor at this fine granularity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for having different hypervisors in a single cluster?

There is also a cluster level impact, i.e. the overhead calculation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No specific reason. Based on my understanding of the KubeVirt code, the overhead calculation is for the virt-launcher pod alone, so one could (in theory) have diff virt-launcher pods with diff hypervisors running on the same cluster. However, I could be wrong, so please correct me.

I don't have a specific scenario in mind as of now that would require multiple hypervisors on the same cluster, but it is a more flexible design choice imo to have the hypervisor-specific logic limited to the components that are tied to specific nodes (i.e., virt-handler and virt-launcher).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it makes sense to allow multiple hypervisors in the same cluster. Different hypervisor could fit to different use cases and we could have a unified management.

@harshitgupta1337 harshitgupta1337 marked this pull request as ready for review February 13, 2024 19:36
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 13, 2024

## API Changes

Addition of a `vmi.hypervisor` field. Example of how a user could request a specific hypervisor as the underlying libvirt hypervisor-driver through the VMI spec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it makes sense to allow multiple hypervisors in the same cluster. Different hypervisor could fit to different use cases and we could have a unified management.


```yaml
spec:
hypervisor: cloud-hypervisor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here it is a good fit for something similar to the kubernetes runtime classes. If we need additional configuration specific to the hypervisor they could go into a new CRD


- `virt-launcher` pod image should be specific to the `vmi.hypervisor`.

- Hypervisor resource needed by the `virt-launcher` pod. For instance, for a VMI with `hypervisor=qemu-kvm`, the corresponding virt-launcher pod requires the resource `devices.kubevirt.io/kvm: 1`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, take emulation also into account

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please expand on your comment?


- Hypervisor resource needed by the `virt-launcher` pod. For instance, for a VMI with `hypervisor=qemu-kvm`, the corresponding virt-launcher pod requires the resource `devices.kubevirt.io/kvm: 1`.

- The resource requirement of the virt-launcher pod should be adjusted (w.r.t. to the resource spec of the VMI), to take into account the resources consumed by the requested VMM daemon running in the `virt-launcher` pod. Currently, the field `VirtqemudOverhead` holds the memory overhead of the `virtqemud` process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this go in the new Hypervisor Runtime CRD?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea.

@alicefr
Copy link
Member

alicefr commented Feb 15, 2024

@harshitgupta1337 I think what we need here it is an infrastructure for hypervisor plugins. Putting everything into kubevirt code would make the code base very large and hard to maintain.
Do you think it would be possible to change the goal of this proposal and try to develop a plugin instead.
Cloud Hypervisor could be the first hypervisor plugin.

@alicefr
Copy link
Member

alicefr commented Feb 15, 2024

I think we need also a list of features supported by each specific hypervisor.

From the old proposal for Cloud Hypervisor integration, not all the features supported by QEMu are available in CH. We should have here also a similar section with the features as in the old proposal. Then, these options should be advertised in the new CRD if they are supported or not.

@harshitgupta1337
Copy link
Author

@harshitgupta1337 I think what we need here it is an infrastructure for hypervisor plugins. Putting everything into kubevirt code would make the code base very large and hard to maintain. Do you think it would be possible to change the goal of this proposal and try to develop a plugin instead. Cloud Hypervisor could be the first hypervisor plugin.

@alicefr You're referring to this feature in Golang, isn' it ?
https://pkg.go.dev/plugin
Then we would need 2 plugins, right? One for qemu for backwards compatibility and a new one for cloud-hypervisor?

@alicefr
Copy link
Member

alicefr commented Feb 16, 2024

@harshitgupta1337 I think what we need here it is an infrastructure for hypervisor plugins. Putting everything into kubevirt code would make the code base very large and hard to maintain. Do you think it would be possible to change the goal of this proposal and try to develop a plugin instead. Cloud Hypervisor could be the first hypervisor plugin.

@alicefr You're referring to this feature in Golang, isn' it ? https://pkg.go.dev/plugin Then we would need 2 plugins, right? One for qemu for backwards compatibility and a new one for cloud-hypervisor?

No, I'm referring more to an API where we can plug different hypervisors, like CRI for the container runtimes

@kubevirt-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 16, 2024
@iholder101
Copy link
Contributor

/cc
I haven't looked at the proposal yet, but wanted to point out that this proposal reminds me of a similar design proposal from the past: #184.

This proposal is not focused specifically on cloud-hypervisor, which I think is a step forward.
I think that reading the discussion there will be interesting and valuable, especially this comment. To summarize:

So we'd need a similar interface to CRI, but for hypervisor runtimes for kubevirt.

I think that it will definitely not be easy to design and implement something like that, especially since currently Kubevirt heavily relies on libvirt-specific features/APIs. Saying that, if you're willing to invest the effort to achieve it, I think it would be valuable and I will be happy to promote it.

@kubevirt-bot kubevirt-bot requested a review from iholder101 May 22, 2024 11:38
@stu-gott
Copy link
Member

/remove-lifecycle stale

@kubevirt-bot kubevirt-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2024
@kubevirt-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 20, 2024
@iholder101
Copy link
Contributor

@harshitgupta1337 are you still interested in this?

@harshitgupta1337
Copy link
Author

@iholder101 Yes I am working on addressing the suggestions mentioned earlier. I should be able to come up with an updated design by the end of this month.

@iholder101
Copy link
Contributor

@iholder101 Yes I am working on addressing the suggestions mentioned earlier. I should be able to come up with an updated design by the end of this month.

Great to hear that!
Good luck, let me know if I can help in any way.

/remove-lifecycle stale

@kubevirt-bot kubevirt-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 21, 2024
@harshitgupta1337
Copy link
Author

@harshitgupta1337 I think what we need here it is an infrastructure for hypervisor plugins. Putting everything into kubevirt code would make the code base very large and hard to maintain. Do you think it would be possible to change the goal of this proposal and try to develop a plugin instead. Cloud Hypervisor could be the first hypervisor plugin.

@alicefr You're referring to this feature in Golang, isn' it ? https://pkg.go.dev/plugin Then we would need 2 plugins, right? One for qemu for backwards compatibility and a new one for cloud-hypervisor?

No, I'm referring more to an API where we can plug different hypervisors, like CRI for the container runtimes

Hi @alicefr I am curious why a CRI-like interface is necessary in this scenario, given that libvirt provides a uniform abstraction for creating VMs?

@xpivarc
Copy link
Member

xpivarc commented Sep 2, 2024

@harshitgupta1337 In the code (virt-handler's one to be specific) you can find out that we directly configure qemu as well. So as such Libvirt does support multiple "runtimes" but since the libvirt is not privilege in Kubevirt, it can't do everything.

@harshitgupta1337
Copy link
Author

Open question: How to manage the features that are supported on one Virt Stack and not on another?

@kubevirt-bot
Copy link

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@harshitgupta1337
Copy link
Author

harshitgupta1337 commented Sep 26, 2024

Hey @everyone, I have updated the design proposal with a couple of interfaces that I have added in my PoC that expose necessary functions for supporting both qemu-kvm and cloud-hypervisor.
@vladikr

@aburdenthehand
Copy link
Member

@xpivarc @fabiand @alicefr @iholder101
Please take a look at the updated proposal when you can

@iholder101
Copy link
Contributor

@xpivarc @fabiand @alicefr @iholder101 Please take a look at the updated proposal when you can

Sorry to chime in late on this one.

@harshitgupta1337 I appreciate your initiative and fundamentally support it.

However, being honest here, a huge amount of work is probably required in order for this to actually get in. The biggest challenges here, as I see them, are:

  • Currently most of our code assumes we use libvirt and QEMU. This assumption is spread around the entire codebase. The code would need to be heavily refactored in order to eliminate this assumption.
  • We have to keep Kubevirt maintainable. This means that this effort should not significantly hurt the development speed, ease or reliability. For example, some of the current and future features might not be supported in all hypervisors, which should not mean these features shouldn't be implemented.
  • Testability: we need to ensure we're able to test Kubevirt with multiple hypervisors. This is both a matter of writing these tests, but also paying for CI resources to test them, which someone should pay.

And this is just a very partial list :)
What I'm trying to say is: this is a drastically huge change which would demand an outstanding amount of work. I encourage you and others to pursue this but it's important to me to honestly reflect the challenges. I also tend to think that this work would be too much for one person, and would probably demand multiple people gather in a working group.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: no Indicates the PR's author has not DCO signed all their commits. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants