
design-proposals: Cloud Hypervisor integration #184

Closed
sboeuf wants to merge 1 commit

Conversation

sboeuf

@sboeuf sboeuf commented Jul 29, 2022

Submitting a design proposal to the KubeVirt community describing how we
could integrate the Cloud Hypervisor VMM with KubeVirt.

@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Jul 29, 2022
@kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign rmohr after the PR has been reviewed.
You can assign the PR to them by writing /assign @rmohr in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 29, 2022
@kubevirt-bot

Hi @sboeuf. Thanks for your PR.

I'm waiting for a kubevirt member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@maiqueb
Contributor

maiqueb commented Jul 29, 2022

/cc

@kubevirt-bot kubevirt-bot requested a review from maiqueb July 29, 2022 13:44
@iholder101
Contributor

/cc

@kubevirt-bot kubevirt-bot requested a review from iholder101 July 31, 2022 12:12
@alicefr
Member

alicefr commented Aug 1, 2022

/cc

@kubevirt-bot kubevirt-bot requested a review from alicefr August 1, 2022 07:33
@alicefr
Member

alicefr commented Aug 1, 2022

/ok-to-test

@kubevirt-bot kubevirt-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 1, 2022
Submitting a design proposal to the KubeVirt community describing how we
could integrate the Cloud Hypervisor VMM with KubeVirt.

Signed-off-by: Sebastien Boeuf <[email protected]>
@sboeuf sboeuf force-pushed the ch_design_proposal branch from 68ead09 to 7818080 on August 5, 2022 at 14:33
@sboeuf
Author

sboeuf commented Aug 5, 2022

Contributor

@jean-edouard jean-edouard left a comment


Thank you, great design proposal!

Maybe it has been discussed already, but why can't we have libvirt drive cloud-hypervisor? Last time I checked they had basic/experimental support for it, but we could work with them to implement whatever is missing. That would dramatically reduce the scope of this integration project, and therefore the amount of code we have to maintain long term in KubeVirt.

Also, the use of the word "hypervisor" here doesn't make much sense to me, but according to Wikipedia, "the contemporary usage" of the word is for VMMs... Oh well!

@sboeuf
Author

sboeuf commented Aug 8, 2022

Thank you, great design proposal!

Thanks!

Maybe it has been discussed already, but why can't we have libvirt drive cloud-hypervisor? Last time I checked they had basic/experimental support for it, but we could work with them to implement whatever is missing. That would dramatically reduce the scope of this integration project, and therefore the amount of code we have to maintain long term in KubeVirt.

The problem with libvirt is that it introduces yet another layer that's not needed in this situation, since KubeVirt is already the management layer. With libvirt in the picture, we end up with the KubeVirt layer managing the libvirt layer, which in turn manages the VMM. It's complex and it requires more resources per Pod. One last thing about libvirt: it's very much QEMU-specific, and I don't think it's really meant to manage VMMs other than QEMU (even if that's what it claims).

Also, the use of the word "hypervisor" here doesn't make much sense to me, but according to Wikipedia, "the contemporary usage" of the word is for VMMs... Oh well!

Yes I agree the naming of the Cloud Hypervisor project is not ideal given it is a VMM :-/

Member

@davidvossel davidvossel left a comment


Your work here is impressive. Here are a few thoughts on how I think we could make this practical.

One issue we have with maintaining KubeVirt is how vast it has become. This is similar in concept to the limitations the Kubernetes Core itself hit, which motivated the community to create the CRD abstraction and other well-defined "plugin"-like abstractions.

I think we're bumping into those same limitations here when it comes to expanding the hypervisor type into the core project.

Here's what I'd like to see investigated: what interfaces would KubeVirt need to provide in order for you to successfully integrate your custom virt-launcher and hypervisor outside of the KubeVirt Core?

For example, is this as simple as allowing the use of a custom virt-launcher binary and custom validation webhooks to ensure only supported features are used with the custom launcher?

@sboeuf
Author

sboeuf commented Aug 16, 2022

Your work here is impressive. Here are a few thoughts on how I think we could make this practical.

One issue we have with maintaining KubeVirt is how vast it has become. This is similar in concept to the limitations the Kubernetes Core itself hit, which motivated the community to create the CRD abstraction and other well-defined "plugin"-like abstractions.

I think we're bumping into those same limitations here when it comes to expanding the hypervisor type into the core project.

Here's what I'd like to see investigated: what interfaces would KubeVirt need to provide in order for you to successfully integrate your custom virt-launcher and hypervisor outside of the KubeVirt Core?

For example, is this as simple as allowing the use of a custom virt-launcher binary and custom validation webhooks to ensure only supported features are used with the custom launcher?

Yes it is as simple as that. The virt-launcher is the right abstraction layer as it simply allows one VMM or another to be used. That's basically what the PoC shows, as I've replaced the current virt-launcher implementation with the Cloud Hypervisor's one. There's no change in KubeVirt Core design, and this is showing that KubeVirt is fairly well defined to support more than one VMM.
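To make "one VMM or another" a bit more concrete, here is a minimal sketch of the kind of boundary the PoC swaps out. The interface name and method set are illustrative only, not KubeVirt's actual virt-launcher API, and the import path assumes KubeVirt's public API module:

```go
// Hypothetical launcher-side abstraction: anything that can drive a VMM for a
// given VirtualMachineInstance. A libvirt/QEMU driver and a Cloud Hypervisor
// driver would both satisfy something of this shape, while virt-handler keeps
// talking to the launcher the same way it does today.
package launcher

import v1 "kubevirt.io/api/core/v1"

type VMMDriver interface {
	// SyncVMI creates or updates the guest so it matches the VMI spec.
	SyncVMI(vmi *v1.VirtualMachineInstance) error
	// KillVMI tears the guest down.
	KillVMI(vmi *v1.VirtualMachineInstance) error
	// DomainStatus reports the state virt-handler expects to observe.
	DomainStatus(vmi *v1.VirtualMachineInstance) (string, error)
}
```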

@davidvossel
Member

Yes it is as simple as that. The virt-launcher is the right abstraction layer as it simply allows one VMM or another to be used. That's basically what the PoC shows, as I've replaced the current virt-launcher implementation with the Cloud Hypervisor's one. There's no change in KubeVirt Core design, and this is showing that KubeVirt is fairly well defined to support more than one VMM.

If this is the case, is there anything that prevents the cloud hypervisor support from being developed outside of the kubevirt/kubevirt repo?

For example, we could create a new repo in the kubevirt org (maybe something like kubevirt/cloud-hypervisor-addon) which transforms an existing kubevirt install to use cloud hypervisor instead of libvirt/qemu-kvm.

@rmohr
Member

rmohr commented Aug 16, 2022

Yes it is as simple as that. The virt-launcher is the right abstraction layer as it simply allows one VMM or another to be used. That's basically what the PoC shows, as I've replaced the current virt-launcher implementation with the Cloud Hypervisor's one. There's no change in KubeVirt Core design, and this is showing that KubeVirt is fairly well defined to support more than one VMM.

If this is the case, is there anything that prevents the cloud hypervisor support from being developed outside of the kubevirt/kubevirt repo?

For example, we could create a new repo in the kubevirt org (maybe something like kubevirt/cloud-hypervisor-addon) which transforms an existing kubevirt install to use cloud hypervisor instead of libvirt/qemu-kvm.

I would love to see a second implementation in kubevirt/kubevirt at least initially, just to have a validation implementation which ensures that we don't break the interface too easily. I don't think that we need to test it exhaustively (just the latest k8s should be good enough). Cloud Hypervisor, due to its single-static-binary delivery, may be a pretty nice and easy validation platform next to qemu.

In general I would agree that we don't want 10 implementations in the core repo.

@sboeuf
Author

sboeuf commented Aug 16, 2022

Yes it is as simple as that. The virt-launcher is the right abstraction layer as it simply allows one VMM or another to be used. That's basically what the PoC shows, as I've replaced the current virt-launcher implementation with the Cloud Hypervisor's one. There's no change in KubeVirt Core design, and this is showing that KubeVirt is fairly well defined to support more than one VMM.

If this is the case, is there anything that prevents the cloud hypervisor support from being developed outside of the kubevirt/kubevirt repo?

For example, we could create a new repo in the kubevirt org (maybe something like kubevirt/cloud-hypervisor-addon) which transforms an existing kubevirt install to use cloud hypervisor instead of libvirt/qemu-kvm.

Well, if we were going down this road, then the libvirt implementation should also be moved to an addon model.
Also, having the Cloud Hypervisor integration in a different repo would imply more complex CI. I mean, think about a breaking change in the communication between virt-launcher and virt-handler, or the modification of a structure used by virt-launcher in general: I would expect the contributor making this change to update both the libvirt and Cloud Hypervisor implementations to avoid any regression. This would be much harder to check with the code living in a separate repo.

@vladikr
Member

vladikr commented Aug 16, 2022

Yes it is as simple as that. The virt-launcher is the right abstraction layer as it simply allows one VMM or another to be used. That's basically what the PoC shows, as I've replaced the current virt-launcher implementation with the Cloud Hypervisor's one. There's no change in KubeVirt Core design, and this is showing that KubeVirt is fairly well defined to support more than one VMM.

If this is the case, is there anything that prevents the cloud hypervisor support from being developed outside of the kubevirt/kubevirt repo?
For example, we could create a new repo in the kubevirt org (maybe something like kubevirt/cloud-hypervisor-addon) which transforms an existing kubevirt install to use cloud hypervisor instead of libvirt/qemu-kvm.

I would love to see a second implementation in kubevirt/kubevirt at least initially, just to have a validation implementation which ensures that we don't break the interface too easily. I don't think that we need to test it exhaustively (just the latest k8s should be good enough). Cloud Hypervisor, due to its single-static-binary delivery, may be a pretty nice and easy validation platform next to qemu.

In general I would agree that we don't want 10 implementations in the core repo.

In general, I'm not against it, but I don't see how we can get away with only making changes to the virt-launcher and not the virt-handler.

What I am missing in this proposal is a mechanism for exposing and negotiating each virt-launcher's capabilities.
Certain features in KubeVirt's API are dependent on promises that we get from libvirt, like a stable PCI layout / ABI stability, and/or on the knowledge of how certain features work in qemu.
We would need to reject unsupported VMI specs during creation and behave differently during upgrades (i.e. live migration).

There are also implications on the virt-handler that are not covered in the proposal.

  • VFIO requires guest memory locking
  • calculating guest memory overhead differently.
  • Prometheus is still querying libvirt
  • virt-handler still relies on domain.metadata in some cases.

There are probably many other cases that I'm missing.

@vladikr
Member

vladikr commented Aug 16, 2022

I mean, think about a breaking change in the communication between virt-launcher and virt-handler, or the modification of a structure used by virt-launcher in general: I would expect the contributor making this change to update both the libvirt and Cloud Hypervisor implementations to avoid any regression. This would be much harder to check with the code living in a separate repo.

This is my concern too. However, we cannot and should not expect contributors to have the knowledge to fix issues in both the libvirt/qemu and cloud-hypervisor based virt-launchers.
Imagine that a core feature has to be reworked due to a change in one of KubeVirt's dependencies. This change also somehow breaks the Cloud Hypervisor implementation, but the contributor doesn't know how to fix the issue.
The whole project would get into a deadlock.

I do believe that we need to provide a safe environment for the Cloud Hypervisor integration to develop and mature. Perhaps we need an incubation repo, as other projects have.

@sboeuf
Author

sboeuf commented Aug 17, 2022

Yes it is as simple as that. The virt-launcher is the right abstraction layer as it simply allows one VMM or another to be used. That's basically what the PoC shows, as I've replaced the current virt-launcher implementation with the Cloud Hypervisor's one. There's no change in KubeVirt Core design, and this is showing that KubeVirt is fairly well defined to support more than one VMM.

If this is the case, is there anything that prevents the cloud hypervisor support from being developed outside of the kubevirt/kubevirt repo?
For example, we could create a new repo in the kubevirt org (maybe something like kubevirt/cloud-hypervisor-addon) which transforms an existing kubevirt install to use cloud hypervisor instead of libvirt/qemu-kvm.

I would love to see a second implementation in kubevirt/kubevirt at least initially, just to have a validation implementation which ensures that we don't break the interface too easily. I don't think that we need to test it exhaustively (just the latest k8s should be good enough). Cloud Hypervisor, due to its single-static-binary delivery, may be a pretty nice and easy validation platform next to qemu.
In general I would agree that we don't want 10 implementations in the core repo.

In general, I'm not against it, but I don't see how we can get away with only making changes to the virt-launcher and not the virt-handler.

virt-handler would only have to be modified for the mechanism that spawns virt-launcher or ch-launcher, depending on the hypervisor type found in the VMI spec.
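As a purely illustrative sketch of that dispatch (the hypervisor type values and binary paths below are hypothetical, not part of the current API):

```go
// Hypothetical dispatch: pick which launcher binary to spawn based on a
// hypervisor type carried by the VMI spec. Values and paths are made up.
package handler

import "fmt"

type HypervisorType string

const (
	HypervisorQEMU            HypervisorType = "qemu"
	HypervisorCloudHypervisor HypervisorType = "cloud-hypervisor"
)

// launcherBinary returns the launcher to spawn for the given hypervisor type.
func launcherBinary(h HypervisorType) (string, error) {
	switch h {
	case HypervisorCloudHypervisor:
		return "/usr/bin/ch-launcher", nil
	case HypervisorQEMU, "":
		// Empty means "not set": keep today's libvirt/QEMU launcher as the default.
		return "/usr/bin/virt-launcher", nil
	default:
		return "", fmt.Errorf("unsupported hypervisor type %q", h)
	}
}
```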

What I am missing in this proposal is a mechanism for exposing and negotiating each virt-launcher's capabilities. Certain features in KubeVirt's API are dependent on promises that we get from libvirt, like a stable PCI layout / ABI stability, and/or on the knowledge of how certain features work in qemu. We would need to reject unsupported VMI specs during creation and behave differently during upgrades (i.e. live migration).

The mechanism of exposure is directly through the VMI spec, and the negotiation would be performed by rejecting anything that is not supported by the Cloud Hypervisor implementation. The creation of the Pod would fail on such rejection.
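For illustration, that rejection could live in a validating webhook that checks the VMI spec against what the Cloud Hypervisor launcher supports. This is only a hedged sketch with made-up rejection rules, not actual admitter code:

```go
// Hypothetical admission check for the Cloud Hypervisor case: anything the
// ch-launcher does not implement is rejected before the Pod is created.
// The two rules below are placeholders, not a real capability list.
package admitters

import (
	"fmt"

	v1 "kubevirt.io/api/core/v1"
)

func validateForCloudHypervisor(vmi *v1.VirtualMachineInstance) []error {
	var errs []error
	if len(vmi.Spec.Domain.Devices.GPUs) > 0 {
		errs = append(errs, fmt.Errorf("GPU passthrough is not supported by the cloud-hypervisor launcher"))
	}
	if vmi.Spec.Domain.Devices.Watchdog != nil {
		errs = append(errs, fmt.Errorf("watchdog devices are not supported by the cloud-hypervisor launcher"))
	}
	return errs
}
```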

There are also implications on the virt-handler that are not covered in the proposal.

  • VFIO requires guest memory locking

Can you elaborate on that? I mean if virt-handler is in charge of preparing the VFIO device, this wouldn't change at all. The virt-launcher implementation would adapt to what is provided and has been prepared.

  • calculating guest memory overhead differently.

Why does it need to be different? And again, it is up to the virt-launcher implementation to adapt. Unless I'm missing the point.

  • Prometheus is still querying libvirt

Do you mean it directly talks to the libvirt daemon? If that's the case, this is clearly an area for improvement.

  • virt-handler still relies on domain.metadata in some cases.

Sure and that's fine, as I'm not planning on replacing what's already there.

There are probably many other cases that I'm missing.

@sboeuf
Author

sboeuf commented Aug 17, 2022

I mean, think about a breaking change in the communication between virt-launcher and virt-handler, or the modification of a structure used by virt-launcher in general: I would expect the contributor making this change to update both the libvirt and Cloud Hypervisor implementations to avoid any regression. This would be much harder to check with the code living in a separate repo.

This is my concern too. However, we cannot and should not expect contributors to have the knowledge to fix issues in both the libvirt/qemu and cloud-hypervisor based virt-launchers. Imagine that a core feature has to be reworked due to a change in one of KubeVirt's dependencies. This change also somehow breaks the Cloud Hypervisor implementation, but the contributor doesn't know how to fix the issue. The whole project would get into a deadlock.

Interesting, I actually disagree with that. And that's actually why I think it's important to keep things in the core project, so that the CI can catch a regression that would be caused by a change/refactoring in the code shared by both implementations.

I do believe that we need to provide a safe environment for the Cloud Hypervisor integration to develop and mature. Perhaps we need an incubation repo, as other projects have.

Sure, I'm not opposed to that, as long as the end goal is to promote the code into the core project just like libvirt is.

@davidvossel
Member

Sure, I'm not opposed to that, as long as the end goal is to promote the code into the core project just like libvirt is.

I'm hesitant to say with any certainty this is the end goal.

And that's actually why I think it's important to keep things in the core project, so that the CI can catch a regression that would be caused by a change/refactoring in the code shared by both implementations.

This is actually my concern. Given the current critical mass of expertise within the community, I don't know if we can practically support cloud hypervisor in the core repo. Simply maintaining libvirt/qemu as that stack continues to evolve has been challenging.

I agree that having cloud hypervisor in the core repo would be the best thing for cloud hypervisor support, but given our current CI battles and community expertise, I don't think we can handle this without negatively impacting the community as a whole.

This is why I was trying to redirect the conversation into how we can maintain cloud hypervisor support in a separate repo.

What I think is more achievable (at least right now) is to begin versioning all communication between handler/launcher in order to create a consistent interface between the two components. The gRPC channels are already versioned, but there are other communication channels which are not, such as the method used between handler and launcher to mount hotplug/containerdisks, how handler helps set up network interfaces, and the locations of unix sockets, just to name a few. If we want pluggable launchers, we'd need to map all the interactions between handler/launcher and ensure backwards compatibility.

So, my thought process here is that if we tighten up our handler/launcher interactions, then perhaps it would be practical to support cloud hypervisor in another repo.
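To illustrate the idea (the struct and version constants here are hypothetical; only the gRPC channel carries version information today):

```go
// Hypothetical compatibility gate: virt-handler refuses to manage a launcher
// whose declared handler<->launcher interface version it does not support.
package handshake

import "fmt"

type LauncherInfo struct {
	InterfaceVersion uint32 // version of the handler<->launcher contract
	SocketPath       string // where the launcher's command socket lives
}

const (
	minSupportedVersion uint32 = 1
	maxSupportedVersion uint32 = 2
)

func checkCompatibility(info LauncherInfo) error {
	if info.InterfaceVersion < minSupportedVersion || info.InterfaceVersion > maxSupportedVersion {
		return fmt.Errorf("launcher speaks interface v%d, handler supports v%d..v%d",
			info.InterfaceVersion, minSupportedVersion, maxSupportedVersion)
	}
	return nil
}
```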

@sboeuf
Author

sboeuf commented Aug 25, 2022

Sure, I'm not opposed to that, as long as the end goal is to promote the code into the core project just like libvirt is.

I'm hesitant to say with any certainty this is the end goal.

And that's actually why I think it's important to keep things in the core project, so that the CI can catch a regression that would be caused by a change/refactoring in the code shared by both implementations.

This is actually my concern. Given the current critical mass of expertise within the community, I don't know if we can practically support cloud hypervisor in the core repo. Simply maintaining libvirt/qemu as that stack continues to evolve has been challenging.

The problem you're referring to is not related to Cloud Hypervisor. Of course maintainers can't have all the knowledge, and this is why they must be assigned specific domains of expertise. The whole review of a complex PR can be shared across multiple people; that is totally acceptable IMO.

I agree that having cloud hypervisor in the core repo would be the best thing for cloud hypervisor support, but given our current CI battles and community expertise, I don't think we can handle this without negatively impacting the community as a whole.

Just to clarify, I would take full responsibility for reviewing PRs affecting Cloud Hypervisor and would provide the expertise related to this field. I'm very worried that we're making a choice here based on logistical issues.

This is why I was trying to redirect the conversation into how we can maintain cloud hypervisor support in a separate repo.

Sure, but it feels like we should include libvirt/QEMU in the discussion then. It would be a bit unfair to have libvirt be part of the KubeVirt core, but not Cloud Hypervisor.

What I think is more achievable (at least right now) is to begin versioning all communication between handler/launcher in order to create a consistent interface between the two components. The gRPC channels are already versioned, but there are other communication channels which are not, such as the method used between handler and launcher to mount hotplug/containerdisks, how handler helps set up network interfaces, and the locations of unix sockets, just to name a few. If we want pluggable launchers, we'd need to map all the interactions between handler/launcher and ensure backwards compatibility.

So, my thought process here is that if we tighten up our handler/launcher interactions, then perhaps it would be practical to support cloud hypervisor in another repo.

Wow, this is a whole other level of work you're proposing here. I'm not sure I would have the time to dive into it, and I don't think I have enough knowledge of KubeVirt to take care of such a task.
It feels like we're diverging a lot from the initial proposal, as you're gating the approval of the Cloud Hypervisor integration on KubeVirt's ability to ensure backwards compatibility between virt-handler and virt-launcher.

A lot of what I'm hearing here is about existing issues in the KubeVirt community/CI, and it feels like there's a lot to solve before even considering integrating another VMM like Cloud Hypervisor. TBH, it's overwhelming given the integration of Cloud Hypervisor would already require a considerable amount of work...

@davidvossel
Member

TBH, it's overwhelming given the integration of Cloud Hypervisor would already require a considerable amount of work...

I understand. My goal certainly isn't to discourage you. I want to state accurately that the bar is high for what it would take to transform KubeVirt into a project that interacts with anything other than qemu/kvm. Your PoC is impressive and has proven what is technically possible. The next step is to transform that into something that's supportable within the constraints of our community.

I believe a path forward would be to begin discussing what the equivalent of the k8s CRI for container runtimes would look like for KubeVirt... So we'd need an interface similar to the CRI, but for hypervisor runtimes. Also, at the controller level we'd need some sort of plugin for constructing the VMI's pod that understands hypervisor nuances (like the memory overhead calculations for pod requests/limits).
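Purely to sketch the shape of such an interface (every name here is hypothetical and the method set is far from complete):

```go
// Hypothetical "hypervisor runtime interface" in the spirit of the CRI: the
// core project would program against these methods, and a libvirt/qemu runtime
// or a Cloud Hypervisor runtime would implement them out of tree.
package vri

import (
	"k8s.io/apimachinery/pkg/api/resource"
	v1 "kubevirt.io/api/core/v1"
)

type HypervisorRuntime interface {
	// Guest lifecycle, driven from virt-handler.
	CreateVM(vmi *v1.VirtualMachineInstance) error
	StartVM(vmi *v1.VirtualMachineInstance) error
	StopVM(vmi *v1.VirtualMachineInstance) error

	// Controller-level hook: each runtime knows its own fixed overhead,
	// which feeds into the launcher pod's requests/limits.
	MemoryOverhead(vmi *v1.VirtualMachineInstance) resource.Quantity

	// Capability negotiation so unsupported VMI specs can be rejected early.
	SupportsFeature(feature string) bool
}
```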

This is only my stance though. I'd like to hear more from @rmohr @vladikr @fabiand and others.

@wllenyj

wllenyj commented Oct 24, 2022

Is there any progress on this proposal? I'm very interested in KubeVirt supporting different hypervisors. This would give KubeVirt better performance and lower overhead in some scenarios.

@kubevirt-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2023
@kubevirt-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubevirt-bot kubevirt-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 21, 2023
@kubevirt-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@kubevirt-bot

@kubevirt-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wavezhang

/reopen

@kubevirt-bot

@wavezhang: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@iholder101
Contributor

/reopen

Hey @wavezhang!

Are you interested in continuing this effort?

Please make sure to read the comment above about the need to create an interface, similar to the k8s CRI, but between KubeVirt and hypervisor runtimes.

If you want to push this subject forward, I'll be happy to help pave the way according to the discussion above.
