design-proposals: Cloud Hypervisor integration #184

# Overview

[Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor) is a
Virtual Machine Monitor meant for running modern Cloud workloads. It is written
in Rust and free of legacy devices to provide a smaller attack surface to the
guest, which makes it a more secure alternative when it comes to running virtual
machines.

This document describes a design proposal for integrating Cloud Hypervisor with
KubeVirt, giving KubeVirt users the option to rely on Cloud Hypervisor to create
virtual machines as an alternative to the default libvirt/QEMU.

## Motivation

Since Cloud Hypervisor aims at running virtual machines more securely, it is
important to offer KubeVirt's users this choice.

Cloud Hypervisor has been designed for Cloud workloads, which makes it a natural
fit for the Cloud Native ecosystem; that is why it is already integrated as part
of the Kata Containers project.

To extend its overall Cloud Native support, it seems logical to integrate it
with KubeVirt.

Another reason for going through this effort is to verify whether the abstraction
layers are correctly defined to support another VMM. This will help improve the
existing code by defining cleaner interfaces where needed.

## Goals

Provide users a way to choose Cloud Hypervisor over libvirt/QEMU to run their
virtual machines.

## Non Goals

Support all features available through KubeVirt.

Since Cloud Hypervisor has a much narrower scope than libvirt/QEMU, it doesn't
support as many features. Therefore, we can only expect a subset of KubeVirt's
features to be supported by Cloud Hypervisor.

## Definition of Users

This feature is directed at KubeVirt's users who want to run virtual machines
more securely by choosing Cloud Hypervisor over libvirt/QEMU.

## User Stories

A user recently tried Cloud Hypervisor and wants to use it for running virtual
machines on their Kubernetes/KubeVirt cluster.

## Repos

- [KubeVirt](https://github.com/kubevirt/kubevirt)

# Design

Looking at KubeVirt's architecture, each `virt-launcher` instance runs in its own
pod and manages a single virtual machine. This is the abstraction layer we need
to replace: instead of using `libvirt` to spawn QEMU virtual machines, it will
run and communicate with Cloud Hypervisor directly.

A new launcher `ch-launcher` will be created so that it fully replaces the
existing `virt-launcher` component when needed.
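
As an illustration, here is a minimal sketch (in Go) of how `ch-launcher` could
drive Cloud Hypervisor through the HTTP API it exposes on a UNIX socket. The
package layout, function names and error handling are assumptions of this
sketch, and the endpoint paths should be checked against the Cloud Hypervisor
release being shipped:

```go
package chclient

import (
	"bytes"
	"context"
	"fmt"
	"net"
	"net/http"
)

// newCHClient returns an HTTP client that talks to the Cloud Hypervisor API
// server listening on the given UNIX socket (started with --api-socket).
func newCHClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// bootVM creates and then boots a VM from a JSON VmConfig payload, which
// would be generated from the VirtualMachineInstanceSpec by ch-launcher.
func bootVM(client *http.Client, vmConfigJSON []byte) error {
	steps := []struct {
		path string
		body []byte
	}{
		{"/api/v1/vm.create", vmConfigJSON},
		{"/api/v1/vm.boot", nil},
	}
	for _, step := range steps {
		// The host part of the URL is ignored; the transport dials the UNIX socket.
		req, err := http.NewRequest(http.MethodPut, "http://localhost"+step.path, bytes.NewReader(step.body))
		if err != nil {
			return err
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode >= 300 {
			return fmt.Errorf("%s returned %s", step.path, resp.Status)
		}
	}
	return nil
}
```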

## API Examples

Example of how a user could request Cloud Hypervisor as the underlying
hypervisor through the VMI spec:

```yaml
spec:
  hypervisor: cloud-hypervisor
```

- Introduce a new field `Hypervisor` in `VirtualMachineInstanceSpec`.
- If no `Hypervisor` is provided, it defaults to `libvirt`.
- The two acceptable values are `cloud-hypervisor` and `libvirt`.
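
As a sketch only, the addition to the Go API types could look like the
following; the `HypervisorType` type, the constants and the field tags are
illustrative, only the field name and the two string values come from this
proposal:

```go
package v1

// HypervisorType enumerates the supported VMM backends.
type HypervisorType string

const (
	HypervisorLibvirt         HypervisorType = "libvirt"
	HypervisorCloudHypervisor HypervisorType = "cloud-hypervisor"
)

// VirtualMachineInstanceSpec shows only the proposed addition; all existing
// fields are omitted for brevity.
type VirtualMachineInstanceSpec struct {
	// Hypervisor selects the VMM used to run this VMI.
	// Defaults to libvirt when unset.
	// +optional
	Hypervisor HypervisorType `json:"hypervisor,omitempty"`
}
```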

The `virt-operator` can inform all other components about the hypervisor type
based on the information from the VMI spec. A different launcher image
containing `ch-launcher` would then be picked instead of the default
`virt-launcher` one.

## Features

### Supported features

Here is a list of features expected to be available with Cloud Hypervisor.

#### Lifecycle

- Create a VM
- Start a VM
- Pause/Resume a VM
- Snapshot/Restore a VM
- Stop a VM

#### Virtual Hardware

- OVMF support (EFI)
- CPU topology
- CPU model is exclusively the equivalent of QEMU's `host` model (no emulation of a specific CPU model)
- RNG using virtio-rng
- Only headless VMs as we don't have graphics or video device emulation
- CPU constraints + hotplug
- Memory constraints + hotplug
- Hugepages

#### NUMA

- Host NUMA to select specific host CPUs and make sure memory is allocated on the expected NUMA node
- Guest NUMA to expose any NUMA configuration to the guest

#### Disks and Volumes

- Disk support with virtio-block
- Filesystem support with virtio-fs
- Volume hotplug
- Offline snapshot
- Online snapshot (will come later with the guest agent support for freezing the filesystem)

#### Network

- Support based on virtio-net or vhost-user-net
- Support for tap and macvtap

#### Host Device Assignment

- VFIO supported for passing through PCI devices

#### Accessing Virtual Machines

- Serial port (0x3f8) and virtio-console are supported
- Create a PTY so that an external process can later connect to it
- Support for SSH as it directly depends on virtio-net support

#### Confidential Computing

- Support for SGX
- Experimental support for TDX

#### Architectures

- x86_64
- AArch64

#### Migration

This should be supported eventually, but it still requires some assessment of
how to achieve it. Therefore this feature might not be available for some time.

### Unsupported features

Here is the list of what will be missing compared to what libvirt/QEMU supports:

#### Virtual Hardware

- No CPU model emulation
- No way to pick a type of clock
- No way to pick a type of timer
- No support for emulated video and graphics devices
- No way to pick between different features like `acpi` or `apic`. We can select
  `hyperv` though, which enables KVM Hyper-V enlightenments
- No support for emulated input devices

#### Disks and Volumes

- No support for CD-ROM, floppy disk, or LUN devices
- No support for resizable disks

#### Network

- No support for emulated NICs such as e1000, e1000e, ... (which means no SLIRP)

#### Accessing Virtual Machines

- No support for VNC

### Guest Agent

Features that rely on running a dedicated agent in the guest have not been
tested yet. The existing QEMU guest agent must be evaluated to see whether it
could work and be reused directly with Cloud Hypervisor. If that's not the case,
an agent program would have to be developed for operations like `GuestPing`,
`ListInterfaces`, etc.
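
For illustration, the kind of interface such an agent (or an adapter around the
existing QEMU guest agent) would have to satisfy could look like the sketch
below; the interface and type names mirror the operations mentioned above and
are purely hypothetical:

```go
package agent

// GuestAgentCommander is a hypothetical interface for the guest agent
// operations ch-launcher would need; only a couple of them are shown.
type GuestAgentCommander interface {
	// GuestPing checks that the agent is alive inside the guest.
	GuestPing() error
	// ListInterfaces reports the guest's network interfaces.
	ListInterfaces() ([]GuestInterface, error)
}

// GuestInterface is a placeholder for the data returned by ListInterfaces.
type GuestInterface struct {
	Name string
	MAC  string
	IPs  []string
}
```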

## Update/Rollback Compatibility

This new feature should not impact updates moving forward since it doesn't
remove anything.

## Functional Testing Approach

Create an additional CI entry to run Cloud Hypervisor dedicated testing. The set
of tests run there would be a subset of what is already available.
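
For example, assuming the existing Ginkgo-based functional test suite, tests
could carry a label that the Cloud Hypervisor lane uses to select only the
supported subset; the label and test below are purely illustrative:

```go
package tests_test

import (
	. "github.com/onsi/ginkgo/v2"
)

// Illustrative only: a functional test carrying a label that a dedicated
// Cloud Hypervisor CI lane could use to filter the runnable subset.
var _ = Describe("[hypervisor:cloud-hypervisor] VMI lifecycle", func() {
	It("should start and stop a Fedora VMI", func() {
		// The test body would create a VMI with spec.hypervisor set to
		// "cloud-hypervisor" and wait for it to reach the Running phase.
	})
})
```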

## Proof of Concept

As a reference, a PoC can be found through the following
[pull request](https://github.com/kubevirt/kubevirt/pull/8056).

It modifies the existing `virt-launcher` component so that it manages Cloud
Hypervisor VMs instead of libvirt ones.

It adds support for the following features:
- containerDisk
Since there's no support for compressed QCOW2 in Cloud Hypervisor,
I've simply converted the image to a RAW version. That means we
don't get the COW benefit but it works fine.
- emptyDisk
I've added a way to create a RAW image instead of QCOW2 since Cloud
Hypervisor doesn't support compressed QCOW2 images.
- cloudInitNoCloud
Pretty straightforward, I reused most of the code provided by the
repository
- Console
  I had to run two extra goroutines to redirect input/output between
  the PTY device that is created by Cloud Hypervisor and the socket
  located at /var/run/kubevirt-private/<pod-UID>/virt-serial0 that is
  expected by virt-handler (see the sketch after this list)
- Network
  Added support for both bridge and masquerade modes. This is done
  through the existing code, with minimal changes, as I used the
  api.Domain reference that is being modified to retrieve both the TAP
  interface name and the expected MAC address
- Kernel boot + initramfs
  This is "supposedly" working, but when I used the vmi-kernel-boot
  example I ran into some issues because the kernel binary vmlinuz
  does not carry a PVH ELF header. I didn't spend time creating a
  dedicated docker image containing the right type of kernel binary,
  but I expect this to work as long as the user provides a proper
  image
- VM lifecycle
  - Sync VMI creates and boots the VM based on the configuration that
    has been generated from the VirtualMachineInstanceSpec. The support
    for updating the VM, and especially hotplugging devices, hasn't been
    implemented in this PoC.
  - Pausing and resuming the VM is supported through virtctl.
  - Stopping and deleting the VM is also supported through kubectl
    delete.
- Lifecycle events
  Listen to the events reported by Cloud Hypervisor through the
  event-monitor socket and transform them into domain events, setting
  the appropriate status and reason for each state change
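
As a rough sketch of the console redirection mentioned above, and assuming
ch-launcher acts as the listener on the socket side (paths and the function
name are placeholders), the forwarding could look like this:

```go
package console

import (
	"io"
	"net"
	"os"
)

// proxyConsole copies data in both directions between the PTY device created
// by Cloud Hypervisor and the UNIX socket that virt-handler connects to
// (e.g. /var/run/kubevirt-private/<pod-UID>/virt-serial0). Minimal sketch:
// error handling and cleanup of the copy goroutines are deliberately simplified.
func proxyConsole(ptyPath, socketPath string) error {
	pty, err := os.OpenFile(ptyPath, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer pty.Close()

	ln, err := net.Listen("unix", socketPath)
	if err != nil {
		return err
	}
	defer ln.Close()

	for {
		conn, err := ln.Accept()
		if err != nil {
			return err
		}
		done := make(chan struct{}, 2)
		// Two goroutines shuttle bytes between the PTY and the socket.
		go func() { io.Copy(conn, pty); done <- struct{}{} }()
		go func() { io.Copy(pty, conn); done <- struct{}{} }()
		<-done
		conn.Close()
	}
}
```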

It has been tested with the following VMI examples:
- examples/vmi-fedora
- examples/vmi-masquerade

Note that the Bazel workspace had to be updated so that the virt-launcher
container image would be generated with both the `CLOUDHV.fd` firmware and
the Cloud Hypervisor binary.

# Implementation Phases

## Create `ch-launcher` binary

Create a minimal `ch-launcher` binary based on the existing `virt-launcher` one,
just enough to launch Cloud Hypervisor and connect to it, with the domain
manager implementation provided as empty shells.
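
A minimal sketch of what such a stub could look like; the struct and method
names below are illustrative stand-ins for the actual domain manager interface
used by `virt-launcher`, not its exact signatures:

```go
package chmanager

import "fmt"

// chDomainManager is a stub that would eventually drive Cloud Hypervisor
// through its API socket.
type chDomainManager struct {
	apiSocket string // path to the cloud-hypervisor --api-socket
}

func NewCHDomainManager(apiSocket string) *chDomainManager {
	return &chDomainManager{apiSocket: apiSocket}
}

// SyncVMI would translate the VMI spec into a Cloud Hypervisor VmConfig and
// create/boot the VM; for the first milestone it is just an empty shell.
func (m *chDomainManager) SyncVMI() error {
	return fmt.Errorf("not implemented")
}

// KillVMI would shut the VM down and delete it; empty shell for now.
func (m *chDomainManager) KillVMI() error {
	return fmt.Errorf("not implemented")
}
```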

## Create a new image

The first thing to do is to update the Bazel workspace so that it can generate a
new `ch-launcher` image dedicated to Cloud Hypervisor. This image should contain
what is needed to start a Cloud Hypervisor virtual machine, that is, the
`cloud-hypervisor` binary pulled from the Cloud Hypervisor release and the
associated OVMF firmware called `CLOUDHV.fd`. It must also contain the
`ch-launcher` binary instead of the `virt-launcher` one.

## Update VMI specification

Add a new field `Hypervisor` to the `VirtualMachineInstanceSpec` structure to
carry information about which hypervisor should be used.

Update all the components that assume the `virt-launcher` image is always used
so that the image is chosen dynamically based on the `Hypervisor` value, along
the lines of the sketch below.
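
For illustration, the image selection could boil down to a helper along these
lines; the helper and the way the image references are passed in are
hypothetical:

```go
package render

// launcherImage is a hypothetical helper showing how components that currently
// hardcode the virt-launcher image could branch on the new Hypervisor field;
// both image references are provided by the caller (e.g. from operator config).
func launcherImage(hypervisor, virtLauncherImage, chLauncherImage string) string {
	if hypervisor == "cloud-hypervisor" {
		return chLauncherImage
	}
	// Default to the existing virt-launcher image for libvirt or unset values.
	return virtLauncherImage
}
```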

## Implement basic features

At this point we must extend the minimal `ch-launcher` implementation into a
functional one so that some testing can be performed.

## Add a new CI worker

Define a new entry in the CI to test KubeVirt with Cloud Hypervisor. The number
of tests that can be run will depend directly on the number of features
supported by this first version of `ch-launcher`.

## Enable new features one by one

At this point, it makes sense to submit one pull request per new feature that
we want to support as part of the Cloud Hypervisor integration effort.