RFD 121 bhyve brand: Discussion #76
Comments
For the KVM brand (at least when I last played with it), every device configured via zonecfg gets passed as a disk. If possible, the same approach should be avoided for bhyve, to keep the option of device pass-through open in the future. (E.g. it is impossible to pass through a tty to a KVM branded zone; this works fine on OmniOS.)
To keep the option of pass-through open in the future, this might be needed. E.g. PCI devices usually need to be attached at the same spot to work properly. |
Oh, and you might want to reach out to Allan Jude; if I am not mistaken, he revived a review for some code to have bhyve parse a simple config file. Maybe the zhyve bits can benefit from that, or the other way around. |
VirtFS/9p filesystem passthrough (https://reviews.freebsd.org/D10335) might enable support for zonecfg "fs". |
Sorry for posting something not directly related to the RFD.
Live resize might already be on the list anyway; it seems to be the feature most missed by plain SmartOS users. |
@jussisallinen I think you are looking for RFD 26. |
New predraft posted, with @yourname next to things that are in response to various people's comments.
|
@mgerdts you added a note for @wiedi's pass-through to the fs-allowed property. Shouldn't that go in the fs property instead? IIRC fs-allowed is for telling a zone what file system types it is allowed to mount. As for more clarification on the tty pass-through: I was talking specifically about cua* devices, e.g. for a serial UPS or GPS device. When running KVM outside a KVM branded zone you can hook up one of qemu/kvm's devices to those. That does not work in KVM branded zones because all device zone properties are mapped to a disk. Sorry if that was not clear before. |
@sjorge Thanks for noticing that I only updated fs-allowed. I've now updated fs as well. As for the tty devices, I realize I chose the wrong term. I've changed that to serial to better convey the intent. |
@mgerdts another thing, maybe make amd_hostbridge vs hostbridge (intel) selectable!
Also if/once we export the vnc console, an option to set the wait flag would be great to expose
|
Thanks for doing the initial write-up, Mike. (And thanks to everyone else for the feedback) Some comments from my first pass:
We're expecting the bhyve
Any reason to use
For in-zone bhyve, we'll need to wire up an explicit interface so that
I am working on plumbing for this that will resemble the interface used by
The viona bits are using MAC directly, so only the libraries should be required, I think?
Is there a reason why we can't call the
Is this the appropriate place to mention adding a privilege to the system which is required to instantiate |
Only DHCPv4? Or also SLAAC/DHCPv6? |
The first draft will only be DHCPv4 to support the "static" addressing model expected from existing KVM images to make it easier to test with them. I think we'll probably want a more explicit scheme for communicating static addresses (both v4 and v6) into the guest, rather than snooping for DHCP requests. |
On Fri, Jan 12, 2018 at 3:36 PM, Patrick Mooney wrote:
Thanks for doing the initial write-up, Mike. (And thanks to everyone else
for the feedback)
Thanks for reviewing.
Some comments from my first pass:
/usr/sbin/amd64/bhyve -m 4g -c 2 -l com1,/dev/zconsole -P -H -s 1,lpc
-s 3,virtio-blk,/dev/zvol/rdsk/zones/$zone/data/disk0
-s 4,virtio-net-viona,net0
-l bootrom,/BHYVE_UEFI.fd "$zone"
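To make the shape of the invocation above easier to see, here is a minimal sketch that assembles the same argument list from a few inputs. The function name, its parameters, and the rule of numbering disks from slot 3 with NICs following them are assumptions made for illustration; only the flags and paths that appear in the quoted command come from the RFD text.

```python
def build_bhyve_args(zone, mem="4g", vcpus=2, disks=(), nics=()):
    """Return an argv list shaped like the bhyve invocation quoted above.

    ASSUMPTION: disks are numbered from PCI slot 3 and NICs follow them,
    which matches the single example but is not a documented rule.
    """
    args = [
        "/usr/sbin/amd64/bhyve",
        "-m", mem,
        "-c", str(vcpus),
        "-l", "com1,/dev/zconsole",
        "-P", "-H",
        "-s", "1,lpc",
    ]
    slot = 3
    for disk in disks:
        args += ["-s", f"{slot},virtio-blk,{disk}"]
        slot += 1
    for nic in nics:
        args += ["-s", f"{slot},virtio-net-viona,{nic}"]
        slot += 1
    # The boot ROM and the VM name come last, as in the quoted example.
    args += ["-l", "bootrom,/BHYVE_UEFI.fd", zone]
    return args
```

Calling it with one disk path and `nics=["net0"]` yields the same slot assignments as the example (disk in slot 3, NIC in slot 4).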
We're expecting the bhyve vmm component to be sdev-aware, right? Perhaps we
can stick to a common name for each instance (VM?) since they'll all be
effectively namespaced into the zone. (So it'd be like
/dev/vmm/<zonename>/VM from the GZ, /dev/vmm/VM from the zone)
In the zone, /dev/vmm will only show the nodes that this zone has access to
and eventually there will be a /dev/vmm/zone directory per zone. I thought
about using the same name for all vm's in a bhyve zone, using the zone
namespace to uniquely identify them. I can't say I care much which way we
go. I've punted on this so far because I've not yet added the zone
namespace into /dev/vmm.
The /dev/viona/ctl node is opened and a CREATE ioctl is issued. This
creates a new minor that does not require a minor node. The return value
from the ioctl is a file descriptor associated with the new viona minor.
Any reason to use /dev/viona/ctl over plain /dev/viona? I don't believe
any piece of the viona interface is going to need (or frankly want) access
to already-opened instances for other processes.
Only because the sdev_plugin seems to not work in the "just create a device
without a directory" mode. Once sdev issues are sorted, we can drop the
directory.
When the bhyve command exits, the kernel state remains present until a
DESTROY ioctl is issued. To free these resources, vmctl --destroy must be
used.
For in-zone bhyve, we'll need to wire up an explicit interface so that
lingering vmm resources are destroyed on zone shutdown.
Yeah, I'll add a callback to the zone to do the destroy in-kernel. The
vmm instance will maintain a hold on the zone to be sure that we don't get
zombies.
The following are private implementation details that are architecturally
relevant.
Guest networking configuration
XXX @pfmooney <https://github.com/pfmooney> needs to review this
I am working on plumbing for this that will resemble the interface used by
qemu/KVM today, where the userspace hypervisor component can be configured
to respond to DHCP requests using addressing information provided at startup.
My plan is for those addressing parameters to be passed on the bhyve(1)
command line as parameters for the viona driver. It will handle all the
details of filtering and injecting the packets in as efficient a manner as
possible.
XXX It is not yet known if the following are needed:
vnd(8d)
The viona bits are using MAC directly, so only the libraries should be
required, I think?
We are striving to not modify bhyve code any more than required so that it
is easier to keep in sync with upstream. For this reason, a new source
file, zhyve.c is being added. This will contain an implementation of main()
and any other bhyve brand-specific code that is required. The main() that
is in bhyverun.c is renamed to bhyve_main() via -Dmain=bhyve_main in
CPPFLAGS while compiling bhyverun.c
In the global zone, /usr/sbin/amd64/bhyve and
/usr/lib/brand/bhyve/usr/sbin/zhyve will be hard links to the same file.
When invoked with a basename of bhyve, the command will behave exactly as
documented in bhyve(8). When invoked with a basename of zhyve, it will read
its arguments from /var/run/bhyve/zhyve.args
Is there a reason why we can't call the bhyve binary from zhyve rather
than the name-detection route? I suspect a separate handler process (zhyve
as init) would be desirable for doing guest restarts.
I was trying to make it so that the zone doesn't need proc_exec. The zone
platform has a restart_init option that we could set to true for guest
restarts where we want to reattach to the already allocated memory.
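The name-detection route discussed above can be sketched as follows. This is an illustration in Python, not the C code in zhyve.c. The function name is hypothetical, and the format of zhyve.args is not given in this thread, so whitespace-separated arguments with shell-style quoting is purely an assumption.

```python
import os
import shlex

# Path named in the RFD text quoted above.
ZHYVE_ARGS = "/var/run/bhyve/zhyve.args"


def effective_args(argv, args_file=ZHYVE_ARGS):
    """Pick the argument list based on the name the binary was invoked as."""
    name = os.path.basename(argv[0])
    if name == "zhyve":
        # ASSUMPTION: the file holds whitespace-separated arguments with
        # shell-style quoting; the real on-disk format may differ.
        with open(args_file) as f:
            return shlex.split(f.read())
    # Invoked as bhyve (or anything else): use the command line as given.
    return list(argv[1:])
```

Because both names are hard links to one file, no exec of a second binary is needed, which is the point made above about the zone not requiring proc_exec.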
The privileges will be stripped to the minimum required to run a guest. If
bhyve only needs a privilege during startup, the privilege will be dropped
prior to running code in the guest.
Is this the appropriate place to mention adding a privilege to the system
which is required to instantiate vmm instances?
I thought there may be a separate rfd covering bhyve as a standalone
component. If that exists, the introductory material at the top of this
becomes a reference to the other rfd. That rfd would presumably cover the
vmm and viona drivers. It should probably also cover their sdev plugins.
Just to be clear, bhyve can run in a zone without the bhyve brand. The
bhyve brand is really about making it so that you don't have to think about
how bhyve works and how it interacts with illumos features like privileges
and resource controls. The bhyve brand should provide a very simple way to
get the best practice for performance and security with minimal effort.
|
I'm not sure the same sdev plugin integration is needed for viona. Something similar to what was done for the KVM driver may be adequate.
If we end up wanting/needing the |
An option to specify a bootrom in the zonecfg might also be nice, e.g. https://www.freshports.org/sysutils/uefi-edk2-bhyve-csm/. Maybe also an option to pass user-specified flags to the bhyve process. |
Something else that popped in my head while I was doing some stuff with my kvm zone earlier today...
Anyway, can we maybe get bhyve brand to just work with zlogin -C ? |
The plan is to have The updates to the RFD will come soon. |
To expand on this further, I suspect that we will want to do with bhyve what we started doing with KVM images (only Ubuntu so far I believe): fetch the information using |
@sjorge - The ability to select the bootrom is being added as well. Is there a need to select arbitrary bootroms or is the selection of |
Selecting between UEFI and UEFI_CSM will be sufficient, I think. I don't think I have seen any other bootroms available. |
Absolutely. The DHCP addressing feature is only meant as a stopgap to allow KVM images to be used until better bhyve-specific ones (which do static addressing on their own) are created. Fortunately, the design for intercepting those DHCP/ARP requests in viona should result in lower overhead than what's used for KVM. |
(Slightly OT) The idea is to grab the RARP/DHCP bits when they pass through viona? Instead of sending them to the network stack, bhyve (or some other daemon) handles them and sends a reply? |
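As a rough illustration of the interception idea in the two comments above, the sketch below classifies a raw Ethernet frame as one a handler might want to answer itself (ARP, RARP, or a DHCPv4 request to UDP port 67) rather than forward. The function name is hypothetical, and whether viona would capture all ARP traffic, or handle VLAN-tagged or fragmented packets, is not stated in this thread; the sketch ignores VLAN tags and fragments.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_RARP = 0x8035
IPPROTO_UDP = 17
DHCP_SERVER_PORT = 67  # destination port of client-to-server DHCPv4 messages

ETH_HDR_LEN = 14


def should_intercept(frame: bytes) -> bool:
    """Return True for ARP/RARP frames and DHCPv4 client requests."""
    if len(frame) < ETH_HDR_LEN:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype in (ETHERTYPE_ARP, ETHERTYPE_RARP):
        return True
    if ethertype != ETHERTYPE_IPV4 or len(frame) < ETH_HDR_LEN + 20:
        return False
    # IPv4 header length is the low nibble of the first byte, in 32-bit words.
    ihl = (frame[ETH_HDR_LEN] & 0x0F) * 4
    if ihl < 20 or frame[ETH_HDR_LEN + 9] != IPPROTO_UDP:
        return False
    udp = ETH_HDR_LEN + ihl
    if len(frame) < udp + 8:
        return False
    (dst_port,) = struct.unpack("!H", frame[udp + 2:udp + 4])
    return dst_port == DHCP_SERVER_PORT
```

Everything that returns False here would continue to the normal network path; only the matching frames would be diverted to whatever component builds the reply.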
I've updated the draft. Please take a fresh look. https://github.com/joyent/rfd/blob/master/rfd/0121/README.md |
@mgerdts I left a few comments on the actual commit, that seemed easier than copy/pasting here to add comments. |
For zoneadm install -i, can you consider 2 additional formats:
Thanks! |
Ideally The process of Zone root - as I read it a bhyve zone will need to be effectively a sparse zone and
With the recent zone-specific-data integration, isn't there now a limit of one bhyve per NGZ? |
@ptribble Container images (Docker or ACI, assuming I found the right ACI) are great for containers, but problematic for virtual machines. In order to create a VM, we would need to run a compatible OS to perform the installation. It seems like the container images could be installed over the top of some base image (which includes the required kernel, boot loader, etc.). It seems like what you are after is better implemented with custom media that does the right thing for the OS that is being installed, or with post-installation orchestration. If you disagree, please share a rough outline of how you think any illumos-derived installation would be able to reliably install an arbitrary container image that probably assumes a non-illumos kernel. In particular, how are appropriate partition tables, file systems, boot loaders, and kernels put in place? |
@citrus-it allowed-address is one that I waffled on for a while. The way that smartos does this now is it uses something along the lines of Maybe what makes sense is to allow Note that this would probably not make the first cut refactoring that goes into smartos-joyent, but seems reasonable to implement before upstreaming to illumos. |
@mgerdts Hm, I guess that I'm getting too close to direct install. I suspect this is something I would have to experiment with to see what does and doesn't work, rather than trying to design it up front. |
@mgerdts thanks for the reply. OmniOS certainly has protection settable with:
but the automatic setting done by |
@citrus-it I agree that the environment variables followed by boot dumping stuff into the zone is hacky. I'm considering alternatives. One in particular mimics what we did with kernel zones in Solaris. We modified zoneadmd so that it has brand-specific handlers that run in
FWIW, I skipped the part where logging is set up, ala OS-6718. There are other ways to deal with this too. I'm admittedly biased toward what we did with kernel zones, as that proved to be quite handy as we added suspend/resume, live reconfiguration, live migration, etc. Not only that, the in-process brand hooks made it so that a ton of stuff that is in ksh in other brands was able to be implemented in C (often reusing existing C code). I've posted a very incomplete working copy of a future RFD. |
@citrus-it thanks for that - I wasn't aware of |
The current |
@mgerdts I was going over the RFD again, and it is unclear to me how (or if) it is possible to pick amd_hostbridge vs hostbridge for slot 0. (Some guests only support one of the two; for example, OpenBSD needs amd_hostbridge.) |
@sjorge good point. I think that is probably best done with |
Another typo snuck in |
If it works, can we have the option for specifying "pci_slot": "x:y:z" to a VNIC definition inside "nics" when creating a Bhyve VM, so we can have more than 8 functions assigned to virtual NICs? |
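A minimal sketch of how such a "pci_slot" value could be validated, assuming it follows the slot syntax of bhyve's -s option (pcislot, pcislot:function, or bus:pcislot:function, with bus 0-255, pcislot 0-31 and function 0-7). The function name and the decision to accept all three forms are assumptions for illustration; the thread does not define the property's final format.

```python
def parse_pci_slot(spec: str):
    """Parse 'slot', 'slot:func' or 'bus:slot:func' into (bus, slot, func)."""
    parts = spec.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected at most three fields: {spec!r}")
    try:
        nums = [int(p, 10) for p in parts]
    except ValueError:
        raise ValueError(f"non-numeric field in {spec!r}") from None

    if len(nums) == 1:
        bus, slot, func = 0, nums[0], 0
    elif len(nums) == 2:
        bus, (slot, func) = 0, nums
    else:
        bus, slot, func = nums

    # Ranges as accepted by bhyve's -s option.
    if not 0 <= bus <= 255:
        raise ValueError(f"bus out of range 0-255: {bus}")
    if not 0 <= slot <= 31:
        raise ValueError(f"slot out of range 0-31: {slot}")
    if not 0 <= func <= 7:
        raise ValueError(f"function out of range 0-7: {func}")
    return bus, slot, func
```

Spreading NICs across several slots this way is what lifts the limit of eight functions per slot mentioned in the request.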
There is already a ticket for this work: https://smartos.org/bugview/OS-7458 |
Thanks! Sorry for not seeing it. |
This issue represents an opportunity for discussion of RFD 121 while it remains in a pre-published state.