
pillar: Add support to CDI for native containers #4265

Merged — 3 commits merged into lf-edge:master from introduce-pillar-cdi on Sep 19, 2024

Conversation

@rene (Contributor, Author) commented on Sep 17, 2024

The Container Device Interface (CDI) is already available on containerd, and there are CDI files provided for the NVIDIA platform that describe GPU devices for native container access. This commit introduces support for CDI in pillar, so Video I/O adapters can pass a CDI device name to enable direct access in native containers.
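For readers unfamiliar with the format, a CDI spec is a small YAML file that maps a vendor-qualified device name to the OCI spec edits needed to expose the device. A minimal illustrative sketch follows (all paths and names are placeholders, not the actual file shipped for NVIDIA platforms):

```yaml
# Illustrative CDI spec; all paths and names below are placeholders.
cdiVersion: "0.6.0"
kind: "nvidia.com/orin-gpu"           # vendor-qualified device kind
devices:
  - name: "all"                       # requested as nvidia.com/orin-gpu=all
    containerEdits:
      deviceNodes:
        - path: /dev/nvhost-ctrl-gpu  # placeholder GPU device node
      mounts:
        - hostPath: /usr/lib/aarch64-linux-gnu/libcuda.so
          containerPath: /usr/lib/aarch64-linux-gnu/libcuda.so
          options: ["ro", "bind"]
```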

The corresponding documentation is also provided.

go mod tidy and go mod vendor

Signed-off-by: Renê de Souza Pinto <[email protected]>
@deitch (Contributor) commented on Sep 17, 2024

Two questions.

First, I had thought that the CDI was going to be in /opt/vendor/<vendor>/, so that the CDI file is in a known place, vendor-specific, and thus is already mounted into pillar? CDI files do not have to be in /etc/cdi/, it is configurable.

Second, I could not figure out where you consume the CDI files. Or are you assuming that by default, it will look in /etc/cdi/?

@rene (Contributor, Author) commented on Sep 17, 2024

> Two questions.
>
> First, I had thought that the CDI was going to be in /opt/vendor/<vendor>/, so that the CDI file is in a known place, vendor-specific, and thus is already mounted into pillar? CDI files do not have to be in /etc/cdi/, it is configurable.

  1. We need to set the CDI directory in containerd's config file (https://github.com/lf-edge/eve/blob/master/pkg/dom0-ztools/rootfs/etc/containerd/config.toml), but directories like /opt/vendor/nvidia/cdi will not be available on all platforms, which throws an error when the directory doesn't exist (see the config sketch after this list).
  2. By using /etc/cdi I can push any CDI file directly to this directory without needing to change /etc/containerd/config.toml, which is part of pkg/dom0-ztools.
  3. In my tests, configuring a directory other than /etc/cdi in the config.toml didn't work; I didn't have time to debug more deeply. Once I figure it out, maybe the custom CDI directory can be implemented...
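For reference, containerd's CRI plugin (containerd ≥ 1.7) exposes CDI settings like the ones below, with /etc/cdi and /var/run/cdi as the default spec directories; whether EVE's config.toml uses this exact section is an assumption here, not something this PR confirms:

```toml
# containerd CRI plugin CDI settings (defaults shown);
# EVE's actual config.toml section may differ.
[plugins."io.containerd.grpc.v1.cri"]
  enable_cdi = true
  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
```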

> Second, I could not figure out where you consume the CDI files. Or are you assuming that by default, it will look in /etc/cdi/?

Here: https://github.com/lf-edge/eve/pull/4265/files#diff-b4bffe1f639681bb5f53d8804f2010200cc30b19a149577433ae128d00e03175R181

I just need to pass the device string; containerd's CDI plugin does the whole job of looking for the CDI files (in the directory configured in config.toml), consuming them, and populating the OCI spec...
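As a sketch of what "just passing the device string" looks like from a containerd Go client (assuming containerd ≥ 1.7, whose oci package ships a WithCDIDevices helper; this is illustrative, not the exact pillar code in the diff above):

```go
package main

import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/oci"
)

// newGPUContainer is an illustrative helper; the container ID, snapshot
// ID, and CDI device name are placeholders.
func newGPUContainer(ctx context.Context, client *containerd.Client, img containerd.Image) (containerd.Container, error) {
	return client.NewContainer(ctx, "edge-app",
		containerd.WithNewSnapshot("edge-app-snap", img),
		containerd.WithNewSpec(
			oci.WithImageConfig(img),
			// WithCDIDevices resolves the fully qualified device name
			// against the CDI spec directories (/etc/cdi and
			// /var/run/cdi by default) and applies the containerEdits
			// from the matching spec file to the OCI runtime spec.
			oci.WithCDIDevices("nvidia.com/orin-gpu=all"),
		),
	)
}
```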

@rene force-pushed the introduce-pillar-cdi branch from bc9c930 to 391d4b9 on September 17, 2024 16:39
@deitch (Contributor) commented on Sep 17, 2024

> containerd's config file (https://github.com/lf-edge/eve/blob/master/pkg/dom0-ztools/rootfs/etc/containerd/config.toml)

Isn't that system containerd, and we want this for user containerd? Although a session with Paul earlier today showed that user apps are running on system containerd? I lost track. 🤷‍♂️

> By using /etc/cdi I can push any CDI file directly to this directory without needing to change /etc/containerd/config.toml, which is part of pkg/dom0-ztools

So what would be the process for a specific vendor? The root filesystem is immutable, so /etc/cdi (which is mounted into the pillar container) should be immutable. How do I get my CDI files in there? Or is the assumption that they are part of the rootfs build? How would that work?

We do have the pillar /opt/vendor/*/init.sh process, where we could just link /etc/cdi -> /opt/vendor/<foo>/cdi/? I am not sure that is better, though. I always have a concern with modifying things on the root filesystem and mounting it in.

> In my tests, configuring a directory other than /etc/cdi in the config.toml didn't work

Different containerd? 🤷‍♂️

@rene (Contributor, Author) commented on Sep 18, 2024

> > containerd's config file (https://github.com/lf-edge/eve/blob/master/pkg/dom0-ztools/rootfs/etc/containerd/config.toml)
>
> Isn't that system containerd, and we want this for user containerd? Although a session with @christoph-zededa earlier today showed that user apps are running on system containerd? I lost track. 🤷‍♂️

No, the user containerd is used only for CAS; user apps run on the system containerd...

> > By using /etc/cdi I can push any CDI file directly to this directory without needing to change /etc/containerd/config.toml, which is part of pkg/dom0-ztools
>
> So what would be the process for a specific vendor? The root filesystem is immutable, so /etc/cdi (which is mounted into the pillar container) should be immutable. How do I get my CDI files in there? Or is the assumption that they are part of the rootfs build? How would that work?
>
> We do have the pillar /opt/vendor/*/init.sh process, where we could just link /etc/cdi -> /opt/vendor/<foo>/cdi/? I am not sure that is better, though. I always have a concern with modifying things on the root filesystem and mounting it in.

Me too; I have a security concern about allowing CDI files at runtime, so the assumption is that any CDI file must be part of the rootfs build, which IMO makes sense, since they are usually very hardware-specific and will be provided by specific packages, such as pkg/nvidia...

> > In my tests, configuring a directory other than /etc/cdi in the config.toml didn't work
>
> Different containerd? 🤷‍♂️

Unfortunately not; the containerd config is correct, and it's the system containerd...

The Container Device Interface (CDI) is already available on containerd, and there are CDI files provided for the NVIDIA platform that describe GPU devices for native container access. This commit introduces support for CDI in pillar, so Video I/O adapters can pass a CDI device name to enable direct access in native containers.

Signed-off-by: Renê de Souza Pinto <[email protected]>
@rene force-pushed the introduce-pillar-cdi branch from 391d4b9 to e99de15 on September 18, 2024 10:49
@deitch (Contributor) commented on Sep 18, 2024

> No, the user containerd is used only for CAS; user apps run on the system containerd

Yeah, that came up yesterday. We probably should change that, but well beyond the scope of this PR.

> Me too; I have a security concern about allowing CDI files at runtime, so the assumption is that any CDI file must be part of the rootfs build, which IMO makes sense, since they are usually very hardware-specific and will be provided by specific packages, such as pkg/nvidia...

What happens if you have 100 different devices, all of the same family, with slightly different CDI? Are you going to have 100 different rootfs builds? Or just one, with multiple CDIs, and the ability to detect each? This gets us very much down the road of different builds because of a single few KB config file in /etc/.

> Unfortunately not; the containerd config is correct, and it's the system containerd

So, we mount /etc/cdi into pillar, just so that we can retrieve the devices and modify the container spec, but in the end that gets passed to system containerd anyway, which runs (by definition) outside of pillar?

I didn't quite get what we are doing with that chunk of code inside pillar. We inject the devices into the container spec based on the name. Essentially, we are duplicating what containerd normally does?

@rene (Contributor, Author) commented on Sep 18, 2024

> What happens if you have 100 different devices, all of the same family, with slightly different CDI? Are you going to have 100 different rootfs builds? Or just one, with multiple CDIs, and the ability to detect each? This gets us very much down the road of different builds because of a single few KB config file in /etc/.

It's totally fine to have multiple CDI files under /etc/cdi; we don't need a single build per device. Actually, that's how it's working for NVIDIA: we have CDI files for both Xavier and Orin boards in the same build. Inside each file we use different names for the device description, so we have "nvidia.com/xavier-gpu" for Xavier and "nvidia.com/orin-gpu" for Orin...
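In other words, the two specs can coexist under /etc/cdi because each file declares a distinct kind, and the fully qualified device name carries that kind (illustrative headers only; device names are placeholders):

```yaml
# /etc/cdi/xavier.yaml (illustrative): requested as nvidia.com/xavier-gpu=<device>
cdiVersion: "0.6.0"
kind: "nvidia.com/xavier-gpu"

# /etc/cdi/orin.yaml (illustrative): requested as nvidia.com/orin-gpu=<device>
cdiVersion: "0.6.0"
kind: "nvidia.com/orin-gpu"
```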

> > Unfortunately not; the containerd config is correct, and it's the system containerd
>
> So, we mount /etc/cdi into pillar, just so that we can retrieve the devices and modify the container spec, but in the end that gets passed to system containerd anyway, which runs (by definition) outside of pillar?

Yes, this can be improved when we move execution of Edge Apps to user containerd...

> I didn't quite get what we are doing with that chunk of code inside pillar. We inject the devices into the container spec based on the name. Essentially, we are duplicating what containerd normally does?

No, this parses the I/O adapter list from the device model. The way we give direct access to the GPU is exactly the same as for passthrough PCI devices; the difference is that instead of giving a PCIe bus address in the device model, we pass the CDI string for the particular device. CDI is only used for GPU access for now; for serial devices and any other device (like a webcam under /dev/video0) we read them from the device model and add them to the OCI spec. For standard containers (with a shim VM) we just give full access to all file devices. The documentation being added has some examples and a better description: https://github.com/lf-edge/eve/pull/4265/files#diff-8230cff5878b3df207474c79828836840673a5ac49fdc808ba034809902cac96
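As a rough sketch of that branching (hypothetical type and field names, not pillar's actual API; the real code is in the linked diff): if the adapter carries a "cdi" attribute, hand it to the CDI machinery, otherwise pass the raw device node through.

```go
package main

import "github.com/containerd/containerd/oci"

// ioAdapter is a hypothetical, simplified stand-in for the device-model
// I/O adapter that pillar parses; field names are placeholders.
type ioAdapter struct {
	cbattr  map[string]string // e.g. {"cdi": "nvidia.com/orin-gpu=all"}
	devPath string            // e.g. "/dev/ttyUSB0" or "/dev/video0"
}

// specOptsFor returns the OCI spec options for a single adapter.
func specOptsFor(ad ioAdapter, nativeContainer bool) []oci.SpecOpts {
	if cdiName, ok := ad.cbattr["cdi"]; ok && nativeContainer {
		// Native containers only: the CDI string stands in for a PCIe
		// bus address and is expanded into OCI spec edits by CDI.
		return []oci.SpecOpts{oci.WithCDIDevices(cdiName)}
	}
	// Serial ports, webcams, etc.: add the device node directly.
	return []oci.SpecOpts{oci.WithDevices(ad.devPath, ad.devPath, "rwm")}
}
```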

@deitch (Contributor) commented on Sep 18, 2024

> It's totally fine to have multiple CDI files under /etc/cdi; we don't need a single build per device. Actually, that's how it's working for NVIDIA: we have CDI files for both Xavier and Orin boards in the same build. Inside each file we use different names for the device description, so we have "nvidia.com/xavier-gpu" for Xavier and "nvidia.com/orin-gpu" for Orin...

OK, that works, thanks.

> No, this parses the I/O adapter list from the device model. The way we give direct access to the GPU is exactly the same as for passthrough PCI devices; the difference is that instead of giving a PCIe bus address in the device model, we pass the CDI string for the particular device. CDI is only used for GPU access for now; for serial devices and any other device (like a webcam under /dev/video0) we read them from the device model and add them to the OCI spec. For standard containers (with a shim VM) we just give full access to all file devices. The documentation being added has some examples and a better description

So this is about translating between what came in the EVE API request for devices to pass to the app, and the CDI file format, so that it will know what to do with it? That would make sense to me, but the doc to which you linked implies that the spec comes with the CDI attribute? So why translate?

@rene (Contributor, Author) commented on Sep 18, 2024

> So this is about translating between what came in the EVE API request for devices to pass to the app, and the CDI file format, so that it will know what to do with it? That would make sense to me, but the doc to which you linked implies that the spec comes with the CDI attribute? So why translate?

I don't know if I understood your question, but the main idea is that the CDI string works as a "hardware ID", the same way we specify PCIe bus addresses, network interface names (eth0, wlan0), etc., in the hardware device model. For this particular case we specify the CDI string, which points to the device described in the CDI file... This makes the "native container GPU passthrough" work transparently with the controller, so the user can pass through a GPU to a native container the same way they pass through a PCI video card on x86, for example...
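To make the "hardware ID" analogy concrete, a device-model I/O adapter entry could look roughly like the following. Only the "cdi" key under "cbattr" is the mechanism discussed in this PR; every label and value shown is a placeholder, not a real device model:

```json
{
  "phylabel": "GPU0",
  "logicallabel": "GPU0",
  "assigngrp": "gpu0",
  "cbattr": {
    "cdi": "nvidia.com/orin-gpu=all"
  }
}
```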

Inline review comment on the added documentation, on the line:

> If two containers need to access the same GPU (when supported), two I/O adapters must be defined. For instance, two [...]

@OhmSpectator (Member) commented:

So, the number of these files depends on how many App Instances will use them, am I right? It means: dynamic? In this case, is it OK to store the file in the RO partition?

@deitch (Contributor) replied:

I don't think so, but I'll let Rene confirm. You have one or more CDI yaml files that describe the devices on the machine and how to access them (including device nodes, necessary mounts, and all sorts of instruction stuff). Whether it is 1 or 50 apps, only the defined CDI files for the device are used. They are read-only.

@rene (Contributor, Author) replied:

Exactly, @deitch... @OhmSpectator, we must have one GPU I/O adapter per Edge App because an I/O adapter can be passed through to only one VM. That makes total sense for PCI devices, but we do have some exceptions, like CAN interfaces (which can be accessed by multiple VMs at the same time) and now this use case, where a single GPU can be shared across multiple containers... but this is only about the device model; it has nothing to do with the CDI files...

@deitch (Contributor) commented on Sep 18, 2024

> Actually, that's how it's working for NVIDIA: we have CDI files for both Xavier and Orin boards in the same build. Inside each file we use different names for the device description, so we have "nvidia.com/xavier-gpu" for Xavier and "nvidia.com/orin-gpu" for Orin..

I was just thinking about this. How do you distinguish between them? Aren't the device names similar? Won't you have conflicts?

@deitch (Contributor) commented on Sep 18, 2024

> I don't know if I understood your question, but the main idea is that the CDI string works as a "hardware ID", the same way we specify PCIe bus addresses, network interface names (eth0, wlan0), etc., in the hardware device model. For this particular case we specify the CDI string, which points to the device described in the CDI file... This makes the "native container GPU passthrough" work transparently with the controller, so the user can pass through a GPU to a native container the same way they pass through a PCI video card on x86, for example

What I meant was, if we already define the CDI string inside the app instance, e.g. nvidia.com/gpus=0, then that is how containerd will take it and build the right config.json OCI spec for it, so why does pillar need to have access to it? What is that chunk of code doing?

@rene (Contributor, Author) commented on Sep 18, 2024

> > Actually, that's how it's working for NVIDIA: we have CDI files for both Xavier and Orin boards in the same build. Inside each file we use different names for the device description, so we have "nvidia.com/xavier-gpu" for Xavier and "nvidia.com/orin-gpu" for Orin..
>
> I was just thinking about this. How do you distinguish between them? Aren't the device names similar? Won't you have conflicts?

We can name these devices whatever we want. Originally, nvidia-ctk will always use "nvidia.com/gpu" during CDI generation; I just changed them to nvidia.com/xavier-gpu and nvidia.com/orin-gpu.
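For context, the stock flow looks roughly like this: nvidia-ctk cdi generate emits the nvidia.com/gpu kind, and the per-board rename is then just an edit of the generated file (the sed line is one possible way to do it, not necessarily how the pkg/nvidia build does it):

```bash
# Generate a CDI spec for the local GPU; nvidia-ctk uses the kind nvidia.com/gpu.
nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Rename the kind per board so Xavier and Orin specs can coexist.
sed -i 's|nvidia.com/gpu|nvidia.com/orin-gpu|g' /etc/cdi/nvidia.yaml
```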

@rene (Contributor, Author) commented on Sep 18, 2024

> > I don't know if I understood your question, but the main idea is that the CDI string works as a "hardware ID", the same way we specify PCIe bus addresses, network interface names (eth0, wlan0), etc., in the hardware device model. For this particular case we specify the CDI string, which points to the device described in the CDI file... This makes the "native container GPU passthrough" work transparently with the controller, so the user can pass through a GPU to a native container the same way they pass through a PCI video card on x86, for example
>
> What I meant was, if we already define the CDI string inside the app instance, e.g. nvidia.com/gpus=0, then that is how containerd will take it and build the right config.json OCI spec for it, so why does pillar need to have access to it? What is that chunk of code doing?

We don't define the CDI string inside the Edge App; the Edge App should be like any regular Edge App, the only requirement being NO_HYPER as the virtualization mode. Then we pass a GPU I/O adapter through to this Edge App, like any regular PCI passthrough... The trick happens when we parse the I/O adapters from the device model and find the "cdi" attribute under "cbattr": for native containers (and only for native containers) we use this string as a CDI device and process it accordingly...

This approach makes the CDI solution 100% compatible with the current passthrough mechanism, and it requires no changes in either the API or the controller side...

@deitch (Contributor) commented on Sep 18, 2024

Ok, now that makes sense. So there is still a "translation" going on between "how the GPU appears in the EVE API" and "how the GPU is listed in the CDI files". The work in pillar is there to do that translation. Correct? Can we capture that in the docs?

@rene (Contributor, Author) commented on Sep 18, 2024

> Ok, now that makes sense. So there is still a "translation" going on between "how the GPU appears in the EVE API" and "how the GPU is listed in the CDI files". The work in pillar is there to do that translation. Correct? Can we capture that in the docs?

Correct. Ok, I will update the documentation...

Add proper documentation explaining the CDI support for bare-metal containers.

Signed-off-by: Renê de Souza Pinto <[email protected]>
@rene force-pushed the introduce-pillar-cdi branch from e99de15 to d275fb1 on September 18, 2024 15:32
@rene (Contributor, Author) commented on Sep 18, 2024

Updates in this PR:

  • Documentation updated.

@OhmSpectator merged commit 215fcb3 into lf-edge:master on Sep 19, 2024
38 checks passed
@rene deleted the introduce-pillar-cdi branch on October 11, 2024 10:00