Add Resource Driver concept
Co-authored-by: Michael <[email protected]>
Co-authored-by: Evan Lezar <[email protected]>
Co-authored-by: Tim Bannister <[email protected]>
Co-authored-by: Eero Tamminen <[email protected]>
Co-authored-by: Patrick Ohly <[email protected]>
Co-authored-by: Dipesh Rawat <[email protected]>
7 people committed Feb 14, 2024
1 parent a1b2385 commit b3c0599
Showing 12 changed files with 441 additions and 2 deletions.
@@ -42,3 +42,9 @@ fabric that links Pods together.
Kubernetes {{< skew currentVersion >}} is compatible with {{< glossary_tooltip text="CNI" term_id="cni" >}}
network plugins.

* [Resource drivers](/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers/)

Resource drivers allow custom allocation logic for non-native cluster resources that are
difficult to represent with scalar values. They relieve the scheduler of having to understand
these resources and plan how Pods use them through ResourceClaims.

@@ -0,0 +1,205 @@
---
title: DRA Resource Drivers
description: Resource drivers provide non-trivial allocation logic and management for devices or resources that require vendor-specific or otherwise complex setup, such as GPUs, NICs, FPGAs, etc.
content_type: concept
weight: 10
---

<!-- overview -->
{{< feature-state for_k8s_version="v1.27" state="alpha" >}}

Kubernetes provides a
[Dynamic Resource Allocation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) (DRA)
mechanism that you can use to give workloads access to more complex hardware resources, with custom
resource accounting.

As with {{< glossary_tooltip term_id="device-plugin" text="device plugins">}}, instead of
customizing the code for Kubernetes itself, vendors can implement a _resource driver_ that you deploy
into the cluster to account for and control the allocation of GPUs, high-performance NICs, FPGAs,
InfiniBand adapters, and other similar computing resources that may require vendor-specific
initialization and setup.

With device plugins, the scheduler is given only a simple numerical count of the resources
available on a node, exposed as an extended resource, to consider during scheduling.

With DRA, the scheduler offloads the task of allocating and accounting for non-native resources
to the resource driver that manages those resources in the cluster.

A resource driver consists of two main components:

- a _controller_ (one per cluster) that manages the allocation of hardware resources for
  {{< glossary_tooltip term_id="ResourceClaim" text="ResourceClaims">}}
- a _kubelet plugin_ (one per node that has or can access the associated resource) that:
  - discovers the supported hardware
  - announces the discovered hardware to the resource driver controller
  - prepares the hardware allocated to a ResourceClaim when the kubelet prepares to create the Pod
  - unprepares the hardware allocated for a Pod when the Pod has reached a final state or is being deleted

There are two common ways for the controller and a kubelet plugin to communicate:

- through custom resource objects that use
  {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinitions">}}
  provided by the vendor or project behind the resource driver
- through a `ResourceHandle`, which is part of the `AllocationResult` that the controller provides
  when allocation succeeds
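For illustration, here is roughly what the allocation result could look like in the `status` of an
allocated ResourceClaim, assuming the `resource.k8s.io/v1alpha2` API. The content of `data` is opaque
to Kubernetes and entirely up to the driver; the driver name and the JSON shown here are hypothetical.

```yaml
# Excerpt of an allocated ResourceClaim (resource.k8s.io/v1alpha2).
# The driver name and the content of `data` are hypothetical.
status:
  driverName: gpu.resource.example.com
  allocation:
    resourceHandles:
    - driverName: gpu.resource.example.com
      # Opaque string written by the controller; the kubelet passes it
      # to the kubelet plugin on the node when preparing the resource.
      data: '{"devices": ["gpu-0"]}'
```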

General recommendations:

- resource driver name pattern: `<HW type>.resource.<companyname>.<companydomain>`. For example,
  `gpu.resource.example.com`

<!-- body -->

## Resource driver controller

A resource driver controller's main responsibility is to allocate and deallocate resources for
{{< glossary_tooltip term_id="ResourceClaim" text="ResourceClaims">}}.

A ResourceClaim can use one of two allocation modes, set in its `allocationMode` field:

- `WaitForFirstConsumer` (default), which you could think of as meaning _delayed_.
  In this mode, resources are only allocated for the ResourceClaim when a Pod that needs it is being scheduled.
- `Immediate`: the resource has to be allocated to the ResourceClaim as soon as possible, and
  retained until the ResourceClaim is deleted.
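A minimal sketch of a ResourceClaim that requests immediate allocation, assuming the
`resource.k8s.io/v1alpha2` API and reusing the `gpu.example.com` class and the hypothetical
`GpuClaimParameters` object from the example Pod on this page. Omitting `allocationMode` gives
the default, `WaitForFirstConsumer`.

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  namespace: gpu-test4
  name: gpu-immediate
spec:
  resourceClassName: gpu.example.com
  # Allocate as soon as the claim is created, instead of waiting until
  # a Pod that uses it is being scheduled (WaitForFirstConsumer).
  allocationMode: Immediate
  parametersRef:
    apiGroup: gpu.resource.example.com
    kind: GpuClaimParameters
    name: single-gpu
```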

### Delayed allocation

The controller helper code first calls the driver's _UnsuitableNodes_ method, so that the driver can
report which of the candidate Nodes chosen by the scheduler are not suitable for allocating all of
the ResourceClaims that this resource driver handles for the Pod. If none of the Nodes are suitable,
the scheduler selects another batch of candidate Nodes, and _UnsuitableNodes_ is called again until
a suitable Node is found.

When at least one Node is found to be suitable for all ResourceClaims, the scheduler
checks the suitable Nodes against the Pod's other scheduling constraints (native resource
requests, affinity, selectors, etc.), and picks exactly one Node.

After a Node has been selected, the controller helper code calls the driver's _Allocate_ method
to perform the actual resource allocation for the needed ResourceClaims on that Node.

If the _Allocate_ call returns an error for any of the ResourceClaims, the helper code retries
the same call periodically until it succeeds.
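In the `resource.k8s.io/v1alpha2` API, this exchange happens through a PodSchedulingContext object
that has the same name and namespace as the Pod. A sketch for the example Pod on this page, with
hypothetical Node names: the scheduler fills in `spec`, and the driver controller reports the
unsuitable Nodes for each claim in `status`.

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: PodSchedulingContext
metadata:
  namespace: gpu-test4
  name: pod0            # same name as the Pod being scheduled
spec:
  # Written by the scheduler: candidate Nodes for the driver to check.
  potentialNodes:
  - node-1
  - node-2
  - node-3
  # Written by the scheduler once it has picked a Node; the helper code
  # then calls Allocate for the claims on this Node.
  selectedNode: node-2
status:
  # Written by the driver controller (the UnsuitableNodes result), one entry
  # per claim, keyed by the claim name used in the Pod's resourceClaims.
  resourceClaims:
  - name: gpus
    unsuitableNodes:
    - node-1
```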

### Immediate allocation

With immediate allocation there is no selected Node, so it is up to the resource driver controller
to select the most suitable Node based on the ResourceClaim, the ResourceClass, and their parameters.
Therefore, in this mode the helper library only calls _Allocate_, without calling _UnsuitableNodes_
first.
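Because no Node was selected up front, the driver records where the allocated resource can be used.
In `resource.k8s.io/v1alpha2` this is the `availableOnNodes` node selector in the allocation result,
and the scheduler only considers matching Nodes for Pods that use the claim. A sketch with a
hypothetical Node name:

```yaml
# Excerpt of a ResourceClaim allocated in Immediate mode (resource.k8s.io/v1alpha2).
status:
  driverName: gpu.resource.example.com
  allocation:
    # Chosen by the resource driver controller: Pods that use this claim
    # can only be scheduled onto Nodes matching this selector.
    availableOnNodes:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - node-2
```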

### Common calls for both allocation modes

In both modes, the allocation is preceded by retrieving the parameters objects of the ResourceClaims
and ResourceClasses, to ensure that the resource driver is able to get these objects and understand them.

## Sharing resources

There are two main ways of sharing resources between Pods:
- by using the same ResourceClaim in multiple Pods
- by using the same underlying resource for different ResourceClaims

### Shared ResourceClaims

If the resource driver sets the `shareable` field to `true` in the ResourceClaim's `AllocationResult`,
the scheduler allows up to 32 Pods to use the same ResourceClaim. It does so by updating the claim's
`status.reservedFor` field automatically, without consulting the resource driver that allocated the
resource for this ResourceClaim.
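For example, Pods can share a claim by referencing the same ResourceClaim by name in their
`resourceClaims` entries. Once two such Pods have been scheduled, the claim status could look like
the following excerpt (`resource.k8s.io/v1alpha2`); the Pod names and UID placeholders are
hypothetical.

```yaml
# Excerpt of a shared ResourceClaim (resource.k8s.io/v1alpha2).
status:
  driverName: gpu.resource.example.com
  allocation:
    # Set by the driver: allows the scheduler to add consumers on its own.
    shareable: true
  # Maintained by the scheduler; at most 32 entries.
  reservedFor:
  - resource: pods
    name: pod0
    uid: <uid-of-pod0>
  - resource: pods
    name: pod1
    uid: <uid-of-pod1>
```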

### Internal accounting in resource driver

The other way to share a resource is to implement the sharing logic in the resource driver itself.
For instance, a field in the ResourceClass parameters could specify whether the resource driver
should allocate the resource exclusively to one ResourceClaim, or whether the same resource can
also be allocated to other ResourceClaims.
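A sketch of this approach, assuming the `resource.k8s.io/v1alpha2` API: two ResourceClasses use the
same driver but point at different class parameters objects. The `GpuClassParameters` kind and its
`sharing` field are hypothetical; each driver defines its own parameters API through a
CustomResourceDefinition and decides how to interpret it.

```yaml
# Hypothetical class parameters API defined by the resource driver's CRD.
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClassParameters
metadata:
  namespace: gpu-driver
  name: exclusive
spec:
  sharing: Exclusive
---
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClassParameters
metadata:
  namespace: gpu-driver
  name: shared
spec:
  sharing: Shared
---
# Same driver, two classes: claims pick the behavior via resourceClassName.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu-exclusive.example.com
driverName: gpu.resource.example.com
parametersRef:
  apiGroup: gpu.resource.example.com
  kind: GpuClassParameters
  namespace: gpu-driver
  name: exclusive
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu-shared.example.com
driverName: gpu.resource.example.com
parametersRef:
  apiGroup: gpu.resource.example.com
  kind: GpuClassParameters
  namespace: gpu-driver
  name: shared
```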

### Example {#example-pod}

Suppose a Kubernetes cluster is running the resource driver `gpu.resource.example.com` with the
ResourceClass `gpu.example.com`. Here is an example of a Pod requesting this resource to run a demo
workload:

```yaml
# gpu.resource.example.com GpuClaimParameters is an example extension API for parameters
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClaimParameters
metadata:
  namespace: gpu-test4
  name: single-gpu
spec:
  count: 1
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  namespace: gpu-test4   # must be in the same namespace as the Pod
  name: gpu-test
spec:
  resourceClassName: gpu.example.com
  parametersRef:
    apiGroup: gpu.resource.example.com   # API group only, without the version
    kind: GpuClaimParameters
    name: single-gpu
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test4
  name: pod0
  labels:
    app: pod
spec:
  containers:
  - name: container1
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; sleep 9999"]
    resources:
      claims:
      - name: gpus
  resourceClaims:
  - name: gpus
    source:
      # Reference the existing ResourceClaim above (not a ResourceClaimTemplate).
      resourceClaimName: gpu-test
# This Pod wants to use ResourceClaim gpu-test that needs 1 device of ResourceClass
# gpu.example.com, handled by the gpu.resource.example.com resource driver.
#
# The resource driver allocates the resources required for that ResourceClaim and ensures
# that they are ready to use; only then does the Pod start.
```

## Good practice for resource driver deployment {#resource-driver-deploy-tips}

The recommended way to deploy a resource driver is to run the controller as a Deployment and the
kubelet plugin as a DaemonSet. You can also deploy it as a package for your node's operating
system, or manually.

The kubelet uses a gRPC interface to interact with a resource driver's kubelet plugin. On the Kubernetes side,
no special permissions are required for resource drivers.
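A minimal sketch of this layout, assuming a hypothetical `gpu-driver` namespace and container image.
A real deployment also needs a ServiceAccount with suitable RBAC rules for the objects the driver
reads and writes, the driver's own command-line arguments, and possibly access to the devices on the
node. The `hostPath` volumes expose the kubelet's plugin directories, where a kubelet plugin
typically places its registration socket and the socket for the gRPC interface.

```yaml
# Controller: one replica per cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: gpu-driver
  name: gpu-driver-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-driver-controller
  template:
    metadata:
      labels:
        app: gpu-driver-controller
    spec:
      containers:
      - name: controller
        image: registry.example.com/gpu-driver:v0.1.0   # hypothetical image
---
# Kubelet plugin: one Pod per node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: gpu-driver
  name: gpu-driver-kubelet-plugin
spec:
  selector:
    matchLabels:
      app: gpu-driver-kubelet-plugin
  template:
    metadata:
      labels:
        app: gpu-driver-kubelet-plugin
    spec:
      containers:
      - name: plugin
        image: registry.example.com/gpu-driver:v0.1.0   # hypothetical image
        volumeMounts:
        # Plugin registration directory watched by the kubelet.
        - name: plugins-registry
          mountPath: /var/lib/kubelet/plugins_registry
        # Per-plugin directories, used here for the gRPC socket.
        - name: plugins
          mountPath: /var/lib/kubelet/plugins
      volumes:
      - name: plugins-registry
        hostPath:
          path: /var/lib/kubelet/plugins_registry
      - name: plugins
        hostPath:
          path: /var/lib/kubelet/plugins
```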

When you deploy a resource driver, you typically also define at least one ResourceClass using that driver.
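For example, the `gpu.example.com` ResourceClass used by the example Pod on this page could be as
simple as the following sketch (`resource.k8s.io/v1alpha2` API). The optional `parametersRef` and
`suitableNodes` fields are omitted here.

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  # Cluster-scoped; ResourceClaims refer to it in spec.resourceClassName.
  name: gpu.example.com
# Name of the resource driver that handles claims of this class.
driverName: gpu.resource.example.com
```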

## API compatibility

Kubernetes Dynamic Resource Allocation support is in alpha. The API may change in incompatible ways
before it stabilizes. As a project, Kubernetes recommends that resource driver developers:

* Watch for changes in future releases.
* Support multiple versions of the resource driver API for backward/forward compatibility.

If you enable the `DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
and run associated kubelet plugins on nodes that need to be upgraded to a Kubernetes release with a
newer DRA API version, upgrade your resource drivers to support both versions before upgrading
these nodes. This helps ensure that device allocations keep working during the upgrade.
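On a node, for instance, the feature gate can be set in the kubelet configuration file, as in the
sketch below. The gate also has to be enabled on kube-apiserver, kube-controller-manager and
kube-scheduler, and the API server has to serve the alpha API group (for example with
`--runtime-config=resource.k8s.io/v1alpha2=true`).

```yaml
# Kubelet configuration file fragment that enables DRA on a node.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
```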

## DRA resource driver examples {#examples}

{{% thirdparty-content %}}

Here are some examples of resource driver implementations:

* The [example resource driver](https://github.com/kubernetes-sigs/dra-example-driver)
* The [Intel GPU resource driver](https://github.com/intel/intel-resource-drivers-for-kubernetes)
* The [NVIDIA GPU resource driver](https://github.com/NVIDIA/k8s-dra-driver)


## {{% heading "whatsnext" %}}

* Learn about [creating your own DRA resource driver](https://www.youtube.com/watch?v=_fi9asserLE)
* Discover the [example DRA resource driver](https://github.com/kubernetes-sigs/dra-example-driver)
10 changes: 8 additions & 2 deletions content/en/docs/reference/_index.md
@@ -59,7 +59,7 @@ client libraries:
a set of back-ends.
* [kube-scheduler](/docs/reference/command-line-tools-reference/kube-scheduler/) -
Scheduler that manages availability, performance, and capacity.

* [Scheduler Policies](/docs/reference/scheduling/policies)
* [Scheduler Profiles](/docs/reference/scheduling/config#profiles)

@@ -92,7 +92,7 @@ operator to use or manage a cluster.
* [kube-controller-manager configuration (v1alpha1)](/docs/reference/config-api/kube-controller-manager-config.v1alpha1/)
* [kube-proxy configuration (v1alpha1)](/docs/reference/config-api/kube-proxy-config.v1alpha1/)
* [`audit.k8s.io/v1` API](/docs/reference/config-api/apiserver-audit.v1/)
* [Client authentication API (v1beta1)](/docs/reference/config-api/client-authentication.v1beta1/) and
[Client authentication API (v1)](/docs/reference/config-api/client-authentication.v1/)
* [WebhookAdmission configuration (v1)](/docs/reference/config-api/apiserver-webhookadmission.v1/)
* [ImagePolicy API (v1alpha1)](/docs/reference/config-api/imagepolicy.v1alpha1/)
@@ -117,3 +117,9 @@ An archive of the design docs for Kubernetes functionality. Good starting points
[Kubernetes Architecture](https://git.k8s.io/design-proposals-archive/architecture/architecture.md) and
[Kubernetes Design Overview](https://git.k8s.io/design-proposals-archive).

## Helper libraries

### Dynamic resource allocation

* [Resource driver controller](/docs/reference/helper-libraries/dra-driver-controller/)
* [Resource driver kubelet plugin](/docs/reference/helper-libraries/dra-driver-kubelet-plugin/)
23 changes: 23 additions & 0 deletions content/en/docs/reference/glossary/PodSchedulingContext.md
@@ -0,0 +1,23 @@
---
id: PodSchedulingContext
title: PodSchedulingContext
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
date: 2023-11-28
short_description: >
  A short-lived object that kube-scheduler creates to coordinate with resource drivers the
  selection of a Node for a Pod that uses one or more ResourceClaims.
related:
- kube-scheduler
- resource-claim
- resource-driver
---

A [Pod Scheduling Context](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) is used
by kube-scheduler when it schedules a Pod that needs ResourceClaims.

<!--more-->

Resource drivers and kube-scheduler communicate through ResourceClaim and PodSchedulingContext objects
during scheduling. The Pod is only assigned a Node name once all the ResourceClaims listed in the
PodSchedulingContext are allocated and list the Pod that is being scheduled in their `reservedFor` field.
20 changes: 20 additions & 0 deletions content/en/docs/reference/glossary/ResourceClaim.md
@@ -0,0 +1,20 @@
---
title: ResourceClaim
id: ResourceClaim
date: 2023-10-16
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
short_description: >
  Defines what kind of resource is needed and what the parameters for it are.
aka:
tags:
- core-object
- fundamental
---
Additional parameters are provided by a cluster admin in
{{< glossary_tooltip text="ResourceClass" term_id="ResourceClass" >}}.

<!--more-->

Can reference
{{< glossary_tooltip term_id="ResourceClaimParameters" text="ResourceClaimParameters">}}
with {{< glossary_tooltip term_id="resource-driver" text="Resource Driver">}}-specific details.
19 changes: 19 additions & 0 deletions content/en/docs/reference/glossary/ResourceClaimParameters.md
@@ -0,0 +1,19 @@
---
title: ResourceClaimParameters
id: ResourceClaimParameters
date: 2023-10-16
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
short_description: >
  Specification of which resources, and how much of them, a ResourceClaim needs.
aka:
tags:
- extension
---
{{< glossary_tooltip term_id="resource-driver" text="Resource Driver">}}-specific object, subject
to vendor implementation. Optional. Typically contains quantity and characteristics of the requested
resources.

<!--more-->

Not part of core Kubernetes. Referenced in the `parametersRef` field of a
{{< glossary_tooltip term_id="ResourceClaim" text="ResourceClaim">}}.
29 changes: 29 additions & 0 deletions content/en/docs/reference/glossary/ResourceClass.md
@@ -0,0 +1,29 @@
---
title: ResourceClass
id: ResourceClass
date: 2023-10-16
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
short_description: >
  Describes the type of resources the Resource Driver can allocate.
aka:
tags:
- core-object
- fundamental
---
Abstract object that links {{< glossary_tooltip term_id="ResourceClaim" text="ResourceClaims">}}
and {{< glossary_tooltip term_id="resource-driver" text="Resource Drivers">}}.

<!--more-->

When a ResourceClaim needs resources to be allocated, its `resourceClassName` field indicates which
ResourceClass is used to initiate the allocation. The ResourceClass names the driver that performs
the allocation in its `driverName` field, and can optionally reference
{{< glossary_tooltip term_id="ResourceClassParameters" text="ResourceClassParameters">}}
that let the Resource Driver further customize the allocation process.

The same Resource Driver can be referenced by many ResourceClasses. In that case, the ResourceClasses
typically have different {{< glossary_tooltip term_id="ResourceClassParameters" text="ResourceClassParameters">}}
that tell the driver to allocate differently for each of them. For instance, one class can be
used to allocate shared resources, and another to allocate resources exclusively.

Typically managed by the cluster admin.
14 changes: 14 additions & 0 deletions content/en/docs/reference/glossary/ResourceClassParameters.md
@@ -0,0 +1,14 @@
---
title: ResourceClassParameters
id: ResourceClassParameters
date: 2023-10-16
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
short_description: >
  Details for a Resource Driver on how to allocate resources.
aka:
tags:
- extension
---
{{< glossary_tooltip term_id="resource-driver" text="Resource Driver">}}-specific object that,
when referenced in {{< glossary_tooltip term_id="ResourceClass" text="ResourceClass">}}, provides
details about how to allocate resources.
32 changes: 32 additions & 0 deletions content/en/docs/reference/glossary/resource-driver.md
@@ -0,0 +1,32 @@
---
title: Resource Driver
id: resource-driver
date: 2023-10-16
full_link: /docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers/
short_description: >
  Software extensions to let Pods access devices that need vendor-specific initialization or setup
aka:
tags:
- extension
- fundamental
---
A resource driver is responsible for allocation of non-native resources requested by
{{< glossary_tooltip term_id="ResourceClaim" text="ResourceClaims">}}.

<!--more-->

A resource driver typically consists of one controller {{< glossary_tooltip term_id="pod" text="Pod">}} and many
kubelet plugin Pods. The controller allocates the hardware resources requested by ResourceClaims.
The kubelet plugin discovers supported hardware devices, advertises them to the controller, and
prepares and unprepares the allocated resources when the {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
is preparing to start or has stopped the Pod.

There can be multiple {{< glossary_tooltip term_id="ResourceClass" text="ResourceClasses">}}
associated with one Resource Driver. In that case, they typically have different
{{< glossary_tooltip term_id="ResourceClassParameters" text="ResourceClassParameters">}}
that customize the resource allocation process.


See
[Dynamic Resource Allocation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)
for more information.
8 changes: 8 additions & 0 deletions content/en/docs/reference/helper-libraries/_index.md
@@ -0,0 +1,8 @@
---
title: Helper Libraries
weight: 50
---

Libraries that you can use to jump-start development.
They typically implement common functionality that components
would otherwise each need to implement themselves.