-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Resource Drivers section to extending Kubernetes documentation #43742
Conversation
✅ Pull request preview available for checkingBuilt without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
/sig node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Please match the style guide's advice on how to write API kinds and related names.
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
The proposed concept page has a lot of reference details, such as gRPC interfaces. These don't really belong in a page that has the main purpose of explaining what resource drivers are.
I'd omit the gRPC blocks and the surrounding text. Consider whether https://kubernetes.io/docs/reference/node/ is a better home for those details.
Alternatively, you can also write a one-off blog article with some details. Blog articles don't need to be complete and maintaining them after publication is optional. We can then revisit these docs for beta.
content/en/docs/concepts/extend-kubernetes/compute-storage-net/_index.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
Apologies for the confusion, I've renamed the PR right as you were reviewing it. The main purpose doc in this PR is not just introduction to resource drivers, but extending the Kubernetes by building a resource driver, this is very much analogous to the existing device plugins page here: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/. |
The existing page about device plugins is OK but there's lots of room to improve it. Essentially, watch out for using early / “good enough” docs as a guide to how new docs should come in. For a brand new page, we'd like to see more structure. Concept pages should explain what things are; other details belong in task, tutorial or reference pages. And we use hyperlinks to connect the different pieces. This gatekeeping for new content helps drive the overall documentation quality in the direction we'd like to see. |
/sig architecture for the question about whether resource driver names ending with actual DNS names are an expectation or a hint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few possible text improvement suggestions. Feel free to just ignore/resolve ones you disagree with.
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
2060275
to
3957dfb
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
3957dfb
to
794c7b9
Compare
/hold cancel Squash has happened. |
@windsonsea , @divya-mohan0209 , I fixed all review comments and technical review from sig-node was done by @pohly , could you please have a look and LGTM this if you have no further comments, this has been stagnating for a while now. |
--- | ||
|
||
<!-- overview --> | ||
{{< feature-state for_k8s_version="v1.26" state="alpha" >}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the specified feature state (Kubernetes v1.26 [alpha]) accurate as mentioned here? I noticed on another page (Dynamic Resource Allocation page), which is also referenced in the content below, that the state is indicated as (Kubernetes v1.27 [alpha]).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature was in fact added in 1.26:
$ git log v1.26.0 | grep -B4 "kube features: add DynamicResourceAllocation"
commit 155d49813f131a2beb6aeef71270f94b6f5fcf18
Author: Patrick Ohly <[email protected]>
Date: Tue Mar 22 16:53:58 2022 +0100
kube features: add DynamicResourceAllocation
But, 1.27 had backwards-incompatible API change, I guess that was the reason to name it as 1.27 in the other page. I'll make it 1.27 here too for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force-pushed squashed commmits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
sounds correct. I'm inferring that the first alpha release was v1.26, even if there have been changes since.
https://kubernetes.io//docs/concepts/scheduling-eviction/dynamic-resource-allocation/ is IMO incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file name _intex.md
appears to be incorrect. It would be more appropriate as _index.md
to ensure the proper organization of content under the "helper-libraries" section.
e08daf0
to
77421cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. It's good to see more detail.
I'm not keen on the suggested glossary entries.
I would drop all of these and add one for “Dynamic Resource Allocation”. I might keep “Resource Driver” but the others are not really right for our docs.
We have an ambition to add fact sheets for API kinds; if / when implemented, those would cover the API kinds for DRA.
@@ -0,0 +1,19 @@ | |||
--- | |||
title: Resource Claim Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following our style guide, this would be “ResourceClaimParameters” (no spaces). Same for other API kinds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about spelling this in text somewhere, but looking at the rest of glossary items, they seem to be spelled with space in title, and with hyphen in id, I just tried to keep it consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- No spaces in titles
- ID can be either kebab-case or (preferred) matching the API kind
See #43742 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated glossary items.
@@ -0,0 +1,19 @@ | |||
--- | |||
title: Resource Claim Parameters | |||
id: resource-claim-parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use ResourceClaimParameters
as the id
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be inconsistent with the rest of glossary items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying, as an approver and tech lead for SIG Docs, that what I'm recommending is our ideal practice. We welcome help to bring other glossary items in line with our style guide, but we are not prioritizing that work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change might be causing the build failures. After looking at the {{< glossary_definition >}} (applicable to glossary_tooltip as well) shortcode file, it appears that the shortcode relies on matching the term_id
passed with glossary entry filenames to retrieve the description.
With this recent change, the id
in the file now reads as "ResourceClaimParameters" while the file name is "resource-claim-parameters.md". Consequently, the shortcode is unable to locate the file to display the necessary data for term_id="ResourceClaimParameters"
, leading to the build failure. To address this, I suggest temporarily reverting to the existing conventions. However, in the long run, we could potentially enhance the shortcode logic to match the term_id
with the id
within the glossary file content instead of relying solely on filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of reverting the id field, I can just rename the files to match the .md, right ? I'll give it a shot.
## Implementing controller | ||
|
||
Communication between the scheduler and a resource driver is done through | ||
{{< glossary_tooltip term_id="pod-scheduling-context" text="PodSchedulingContext">}} API object. It is | ||
recommended to use [controller helper library](/docs/reference/helper-libraries/dra-driver-controller/), | ||
it implements all needed operations related to PodSchedulingContext, they are common for all resource | ||
drivers. | ||
|
||
When using the DRA helper code, resource driver controller has to implement a | ||
[driver interface](https://pkg.go.dev/k8s.io/[email protected]/controller#Driver), | ||
which then can be simply used with | ||
[controller helper instance](https://pkg.go.dev/k8s.io/[email protected]/controller#New). | ||
|
||
See the [example resource driver controller](https://github.com/kubernetes-sigs/dra-example-driver/blob/151b7c8da2e620c47a3591e1a937f8d0297b0c25/cmd/dra-example-controller/main.go#L208) for details. | ||
|
||
A resource driver controller's main responsibility is to allocate and deallocate resources for | ||
{{< glossary_tooltip term_id="resource-claim" text="ResourceClaims">}}. | ||
|
||
There are two modes of allocation the ResourceClaim can have: | ||
|
||
- `WaitForFirstConsumer` (default), which you could think of as meaning _delayed_. | ||
In this mode the cluster only requests resource(s) | ||
for ResourceClaim when a Pod that needs it is being scheduled. | ||
- `Immediate`: the resource has to be allocated to the ResourceClaim as soon as possible, and | ||
retained until the ResourceClaim is deleted. | ||
|
||
If `allocationMode` is not set explicitly, the mode is `WaitForFirstConsumer`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move this level of detail into a separate page within https://kubernetes.io/docs/reference/node/, and write a shorter text about why PodSchedulingContext and ResourceClaim are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree that reference spec of the ResourceClaim object is better placed in reference/node
, this is supposed to be an article about extending Kubernetes, how to write a resource driver. Implementing a controller part of a resource driver seems to me very much suitable here. We don't bring in the whole info about the ResourceClaim or PodSchedulingContext in here, this is in fact a short summary, only what's important to keep in mind when writing a resource driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we're hoping to see in this page is an explanation of the concept of a resource driver. We would place a tutorial about implementing one somewhere else.
There's a lot of detail here which feels like it could be too much someone who first of all wants to learn what DRA is. We need to serve that audience before we serve would-be implementers.
What might work instead: an evergreen
blog article with a guide to how to implement a DRA controller.
We can help reshape this PR to match the approach we usually take, as a project, to documenting Kubernetes behavior. I do think this needs those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so the "extending Kubernetes" is not for writing extensions, but about customizing the instance of Kubernetes, like installation of device plugins and CRDs, and in our case resource drivers. got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minimized the chapter.
I'm happy to drop the glossary items when #38682 is implemented. |
Co-authored-by: Michael <[email protected]> Co-authored-by: Evan Lezar <[email protected]> Co-authored-by: Tim Bannister <[email protected]> Co-authored-by: Eero Tamminen <[email protected]> Co-authored-by: Patrick Ohly <[email protected]> Co-authored-by: Dipesh Rawat <[email protected]>
f2d6d04
to
b3c0599
Compare
Renaming glossary files worked, I squashed commits and force-pushed. |
@@ -0,0 +1,8 @@ | |||
--- | |||
title: Helper Libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find a suitable place that exists, so I created one. It's supposed to contain the prototype reference that the resource driver needs to implement, if I'm not mistaken, it was your idea to move this to the reference section from the main document. Where do you think the helper libraries for writing a resource driver are more suitable?
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
In both modes the allocation is preceded by getting parameters objects for ResourceClaims and | ||
ResourceClasses to ensure the resource driver is able to get these objects and understand them. | ||
|
||
## Sharing resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move this section much earlier in the page.
content/en/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend updating https://k8s.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/ to link to this new page.
date: 2023-10-16 | ||
full_link: /docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers/ | ||
short_description: > | ||
Software extensions to let Pods access devices that need vendor-specific initialization or setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't sound quite right. You might use a resource driver just for the allocation aspect, without any vendor specific setup.
For example, using DRA to assign a tape robot to a Pod that is running backup software, and exposing a vendor neutral protocol for tape management (eg: iSCSI).
@byako, would you be willing to write a blog article about implementing a resource driver? That would help readers, I think. |
Co-authored-by: Tim Bannister <[email protected]>
…/resource-drivers.md Co-authored-by: Tim Bannister <[email protected]>
…/resource-drivers.md Co-authored-by: Tim Bannister <[email protected]>
Co-authored-by: Tim Bannister <[email protected]>
Co-authored-by: Tim Bannister <[email protected]>
Co-authored-by: Tim Bannister <[email protected]>
@sftim: I'm curious whether your suggestion was supplementary to this PR or whether you're recommending this be moved to a blog? |
We should definitely add something to the Extending Kubernetes pages. However, I think a blog article is the right home for many of the points that @byako wants to make. Here's why: drivers change. vendors change. the way people manage nodes and their extensions will change. The more that can go into an of-its-time blog article, the better. Not only that, @byako, it lets you tell a story - and I get the sense that you'd like to take readers on a journey with what you're writing. |
I've discussed the sensibility of proceeding with this PR with our tech leads responsible for this feature, and the consensus is that we put this on hold because 1.30 and 1.31 bring so much difference to this feature. We'll be back with new PR based on this one later in 2024, close to 1.31 release. |
A note about why I was keen on this: it's handy if, when people look at the v1.29 documentation (because that's what they're running), they see details that are right for that version. Updates for v1.30 and v1.31 won't help those readers, in fact they'll make the docs even less relevant for that particular need. However, if we don't have capacity for this update, so be it. In the long term, we have an answer: v1.29 will eventually not be supported. |
I understand, and believe me I wanted to see this merged as well given how much time both - the community and myself put into this. However, you're right - there is less capacity at the moment. There might still some time for a blog article in the summer, with updated feature functionality. |
Dynamic Resource Allocation (DRA) has been available since 1.26, this adds Resource Driver concept and glossary items to help improving DRA awareness and hopefully increase the number of resource drivers out there, which would bring more feedback about the framework and help developing it further.
Fixes #38557