Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a blog post about decoupled taint eviction controller #43676

Merged
merged 1 commit into from
Dec 13, 2023

Conversation

yuanchen8911
Copy link
Member

@yuanchen8911 yuanchen8911 commented Oct 24, 2023

Add a blog for KEP 3902

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 24, 2023
@k8s-ci-robot k8s-ci-robot added the area/blog Issues or PRs related to the Kubernetes Blog subproject label Oct 24, 2023
@k8s-ci-robot k8s-ci-robot requested a review from onlydole October 24, 2023 21:26
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Oct 24, 2023
@netlify
Copy link

netlify bot commented Oct 24, 2023

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit fe882f1
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/65678d567d6b910008613adf
😎 Deploy Preview https://deploy-preview-43676--kubernetes-io-main-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@yuanchen8911 yuanchen8911 changed the title Add a log post to decoupled taint eviction manager (placeholder) Add a blog post to decoupled taint eviction manager (placeholder) Oct 24, 2023
@sftim
Copy link
Contributor

sftim commented Oct 24, 2023

/retitle [WIP] Add a blog post about decoupled taint eviction manager

@k8s-ci-robot k8s-ci-robot changed the title Add a blog post to decoupled taint eviction manager (placeholder) [WIP] Add a blog post about decoupled taint eviction manager Oct 24, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 24, 2023
@krol3
Copy link
Contributor

krol3 commented Nov 15, 2023

Hi @yuanchen8911 here Communication Team 1.29, the deadline to the feature blog be ready to review is this Friday, Nov 17th, the proposal publish date will be Dec 19th. cc: @a-mccarthy @kcmartin @James-Quigley

@kubernetes/sig-docs-blog-owners: Blog scheduled: Dec 19th, Publication Order Nro:8

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 15, 2023
@yuanchen8911 yuanchen8911 changed the title [WIP] Add a blog post about decoupled taint eviction manager [WIP] Add a blog post about decoupled taint eviction controller (WIP) Nov 15, 2023
@yuanchen8911 yuanchen8911 changed the title [WIP] Add a blog post about decoupled taint eviction controller (WIP) [WIP] Add a blog post about decoupled taint eviction controller Nov 15, 2023
@yuanchen8911
Copy link
Member Author

yuanchen8911 commented Nov 15, 2023

/cc @atosatto


It may slightly increase the communication overhead from applying node taints to performing pod eviction.

**Will enabling/using this feature result in a non-negligible increase in resource usage (CPU, RAM, disk, IO, ...) in any components?**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no-neglisible ... can we avoid this double negating term?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the word non-negligible.

Comment on lines 27 to 30
A new feature gate, `SeparateTaintManager`, has been added. To enable the new feature, users can use the following flags:

- kube-apiserver: `--feature-gates=SeparateTaintManager=true`
- kube-controller-manager: `--controllers=kube-eviction-controller`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually going to be enabled by default as the SeparateTaintManager feature-flag is in beta.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified it.

- kube-apiserver: `--feature-gates=SeparateTaintManager=true`
- kube-controller-manager: `--controllers=kube-eviction-controller`

For compatibility, the legacy `TaintManager` can still be used with the `--taint-manager` flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. If users want to use the old taint-manager they need to set the feature-flag to false (e.g. SeparateTaintManager=false).

What this feature gives the users, is the ability to disable taint-based eviction by setting SeparateTaintManager=true and --controllers=-kube-eviction-controller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it.

- Adding a pre-defined set of `NoExecute` taints to nodes based on the node conditions.
- Performing pod eviction on NoExecute taints

With the latest release, the taint-based eviction implementation has been moved out of the `node-lifecycle-controller` into a separate and independent component called `TaintEvictionController`. This separation aims to disentangle code, enhance code maintainability, and facilitate future extensions to either component.
Copy link
Contributor

@atosatto atosatto Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add something high-lighting the benefits and the potential use-cases where this could be useful, as done in the KEP. I'd also mention the new metrics which have been introduced with this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the following

Use Cases

This new feature will allow cluster administrators to extend and enhance the default TaintEvictionController and even replace the default TaintEvictionController with a custom implementation to meet different needs, e.g., better supportof stateful workloads that use PersistentVolume on local disks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the metrics description as follows.

As part of the change, addtitonal metrics are introduced to monitor taint-based pod evictionis.

  • pod_deletion_duration_seconds measures th latency between the time when a taint effect has been activated for the Pod and its deletion via TaintEvictionController.
  • pod_deletions_total reports the total number of Pods deleted by TaintEvictionController since its start

Comment on lines 40 to 42
**Will enabling/using this feature result in an increase in the time taken by any operations covered by existing SLIs/SLOs?**

It may slightly increase the communication overhead from applying node taints to performing pod eviction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, as setting taints and evicting Pods is already done via separated reconcilers in the old implementation. We strived to the changes to not impact Customers experience in any meaningful way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to No.


As with any Kubernetes feature, multiple community members have contributed, from writing the KEP to implementing the new controller and reviewing the KEP and code. Special thanks to:

- Aldo Culquicondor (@alculquicondor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like us to add here @atiratree and @logicalhan. Their feedback on the implementation, alongside @soltysh and @Huang-Wei is what made this change possible! Thank you 🙇‍♂️!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the names.

@Ritikaa96
Copy link
Contributor

/lgtm
Thanks @yuanchen8911

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 28, 2023
## How to use the new feature?

A new feature gate, `SeparateTaintEvictionController`, has been added. The feature is enabled by default as Beta in Kubernetes 1.29.
Please refer to the [feature gate document](/content/en/docs/reference/command-line-tools-reference/feature-gates.md).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should reference the relative path on the website and not on github as suggested here #43676 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

/lgtm cancel

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced it with the short URL.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 28, 2023
@sftim
Copy link
Contributor

sftim commented Nov 28, 2023

Thanks

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 28, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 29, 2023
Add authors

Update

Fix errors and address Andrea's comments

Removed duplication

Update 2023-10-24-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>

Update 2023-10-24-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>

Update 2023-10-24-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>

Update 2023-10-24-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>

Update 2023-10-24-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>

Address comment and rename the file

Fix format

Address comments

Add the feature gate reference

Fix typos and format issues

Fix reference link error

Fix a borken link

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Tim Bannister <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Tim Bannister <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Tim Bannister <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Tim Bannister <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Tim Bannister <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Tim Bannister <[email protected]>

node to Node

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Tim Bannister <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Tim Bannister <[email protected]>

Fix the reference link

Update content/en/blog/_posts/2023-12-19-taint-eviction-controller.md

Co-authored-by: Ritika <[email protected]>
@sftim
Copy link
Contributor

sftim commented Nov 30, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2023
@sftim
Copy link
Contributor

sftim commented Dec 1, 2023

/lgtm cancel
/approve cancel

Requires new date (release of v1.29 has been postponed)

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 1, 2023
@sftim
Copy link
Contributor

sftim commented Dec 3, 2023

The new publication date is 2023-12-19; please adjust the article.

@sftim
Copy link
Contributor

sftim commented Dec 4, 2023

Thanks
/lgtm
/approve

Do not merge until Kubernetes v1.29 is released.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 4, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2b872fda0ba6c964d2b70dc06dc0f42758e71c56

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sftim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 4, 2023
@reylejano
Copy link
Member

1.29 is released
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 13, 2023
@k8s-ci-robot k8s-ci-robot merged commit 751ec13 into kubernetes:main Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/blog Issues or PRs related to the Kubernetes Blog subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/docs Categorizes an issue or PR as relevant to SIG Docs. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: Published
Development

Successfully merging this pull request may close these issues.

10 participants