
Add node lifecycle documentation #45074

Open
thockin opened this issue Jan 17, 2023 · 32 comments
Assignees
Labels
kind/documentation Categorizes issue or PR as related to documentation. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@thockin
Member

thockin commented Jan 17, 2023

As far as I can tell, we don't have a comprehensive doc which covers the expected lifecycle of nodes in Kubernetes.

Specifically, we have lots of intersecting, async things which involve nodes. For example:

  • Many environments have VMs "behind" Nodes. Those VMs can be deleted without telling k8s. Then someone comes along and deletes the node "in response", but this is racy and causes confusion.
  • Many environments have subsystems which cross-reference things which need to coordinate with node lifecycle. E.g. the service controller puts VMs into LBs, but does so by enumerating Nodes (ignorant of the VM lifecycle).
  • Some components manage nodes directly (e.g. Cluster Autoscaler, Karpenter).

For an example of things that I think are "weird" for lack of docs, look at kubernetes/autoscaler#5201 (comment). ClusterAutoscaler defines a taint which it uses to prevent work from landing on "draining" nodes (even though we have the unschedulable field already). The service LB controller currently uses that taint to manage LBs. Cluster autoscaler removes the VM from the cloud, and leaves the Node object around for someone else to clean up.

The discussion is about the MEANING of the taint, when it happens, and how to be more graceful. What we want is a clear signal that "this node is going away" and a way for 3rd parties to indicate they have work to do when that happens. It strikes me that we HAVE such a mechanism - delete and finalizers. But CA doesn't do that. I don't know why, but I suspect there are reasons. Should it evolve?
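The delete-and-finalizers mechanism mentioned above can be sketched with a simplified model of object metadata. This is an illustration of the general Kubernetes pattern, not any component's actual implementation; the `Node` struct fields and the `example.io/drain-lb` finalizer name are made up for the example (in the real API, "deletion requested" is `metadata.deletionTimestamp` being set).

```go
package main

import "fmt"

// Hypothetical, simplified model of Kubernetes object metadata: a Node is
// "going away" once deletion has been requested, but the API server keeps
// the object around until every finalizer has been removed.
type Node struct {
	Name              string
	DeletionRequested bool     // stands in for metadata.deletionTimestamp != nil
	Finalizers        []string // e.g. "example.io/drain-lb" (made-up name)
}

// Terminating is the clear "this node is going away" signal third parties
// could watch for.
func (n *Node) Terminating() bool { return n.DeletionRequested }

// RemoveFinalizer is what a controller would do once its cleanup work
// (draining pods, detaching the node from LBs, deleting the VM) is done.
func (n *Node) RemoveFinalizer(name string) {
	out := n.Finalizers[:0]
	for _, f := range n.Finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	n.Finalizers = out
}

// GarbageCollectable mirrors the API server rule: the object only actually
// goes away when deletion was requested and no finalizers remain.
func (n *Node) GarbageCollectable() bool {
	return n.DeletionRequested && len(n.Finalizers) == 0
}

func main() {
	n := &Node{Name: "node-1", Finalizers: []string{"example.io/drain-lb"}}
	n.DeletionRequested = true // someone ran "kubectl delete node node-1"
	fmt.Println(n.Terminating(), n.GarbageCollectable()) // true false
	n.RemoveFinalizer("example.io/drain-lb")             // cleanup finished
	fmt.Println(n.Terminating(), n.GarbageCollectable()) // true true
}
```

The point of the sketch: deletion is the unambiguous signal, and finalizers are the standing mechanism for third parties to say "I have work to do before this node is gone".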

I'd like to see a sig-node (or sig-arch?) owned statement of the node lifecycle. E.g. if the "right" way to signal "this node is going away" is to delete the node, this would say that. Then we can at least say that we think CA should adopt that pattern. If we think it needs to be more sophisticated (aka complicated) then we should express that.

@thockin thockin added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. labels Jan 17, 2023
@k8s-ci-robot
Contributor

@thockin: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 17, 2023
@bobbypage
Member

/cc

@sftim
Contributor

sftim commented Jan 18, 2023

Once that's stated, consider transferring the issue to k/website to track putting that detail in the public docs.

/kind documentation

@k8s-ci-robot k8s-ci-robot added the kind/documentation Categorizes issue or PR as related to documentation. label Jan 18, 2023
@sftim
Contributor

sftim commented Jan 18, 2023

A bit related: am I right that the Cluster Autoscaler is using an unregistered taint - not listed in https://kubernetes.io/docs/reference/labels-annotations-taints/ - for marking node drains?

@thockin
Member Author

thockin commented Jan 19, 2023

@sftim seems so - is that list generated from an in-git source other than the website?

@x13n FYI

@sftim
Contributor

sftim commented Jan 19, 2023

Right now there's no autogeneration for https://kubernetes.io/docs/reference/labels-annotations-taints/

Autogenerated pages should have a footer to mark this, eg see https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/

(related to your question, @thockin , I've considered making each entry be its own source file, which might be relevant to any future thoughts on more automation)

@x13n
Member

x13n commented Jan 19, 2023

Can we document CA taints on that website given they don't follow .*.kubernetes.io/.* pattern?

@sftim
Contributor

sftim commented Jan 19, 2023

Can we document CA taints on that website given they don't follow .*.kubernetes.io/.* pattern?

The usual approach is two phase:

  • switch whatever component to use a valid label / annotation / taint
    • optionally, retain support for the old label / annotation / taint but mark it deprecated
  • register and document the new thing
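The naming convention behind this (keys registered on the labels-annotations-taints reference page use a prefix ending in `kubernetes.io` or `k8s.io`) can be sketched as a small check. This is only an illustration of the convention being discussed, not an official validation routine; `ToBeDeletedByClusterAutoscaler` is the CA taint key at issue, while the other keys are examples.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRegisteredPrefix reports whether a label/annotation/taint key uses a
// prefix that could be registered on the reference page, e.g.
// "node.kubernetes.io/unschedulable". Hypothetical helper for illustration.
func hasRegisteredPrefix(key string) bool {
	slash := strings.Index(key, "/")
	if slash < 0 {
		return false // no prefix part at all
	}
	prefix := key[:slash]
	return prefix == "kubernetes.io" || prefix == "k8s.io" ||
		strings.HasSuffix(prefix, ".kubernetes.io") ||
		strings.HasSuffix(prefix, ".k8s.io")
}

func main() {
	// The Cluster Autoscaler's current taint key has no prefix at all:
	fmt.Println(hasRegisteredPrefix("ToBeDeletedByClusterAutoscaler")) // false
	// A registered, documented key:
	fmt.Println(hasRegisteredPrefix("node.kubernetes.io/unschedulable")) // true
	// A prefixed key outside the kubernetes.io / k8s.io namespaces:
	fmt.Println(hasRegisteredPrefix("example.com/some-taint")) // false
}
```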

@x13n
Member

x13n commented Jan 19, 2023

That makes sense, created kubernetes/autoscaler#5433 to track this.

@thockin
Member Author

thockin commented Jan 19, 2023 via email

@aojea
Member

aojea commented Feb 5, 2023

/cc

@tzneal
Contributor

tzneal commented Feb 9, 2023

I'd like to see a sig-node (or sig-arch?) owned statement of the node lifecycle. E.g. if the "right" way to signal "this node is going away" is to delete the node, this would say that. Then we can at least say that we think CA should adopt that pattern.

FWIW, Karpenter currently puts a finalizer on nodes and uses that to allow users to control the node by just deleting it. We've had some discussions on if this is a good pattern and would welcome an official pattern recommendation.

@sftim
Contributor

sftim commented Feb 9, 2023

/sig docs

too

@k8s-ci-robot k8s-ci-robot added the sig/docs Categorizes an issue or PR as relevant to SIG Docs. label Feb 9, 2023
@ellistarn

We've been reverse engineering node lifecycle assumptions for the past 2 years in https://github.com/aws/karpenter.

Some hiccups we've had:

  • We add a finalizer to the node. This hooks into our cordon/drain logic, and ensures that the underlying machine is deleted before the node is gone. Customers love this feature, as it enables things like kubectl delete node -l topology.kubernetes.io/zone=us-west-2a. It's worked well so far, but we worry about training users to do this in non-karpenter environments.
  • We used to create the node object, since it allowed us to be more stateless. This caused a bunch of problems with gpu registration and assumptions of other systems. See: https://github.com/aws/karpenter/blob/main/designs/node-ownership.md
  • We had to implement a feature called "startup taints", which communicates to our scheduler that we expect the taint to be removed, so we should stop launching nodes and instead wait for the taints to come off. Support Startup Taints aws/karpenter-provider-aws#628 (comment)
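The startup-taints idea in the last bullet can be sketched as a wait-vs-launch decision. This is a hypothetical simplification of the behavior described above, not Karpenter's actual API; the function name and the cilium taint key used in the example are illustrative.

```go
package main

import "fmt"

// shouldWaitForNode sketches the "startup taints" decision: the provisioner
// declares which taints it expects add-ons to remove after node
// initialization. If every taint on a not-yet-ready node is a declared
// startup taint, the autoscaler should wait for the node to initialize
// rather than launch another one. Hypothetical helper for illustration.
func shouldWaitForNode(nodeTaints []string, startupTaints map[string]bool) bool {
	for _, t := range nodeTaints {
		if !startupTaints[t] {
			// An unexpected taint: nothing promises it will ever clear,
			// so the scheduler should not count on this node.
			return false
		}
	}
	return true
}

func main() {
	startup := map[string]bool{"node.cilium.io/agent-not-ready": true}

	// Declared startup taint still present: wait, don't launch more capacity.
	fmt.Println(shouldWaitForNode([]string{"node.cilium.io/agent-not-ready"}, startup)) // true

	// Unknown taint: treat the node as unusable and launch elsewhere.
	fmt.Println(shouldWaitForNode([]string{"example.com/unknown"}, startup)) // false
}
```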

I'd love to come together with the group and sort out some of the pain points, and find a path forward that simplifies and documents node lifecycle assumptions. I'm not confident that a shutdown taint is the right direction.

@sftim
Contributor

sftim commented Feb 14, 2023

What should the kubelet do if it observes its Node being deleted?

(eg: stop all local Pods, remove the kubelet's finalizer, quit)

@kerthcet
Member

We had to implement a feature called "startup taints", which communicates to our scheduler that we expect the taint to be removed, so we should stop launching nodes and instead wait for the taints to come off. aws/karpenter-provider-aws#628 (comment)

Hi @ellistarn , I'm considering adding something like daemonJob for node setup/cleanup work. Do you think it would help with your situation here? kubernetes/kubernetes#115716

@ellistarn

Not that I'm aware of @kerthcet. The problem is more that add-ons like cilium use a custom taint to communicate node readiness to the kube scheduler, on top of existing node readiness mechanisms.

@kerthcet
Member

One thing to highlight: we could add resources like labels/taints before running the job and, after it succeeds, remove them if you want. I suppose cilium uses its custom taint only during initialization, is that right? @ellistarn

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 21, 2023
@thockin thockin removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2023
@sftim
Contributor

sftim commented Jul 7, 2023

/retitle Add node lifecycle documentation

I think the primary audience is contributors and cluster operators are a key secondary audience. Have I got that right?

@k8s-ci-robot k8s-ci-robot changed the title Node lifecycle documentation ? Add node lifecycle documentation Jul 7, 2023
@tzneal
Contributor

tzneal commented Jul 7, 2023

I would say direct contributors and ecosystem contributors. E.g. if you develop some monitoring tool, you probably care deeply about this topic as well.

@dims
Member

dims commented Sep 7, 2023

long-term-issue (note to self)

@pacoxu
Member

pacoxu commented Nov 10, 2023

Decouple TaintManager from NodeLifecycleController enhancements#3902 may be in scope of this issue.


@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2024
@sftim
Contributor

sftim commented Feb 9, 2024

/transfer website

@k8s-ci-robot k8s-ci-robot transferred this issue from kubernetes/kubernetes Feb 9, 2024
@sftim
Contributor

sftim commented Feb 9, 2024

/remove-lifecycle stale
/lifecycle frozen
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 9, 2024
@sftim
Contributor

sftim commented Feb 9, 2024

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 9, 2024
@dchen1107
Member

cc/ @wangzhen127 Can you work with @sftim and others on this?

@wangzhen127
Member

Sure. I am still ramping up in this space. Will follow up.

@sftim
Contributor

sftim commented Feb 29, 2024

work with @sftim and others on this?

The SIG to work with is SIG Docs (#sig-docs in Kubernetes' Slack workspace).

@sftim
Contributor

sftim commented Mar 25, 2024

#45197 has helped with this

@sftim
Contributor

sftim commented Mar 25, 2024

@wangzhen127 what help would you like here?
