
Describe appropriate use of node role labels and fixes that should be made #1144

Merged: 1 commit merged into kubernetes:master from the node_role branch on Aug 28, 2019

Conversation

smarterclayton (Contributor):

node-roles were not intended for internal use; this KEP both clarifies their intended use and describes the process for resolving their internal use.

@k8s-ci-robot added labels on Jul 17, 2019: cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA), size/L (Denotes a PR that changes 100-499 lines, ignoring generated files), kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory), sig/architecture (Categorizes an issue or PR as relevant to SIG Architecture).
@smarterclayton (Contributor, Author):

@liggitt

@smarterclayton force-pushed the node_role branch 3 times, most recently from 110d83d to 4252b01 on July 17, 2019 03:03.

* Kubernetes components MUST NOT set or alter behavior on any label within the `node-role.kubernetes.io/*` namespace.
* Kubernetes components (such as `kubectl`) MAY simplify the display of `node-role.kubernetes.io/*` labels to convey the node roles of a node
* Kubernetes examples and documentation MUST NOT leverage the node-role labels for node placement
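
(Aside on the display rule above: a minimal Go sketch, purely illustrative and not kubectl's actual code, of how a tool might derive display-only role names from the `node-role.kubernetes.io/*` label namespace without keying any behavior off them.)

```go
// Illustrative only: derive display-only role names from node-role.kubernetes.io/* keys.
package main

import (
	"fmt"
	"sort"
	"strings"
)

const nodeRoleLabelPrefix = "node-role.kubernetes.io/"

// rolesForDisplay extracts role suffixes from a node's labels, e.g.
// "node-role.kubernetes.io/master" -> "master". Nothing else keys off the result.
func rolesForDisplay(nodeLabels map[string]string) []string {
	var roles []string
	for key := range nodeLabels {
		if strings.HasPrefix(key, nodeRoleLabelPrefix) {
			if role := strings.TrimPrefix(key, nodeRoleLabelPrefix); role != "" {
				roles = append(roles, role)
			}
		}
	}
	sort.Strings(roles)
	return roles
}

func main() {
	labels := map[string]string{
		"node-role.kubernetes.io/master": "",
		"kubernetes.io/hostname":         "node-1",
	}
	fmt.Println(strings.Join(rolesForDisplay(labels), ",")) // prints: master
}
```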
Member:

This is the big one that has historically been the problem: how can a workload be targeted only to the control plane nodes?

Member:

I guess that, apart from a node-role label, a CP node would also need a regular label that a nodeSelector can then target?
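
Purely as an illustration of this suggestion (Kubernetes does not define such a label; the key `example.com/control-plane` below is hypothetical and would be chosen by the deployment), a sketch of a workload pinned via a regular label rather than a node-role label:

```go
// Hypothetical example: the selector label is deployment-chosen, not defined by Kubernetes.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func controlPlanePinnedPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "cp-only-workload"},
		Spec: corev1.PodSpec{
			// The key "example.com/control-plane" is whatever the installer applied;
			// Kubernetes itself attaches no meaning to it.
			NodeSelector: map[string]string{"example.com/control-plane": "true"},
			Containers: []corev1.Container{
				{Name: "app", Image: "registry.example.com/app:latest"},
			},
		},
	}
}

func main() {
	fmt.Println(controlPlanePinnedPod().Spec.NodeSelector)
}
```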

Contributor Author:

That is not specified and is implementation specific.

There is no such thing as a "control plane node" defined in Kube, and I think targeting user workloads to those nodes is completely outside the scope of Kube. Anyone who wants to do that needs to accept a label selector at a generic level.

I'll add that to the KEP, but I don't see this KEP as trying to solve that problem or any appetite to solve it, because control plane is vague and undefined.

Member:

My feeling is that we don't have a concept of "control plane nodes". Once you register a node of any sort, you have the "standard" mechanisms for workload inclusion/exclusion.

My question is where the line falls. Is it OK for a tool (think kubeadm) to have a "mode" which dedicates some nodes as control-plane? Should it accept an arbitrary label name to use instead of assuming this magic label? Is the label actually a sufficient primitive?


#### Node controller excludes master nodes from consideration for eviction

The `k8s.io/kubernetes/pkg/util/system/IsMasterNode(nodeName)` function is used by the NodeLifecycleController to exclude nodes with a node name that ends in `master` or starts with `master-` when considering whether to mark nodes as disrupted. A recent PR attempted to change this to use node-roles and was blocked. Instead, the controller should be updated to use a label `node.kubernetes.io/exclude-disruption` to decide whether to exclude nodes from being considered for disruption handling.
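
A minimal sketch of the label-based direction described above, assuming the `node.kubernetes.io/exclude-disruption` label named in the paragraph; this is not the eventual implementation, and the alpha feature gating discussed in the thread below is omitted:

```go
// Sketch: exclusion is decided by an explicit intent label, not by the
// IsMasterNode(nodeName) node-name heuristic and not by a node-role label.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const labelExcludeDisruption = "node.kubernetes.io/exclude-disruption"

// excludedFromDisruptionChecks reports whether the node opted out of
// disruption handling via the exclusion label.
func excludedFromDisruptionChecks(node *corev1.Node) bool {
	_, ok := node.Labels[labelExcludeDisruption]
	return ok
}

func main() {
	node := &corev1.Node{ObjectMeta: metav1.ObjectMeta{
		Name:   "master-1",
		Labels: map[string]string{labelExcludeDisruption: ""},
	}}
	// true because of the label, regardless of what the node is named
	fmt.Println(excludedFromDisruptionChecks(node))
}
```

The intent is carried by the node object itself, so the controller no longer needs to infer anything from node names or roles.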
Member:

Sounds good. Can we enable this one right away? Do we need to feature-flag it?

Contributor Author:

Everything new needs to go in as alpha gated.

Member:

Describing the intent rather than implying an intent from a role is vastly preferred.

fyi @dchen1107 as this is node controller.


#### Kubernetes e2e tests

The e2e tests use a number of heuristics including the `IsMasterNode(nodeName)` function and the node-roles labels to select nodes. In order for conformant Kubernetes clusters to run the tests, the e2e suite must change to use individual user-provided label selectors to identify nodes to test, nodes that have special rules for testing unusual cases, and for other selection behaviors. The label selectors may be defaulted by the test code to their current values, as long as a conformant cluster operator can execute the e2e suite against an arbitrary cluster.
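
A sketch of that direction, with a hypothetical flag name: the suite accepts an operator-provided label selector (defaulted to today's value) and selects nodes with it instead of hardcoding role labels or name heuristics:

```go
// Sketch only: the flag name is hypothetical; the default merely preserves current behavior.
package main

import (
	"flag"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

var specialNodeSelector = flag.String("special-node-label-selector",
	"node-role.kubernetes.io/master", // default only; operators of conformant clusters may pass anything
	"label selector identifying nodes that need special handling in tests")

// matchingNodes returns the nodes whose labels satisfy the operator-provided selector.
func matchingNodes(nodes []corev1.Node, selectorStr string) ([]corev1.Node, error) {
	selector, err := labels.Parse(selectorStr)
	if err != nil {
		return nil, fmt.Errorf("invalid selector %q: %w", selectorStr, err)
	}
	var out []corev1.Node
	for _, n := range nodes {
		if selector.Matches(labels.Set(n.Labels)) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	flag.Parse()
	fmt.Println("selecting special nodes with:", *specialNodeSelector)
}
```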
@justinsb (Member) commented Jul 17, 2019:

We should clearly differentiate between e2e tests that are part of the conformance suite and the idea that a conformant cluster should be able to pass the whole e2e suite. It is absolutely not the case that a conformant cluster should be able to pass the whole suite (sadly); otherwise, why mark tests for conformance? We have coverage of some pretty specific scenarios in our e2e suite, and it's going to be tough to replace this.

Unless what you're saying is we replace

const MasterNodeLabel = "node-role.kubernetes.io/master"

with

var MasterNodeLabel = flag.String("master-node-label", "node-role.kubernetes.io/master", "the label which identifies master nodes")

That seems reasonable, as does the idea that we should avoid tests requiring that the flag be a certain value, but I do think it's reasonable to have a conformance test that e.g. "pods keep running when the control plane never responds to requests", where labeling these nodes is likely the most practical way to implement this. Increasing e2e coverage feels more important than the ideological purity of the non-conformance tests.

Contributor Author:

The e2e tests need to not encode deployment assumptions. The assumptions that use node labels today are almost all "wrong" in the sense that they assume deployment topology. e2e tests MUST NOT assume things like "pods in this role thus have open ports like this". The key distinction here is: if an e2e test needs to read a particular port from a particular service, and that port is not part of core Kubernetes, the e2e test needs to take it as a parameter. It might have a default for that parameter, but it's not acceptable for an e2e test to only work on one particular type of GCE test (i.e. kind should be able to run e2e tests if the apiserver isn't listening on 6443).
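
Along the same lines, a tiny sketch (the flag names are hypothetical) of taking deployment-specific details such as the apiserver endpoint as parameters with defaults rather than baked-in assumptions:

```go
// Sketch only: flag names are hypothetical; 6443 is just a default, never an assumption.
package main

import (
	"flag"
	"fmt"
)

var (
	apiServerHost = flag.String("apiserver-host", "", "address of the kube-apiserver for the cluster under test")
	apiServerPort = flag.Int("apiserver-port", 6443, "secure port the kube-apiserver listens on in this deployment")
)

func main() {
	flag.Parse()
	fmt.Printf("testing against %s:%d\n", *apiServerHost, *apiServerPort)
}
```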

Member:

I'd draw the distinction between the e2e tests that are conformance vs non-conformance. Should we have configuration for these things and perfect tests that are ideologically pure? Absolutely. But for a non-conformance test, I'd say coverage is more important - sometimes e2e tests need to do bad things.

Contributor Author:

Ok, I understand that, but I still would rather just fix the e2e tests in general since conformance is an arbitrary subset anyway. This isn't really a huge problem.

@justinsb (Member) commented Jul 17, 2019:

We should also address the primary use case for these labels: how can a manifest target a pod to only run on a master node? That's the reason why they are so ubiquitous; their use inside k/k feels almost incidental.

@smarterclayton (Contributor, Author):

> how can a manifest target a pod to only run on a master node? That's the reason why they are so ubiquitous; their use inside k/k feels almost incidental.

This cannot be defined by Kubernetes because Kubernetes does not require a particular deployment topology. But certainly a large set of deployers can use node-role.kubernetes.io/master to indicate that this is a "master" - this is a convention in practice today. However, components within Kube may not key behavior or test coverage off that.
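
For illustration only: an installer outside core Kubernetes might apply the conventional role label for display alongside a separate functional label that its own manifests select on. The functional key below is hypothetical, and core components key behavior off neither label:

```go
// Hypothetical installer-side sketch; nothing in core Kubernetes requires or reads these labels.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func applyControlPlaneLabels(node *corev1.Node) {
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	node.Labels["node-role.kubernetes.io/master"] = "" // convention, display only
	node.Labels["example.com/control-plane"] = "true"  // installer-owned label used for actual selection
}

func main() {
	n := &corev1.Node{}
	applyControlPlaneLabels(n)
	fmt.Println(n.Labels)
}
```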

@justinsb (Member):

I think it's fine to promote the service-exclusion label to GA, and to create another label that controls eviction, etc. Those are wonderful changes that users will appreciate.

But... users today need to steer workloads to run on the master. I would love it if we had a better answer on identity & isolation, but let's wait to tackle "node-role" until we are ready to make that case.

Propose that you rename this KEP to be "promote service-exclusion labels to GA and introduce node.kubernetes.io/exclude-disruption", so that it is technical and not political/philosophical.

@smarterclayton (Contributor, Author):

> But... users today need to steer workloads to run on the master

By definition we cannot give all users that guarantee. Can you describe in more detail what your use case is? Anyone who tries to schedule onto any given topology has no guarantee that's even possible within a cluster (like GKE), so is this "I want to schedule on masters if they are available but something else if they aren't?"

@justinsb (Member):

The use case seems to be a cluster component that wants either isolation or security. Security can be in case of breakout, or to reuse IAM roles or similar. Before all the installation tooling agreed on these labels, the producers of these manifests had to produce different manifests depending on the installation tooling in use. This is why we standardized it across the OSS kubernetes tooling.

Yes, it doesn't work on GKE / EKS or other Kubernetes offerings that host the control plane, but these cluster addons usually require special permissions and thus aren't user-installable there anyway.

Let's either expand this KEP so that it addresses the big picture - I'm happy to help you do so - or let's focus it on what it is doing technically, lest it suck the joy out of actually fixing node-roles.

@smarterclayton (Contributor, Author):

> Before all the installation tooling agreed on these labels, the producers of these manifests had to produce different manifests depending on the installation tooling in use. This is why we standardized it across the OSS kubernetes tooling.

We did not do this or require this. A number of installers chose to do that. It is not required, nor standard.

I'm really confused what you mean by "fix node-roles" now. Maybe we should have a chat next week and dig in.

@neolit123 (Member) commented Jul 19, 2019 via email.

@derekwaynecarr (Member) left a review comment:

I agree with removing the concept of a node role in favor of labels that have more specific intents.

I would like to understand how we can enforce this concept across cloud providers as well, given the large amount of innovation in hosting models across the ecosystem that depends on components that were in-tree and are now out of tree.



QUESTION: Is a single label selector sufficient to identify nodes to test?


#### Preventing accidental reintroduction
Member:

Can we clarify how this should work with cloud provider code? For example, I assume a cloud provider (in-tree or out) by convention must not make assumptions about deployment topology? How can we check for that as well?

Contributor Author:

I added a brief paragraph, but cloud providers are still part of the core K8s rule set because they are owned by a SIG and required to maintain conformance on those clouds.

Member:

how is this distinct from kubeadm exactly?

@derekwaynecarr (Member):

@dchen1107 let's ensure there is no disagreement at the next SIG Node meeting.

/assign
/assign @dchen1107
/assign @thockin

@thockin (Member) commented Aug 8, 2019:

@freehan pointed out that Ingress controllers also use this and we should think of them. He will follow up with thoughts.

@liggitt (Member) commented Aug 8, 2019:

> @freehan pointed out that Ingress controllers also use this and we should think of them. He will follow up with thoughts

By "this", do you mean role labels or something else?

If you meant role labels, Ingress controllers that rely on role labels today are already dependent on particular deployments, since not all set those labels. Note that this doesn't propose removing anything that is currently applying the role labels, so it wouldn't make ingress controllers more broken than they are today.

@thockin (Member) commented Aug 9, 2019 via email.

@freehan (Contributor) commented Aug 14, 2019.

@sftim (Contributor) commented Aug 17, 2019:

Is there a role for SIG Docs in this work? (e.g. should there be a page that clarifies the current situation vs. historical mixed meanings?)

Commit: "Describe appropriate use of node role labels and fixes that should be made"

node-roles were not intended for internal use; this KEP both clarifies their intended use and describes the process for resolving their internal use.
@k8s-ci-robot removed the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Aug 19, 2019.
@liggitt (Member) commented Aug 19, 2019:

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Aug 19, 2019.
@thockin (Member) commented Aug 28, 2019:

Thanks!

/lgtm
/approve

@neolit123 (Member):

hm, why isn't @thockin on this list?
https://github.com/kubernetes/enhancements/blob/master/OWNERS_ALIASES#L9-L14

/assign @dims @mattfarina
for approval.

@dims (Member) commented Aug 28, 2019:

/approve
/lgtm

based on @thockin's and @liggitt's ok above

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, liggitt, smarterclayton, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on Aug 28, 2019.
@k8s-ci-robot merged commit 76bc705 into kubernetes:master on Aug 28, 2019.
@k8s-ci-robot added this to the v1.17 milestone on Aug 28, 2019.
@smarterclayton deleted the node_role branch on April 14, 2020 01:02.
errm added a commit to cookpad/terraform-aws-eks that referenced this pull request May 4, 2020
As of 1.16 kubelet refuses to start when setting this label.

See: kubernetes/enhancements#1144 for some
background.

* For now (so the tests pass) I have set an alternate label
* Perhaps we could just remove the concept of a node-role altogether.
* This means that `kubectl get nodes` doesn't show `ROLES` any more
  * If we want this to work then we would need some other process to
    update the node labels.

#### Service load-balancer

The service load balancer implementation previously implemented a heuristic where `node-role.kubernetes.io/master` is used to exclude masters from the candidate nodes for a service. This is an implementation detail of the cluster and is not allowed. Since there is value in excluding nodes from service load balancer candidacy in some deployments, an alpha feature gated label `alpha.service-controller.kubernetes.io/exclude-balancer` was added in Kubernetes 1.9.
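
A minimal sketch of that exclusion using the alpha label named above; the eventual GA label name is not asserted here (the comment below mentions `node.kubernetes.io/exclude-from-external-load-balancers`):

```go
// Sketch: candidate nodes are filtered by an explicit exclusion label,
// not by the node-role.kubernetes.io/master heuristic.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const labelAlphaExcludeBalancer = "alpha.service-controller.kubernetes.io/exclude-balancer"

func loadBalancerCandidates(nodes []corev1.Node) []corev1.Node {
	var out []corev1.Node
	for _, n := range nodes {
		if _, excluded := n.Labels[labelAlphaExcludeBalancer]; excluded {
			continue // explicitly opted out of load-balancer candidacy
		}
		out = append(out, n)
	}
	return out
}

func main() {
	nodes := []corev1.Node{
		{ObjectMeta: metav1.ObjectMeta{Name: "worker-1"}},
		{ObjectMeta: metav1.ObjectMeta{Name: "cp-1", Labels: map[string]string{labelAlphaExcludeBalancer: "true"}}},
	}
	for _, n := range loadBalancerCandidates(nodes) {
		fmt.Println(n.Name) // worker-1
	}
}
```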

Comment:

Not sure if this is the right place to comment. As I understand it, the heuristic based on the label `node-role.kubernetes.io/master` will be removed? In what version will it be removed? Will the heuristic be based on the `node.kubernetes.io/exclude-from-external-load-balancers` label in the future? Will this label be added to each master from a certain release to prevent adding master nodes as candidate nodes?

mandre pushed a commit to shiftstack/installer that referenced this pull request Sep 23, 2022
There should no longer be any issues running router pods on control
plane nodes (i.e. kubernetes/kubernetes#65618
which was resolved in
kubernetes/enhancements#1144). Remove this
limitation from the docs.

Signed-off-by: Stephen Finucane <[email protected]>