Describe appropriate use of node role labels and fixes that should be made #1144
Conversation
* Kubernetes components MUST NOT set or alter behavior on any label within the `node-role.kubernetes.io/*` namespace.
* Kubernetes components (such as `kubectl`) MAY simplify the display of `node-role.kubernetes.io/*` labels to convey the node roles of a node.
* Kubernetes examples and documentation MUST NOT leverage the node-role labels for node placement.
This is the big one that has historically been the problem: how can a workload be targeted only to the control plane nodes?
I guess apart from a `node-role` label, a CP node would also need a regular label that a nodeSelector can then target?
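For illustration, a minimal sketch of that pattern, assuming the installer applies its own label; the label name `example.com/control-plane` below is hypothetical and not defined by Kubernetes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cp-only-agent
spec:
  # Hypothetical installer-applied label; core Kubernetes defines no such label.
  nodeSelector:
    example.com/control-plane: "true"
  containers:
  - name: agent
    image: registry.example.com/agent:latest
```

The KEP leaves the choice of label to the deployer; a component that needs this kind of targeting has to accept the selector as configuration.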
That is not specified and is implementation specific.
There is no such thing as a "control plane node" defined in Kube, and I think targeting user workloads to those nodes is completely outside the scope of Kube. Anyone who wants to do that needs to accept a label selector at a generic level.
I'll add that to the KEP, but I don't see this KEP as trying to solve that problem or any appetite to solve it, because control plane is vague and undefined.
My feeling is that we don't have a concept of "control plane nodes". Once you register a node of any sort, you have the "standard" mechanisms for workload inclusion/exclusion.
My question is where the line falls. Is it OK for a tool (think kubeadm) to have a "mode" which dedicates some nodes as control-plane? Should it accept an arbitrary label name to use instead of assuming this magic label? Is the label actually a sufficient primitive?
#### Node controller excludes master nodes from consideration for eviction

The `k8s.io/kubernetes/pkg/util/system/IsMasterNode(nodeName)` function is used by the NodeLifecycleController to exclude nodes with a node name that ends in `master` or starts with `master-` when considering whether to mark nodes as disrupted. A recent PR attempted to change this to use node-roles and was blocked. Instead, the controller should be updated to use a label `node.kubernetes.io/exclude-disruption` to decide whether to exclude nodes from being considered for disruption handling.
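A rough sketch of how a node might opt out under the proposed label; the KEP text above only names the label, so the value shown and presence-vs-value semantics are assumptions:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    # Label proposed in this KEP; exact value semantics are an assumption here.
    node.kubernetes.io/exclude-disruption: "true"
```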
Sounds good. Can we enable this one right away? Do we need to feature-flag it?
Everything new needs to go in as alpha gated.
Describing the intent rather than implying an intent from a role is vastly preferred.
fyi @dchen1107 as this is node controller.
#### Kubernetes e2e tests

The e2e tests use a number of heuristics including the `IsMasterNode(nodeName)` function and the node-roles labels to select nodes. In order for conformant Kubernetes clusters to run the tests, the e2e suite must change to use individual user-provided label selectors to identify nodes to test, nodes that have special rules for testing unusual cases, and for other selection behaviors. The label selectors may be defaulted by the test code to their current values, as long as a conformant cluster operator can execute the e2e suite against an arbitrary cluster.
We should clearly differentiate between e2e tests that are part of the conformance suite, and the idea that a conformant cluster should be able to pass the whole e2e suite. It is absolutely not the case that a conformant cluster should be able to pass the whole suite (sadly), otherwise why mark tests for conformance. We have coverage of some pretty specific scenarios in our e2e suite, and it's going to be tough to replace this.
Unless what you're saying is we replace
const MasterNodeLabel = "node-role.kubernetes.io/master"
with
var MasterNodeLabel = flag.String("master-node-label", "node-role.kubernetes.io/master", "the label which identifies master nodes")
That seems reasonable, as does the idea that we should avoid tests requiring that the flag be a certain value, but I do think it's reasonable to have a conformance test that e.g. "pods keep running when the control plane never responds to requests", where labeling these nodes is likely the most practical way to implement this. Increasing e2e coverage feels more important than the ideological purity of the non-conformance tests.
The e2e tests need to not encode deployment assumptions. The assumptions that use node label today are almost all "wrong" in the sense that they assume deployment topology. e2e tests MUST NOT assume things like "pods in this role thus have open ports like this". The key distinction here is if an e2e test needs to read a particular port from a particular service, if that port is not part of core Kubernetes the e2e test needs to take it as a parameter. It might have a default for that parameter, but it's not acceptable for an e2e test to only work on one particular type of GCE test (i.e. kind should be able to run e2e tests if the apiserver isn't listening on 6443).
I'd draw the distinction between the e2e tests that are conformance vs non-conformance. Should we have configuration for these things and perfect tests that are ideologically pure, absolutely. But for a non-conformance test, I'd say coverage is more important - sometimes e2e tests need to do bad things.
Ok, I understand that, but I still would rather just fix the e2e tests in general since conformance is an arbitrary subset anyway. This isn't really a huge problem.
We should also address the primary use case for these labels: how can a manifest target a pod to only run on a master node? That's the reason why they are so ubiquitous; their use inside k/k feels almost incidental.

This cannot be defined by Kubernetes because Kubernetes does not require a particular deployment topology. But certainly a large set of deployers can use …

I think it's fine to promote the service-exclusion label to GA, and to create another label that controls eviction, etc. Those are wonderful changes that users will appreciate. But... users today need to steer workloads to run on the master. I would love it if we had a better answer on identity & isolation, but let's wait to tackle "node-role" until we are ready to make that case. Propose that you rename this KEP to be "promote service-exclusion labels to GA and introduce node.kubernetes.io/exclude-disruption", so that it is technical and not political/philosophical.

By definition we cannot give all users that guarantee. Can you describe in more detail what your use case is? Anyone who tries to schedule onto any given topology has no guarantee that's even possible within a cluster (like GKE), so is this "I want to schedule on masters if they are available but something else if they aren't?"

The use case seems to be a cluster component that wants either isolation or security. Security can be in case of breakout, or to reuse IAM roles or similar. Before all the installation tooling agreed on these labels, the producers of these manifests had to produce different manifests depending on the installation tooling in use. This is why we standardized it across the OSS Kubernetes tooling. Yes, it doesn't work on GKE / EKS or other Kubernetes offerings that host the control plane, but these cluster add-ons usually require special permissions and thus aren't user-installable there anyway. Let's either expand this KEP so that it addresses the big picture - I'm happy to help you do so - or let's focus it on what it is doing technically, lest it suck the joy out of actually fixing node-roles.
We did not do this or require this. A number of installers chose to do that. It is not required, nor standard. I'm really confused what you mean by "fix node-roles" now. Maybe we should have a chat next week and dig in.
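For concreteness, a hedged sketch of the kind of add-on manifest this exchange is describing, pinned to installer-labeled "master" nodes. Both the label and the taint below are installer conventions (e.g. what kubeadm applied), not something core Kubernetes defines or requires:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cluster-addon
spec:
  selector:
    matchLabels:
      app: cluster-addon
  template:
    metadata:
      labels:
        app: cluster-addon
    spec:
      # Installer-applied label; not required or defined by core Kubernetes.
      nodeSelector:
        node-role.kubernetes.io/master: ""
      # Taint applied by some installers; also a deployer convention.
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: addon
        image: registry.example.com/addon:latest
```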
I partially lack the historic context, but if a number of deployers requested the same label to indicate that a node hosts a control plane in a common, user-visible way, I can see how this could have resulted in the unofficial node-role/master label. I think the deployer-level standardization can be a good thing, even if the core is agnostic.
I agree with removing the concept of a node role in favor of labels that have more specific intents.
I would like to understand how we can enforce this concept across cloud providers as well, given the large amount of innovation in hosting models across the ecosystem that depend on those components that were in-tree and are now out of tree.
QUESTION: Is a single label selector sufficient to identify nodes to test?

#### Preventing accidental reintroduction
Can we clarify how this should work with cloud provider code? For example, I assume cloud providers (in-tree or out) by convention must not make assumptions about deployment topology? How can we check for that as well?
I added a brief paragraph, but cloud providers are still part of the core K8s rule set because they are owned by a sig and required to maintain conformance on those clouds.
How is this distinct from kubeadm exactly?
- owned by a sig
- maintaining conformance

@dchen1107 let's ensure there is no disagreement in the next SIG Node. /assign
@freehan pointed out that Ingress controllers also use this and we should think of them. He will follow up with thoughts.

By "this", do you mean role labels or something else? If you meant role labels, Ingress controllers that rely on role labels today are already dependent on particular deployments, since not all set those labels. Note that this doesn't propose removing anything that is currently applying the role labels, so it wouldn't make ingress controllers more broken than they are today.
The eventual removal of node-role labels might break Ingress controllers (or clusters), though, so we should evaluate and do outreach.
@liggitt The service controller has this: https://github.com/kubernetes/kubernetes/blob/release-1.15/pkg/controller/service/service_controller.go#L602
Is there a role for SIG Docs in this work? (e.g. should there be a page that clarifies the current situation vs. the historical mixed meanings?)
/lgtm

Thanks! /lgtm

Hm, why isn't @thockin on this list? /assign @dims @mattfarina
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dims, liggitt, smarterclayton, thockin.
As of 1.16, kubelet refuses to start when setting this label. See kubernetes/enhancements#1144 for some background.
* For now (so the tests pass), I have set an alternate label.
* Perhaps we could just remove the concept of a node-role altogether.
* This means that `kubectl get nodes` doesn't show `ROLES` any more.
* If we want this to work then we would need some other process to update the node labels.
#### Service load-balancer

The service load balancer implementation previously implemented a heuristic where `node-role.kubernetes.io/master` is used to exclude masters from the candidate nodes for a service. This is an implementation detail of the cluster and is not allowed. Since there is value in excluding nodes from service load balancer candidacy in some deployments, an alpha feature gated label `alpha.service-controller.kubernetes.io/exclude-balancer` was added in Kubernetes 1.9.
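A minimal sketch of opting a node out of load-balancer candidacy with the alpha label named above; the value shown, and the need to enable the associated alpha feature gate, are assumptions rather than a definitive recipe:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-7
  labels:
    # Alpha label from the KEP text above; assumes the associated alpha
    # feature gate is enabled on the control plane components.
    alpha.service-controller.kubernetes.io/exclude-balancer: "true"
```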
Not sure if this is the right place to comment. As I understand from this, the heuristic based on the label `node-role.kubernetes.io/master` will be removed? In what version will it be removed? Will the heuristic be based on the `node.kubernetes.io/exclude-from-external-load-balancers` label in the future? Will this label be added to each master from a certain release to prevent adding master nodes as candidate nodes?
There should no longer be any issues running router pods on control plane nodes (i.e. kubernetes/kubernetes#65618 which was resolved in kubernetes/enhancements#1144). Remove this limitation from the docs. Signed-off-by: Stephen Finucane <[email protected]>
node-roles were not intended for internal use and this KEP clarifies both their use and describes the process for resolving their internal use.