Master is added to NodePort / AWS ELBs #33884
Another reason this is a bad idea: this also means we're likely to open up a port on the master.
@kubernetes/sig-network
Can you clarify "get added to nodeport"? If the master is running kube-proxy, it will open up the nodePort on the master and proxy requests to endpoints. If it's not, the nodePort is left closed on the master. If you're listing all nodes and adding external-ip:nodePort to an LB, you should instead list only schedulable nodes. If the master is not marked unschedulable, that's a bug.
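For illustration, here is a minimal Go sketch of the filtering described above: keep only schedulable nodes when building the list of load-balancer targets. The package and function names are assumptions for the example, not the actual cloud-provider code.

```go
// Package lbtargets: hedged sketch of "list only schedulable nodes" for an LB.
package lbtargets

import (
	v1 "k8s.io/api/core/v1"
)

// filterSchedulableNodes drops any node whose spec marks it unschedulable,
// so cordoned (or old-style master) nodes are not registered as
// external-ip:nodePort targets on the load balancer.
func filterSchedulableNodes(nodes []v1.Node) []v1.Node {
	out := make([]v1.Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Spec.Unschedulable {
			continue
		}
		out = append(out, n)
	}
	return out
}
```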
Ah, I think this is the problem:
The master is not marked unschedulable. But that is correct, because we are supposed to use taints and tolerations instead: we keep the master schedulable and use a taint, so that we can run our system components as Deployments. kops today already works the way we're (likely) heading now that we register the master as a node: the master runs kube-proxy, it gets a PodCIDR, and it has a taint but is not marked Unschedulable, so it will accept NodePort traffic. I think we need both fixes: don't add the master to LoadBalancers, and don't open the NodePort on the master.
If we agree I can split this into two separate issues.
I guess the point of making the master just another node is that you can now run master components on just any other node. If we need to treat it specially, we need some stigma, right? Saying every unschedulable node (be it through a taint or the explicit field) doesn't get a nodePort seems wrong.
We are moving to mark masters as tainted, and not necessarily unschedulable. But this means we are now sending general service traffic through the master.

Referenced commit: Recognize "well-known" master taints and labels, and avoid attaching these nodes to LoadBalancers. Issue kubernetes#33884. (Does not completely resolve the issue, because we also should not even be opening the NodePort.)
As I understand the reasoning behind moving from Unschedulable to taints: taints expose greater control. So we can say "these instances can run etcd" and "these instances can run apiserver", and we allow "userspace" to have the same control. It also means we can run our system components as e.g. Deployments; today they are required to bypass the scheduler (i.e. live in /etc/kubernetes/manifests). (I actually think we should continue to exclude a node that has been marked unschedulable, e.g. cordoned.)

cc @mikedanese and @davidopp as this is more taints stuff (LMK if I should not be cc-ing you on these!)

Sent PR #35902, which will stop us adding new-master nodes to LoadBalancers, but does not address the concern that we shouldn't even be opening the NodePort on a master.
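As a rough sketch of the "run system components as Deployments on a tainted master" idea, the toleration below would let such a pod schedule onto a node carrying a dedicated=master:NoSchedule taint. The taint key and value are assumptions based on this discussion; installers may use different well-known keys.

```go
// Package tolerations: hedged sketch of a toleration for a tainted master.
package tolerations

import v1 "k8s.io/api/core/v1"

// masterToleration returns a toleration that a system component's pod spec
// could carry so the scheduler will place it on a master node tainted with
// dedicated=master:NoSchedule (assumed key/value for illustration).
func masterToleration() v1.Toleration {
	return v1.Toleration{
		Key:      "dedicated",
		Operator: v1.TolerationOpEqual,
		Value:    "master",
		Effect:   v1.TaintEffectNoSchedule,
	}
}
```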
I'm not sure I'm convinced yet that we never want to open NodePorts on master nodes (i.e. that triggering off of master / non-master is correct). Seems like there should be a way to say "don't open node-ports on this node", but using "master" as the switch doesn't feel right. If the old way was to check the unschedulable mark, can we maintain that behavior by looking at the "unschedulable" taint instead?
@caseydavenport we will continue to honor unschedulable - that'll be important for when a node is cordoned, e.g. for maintenance. #35902 says that if a node is marked as a dedicated node with the specific key "master", we will avoid it. Similarly we recognize two labels that are applied by installation tools like kops & kubeadm today. I did it this way for consistency with #35901. I don't mind if we don't key off the name of the taint and simply treat any NoSchedule taint as the equivalent of unschedulable, and/or if we ignore the labels. Is that what you're suggesting instead? If not, can you explain what you are suggesting?
Yes, when #29178 is implemented, Unschedulable will be represented as a taint that you can use to decide whether a pod should schedule on the node. (We can also do a special taint just for the master, if that helps.) But this won't be in 1.5.
Hmm, it looks like #31647 has the intention of having the master node register with a taint, and that PR is already merged. It wasn't clear to me whether we are actually using it to have the master node register with a taint yet, though. @mikedanese can you clarify?
@davidopp kops already has tainted masters that are otherwise schedulable - the genie is out of the bottle. It's why I'm pushing to fix all these taint issues :-)
It makes a lot of people really uncomfortable on AWS, given the way we've made the master the "secure enclave" and a performance contention point, so for now I'll mark as P1.
@justinsb all issues must be labeled either release-blocker or non-release-blocking by end of day 18 November 2016 PST (or please move it to 1.6). cc @kubernetes/sig-aws
A lot of discomfort around this; marking as a release-blocker for now.
Do we have a PR in to address this?
@justinsb @chrislovecnm: How do we unblock the release? What needs to be done here?
Any update on this issue?
I'm hitting the problem described in the original post (master nodes are being used by AWS ELBs for routing service traffic, but they should not be). The discussion here has diverged from the original problem.

In AWS, master nodes will typically run on instance types with fewer vCPUs than the worker node instance types, to save cost, and nodes with fewer vCPUs typically also have lower available network bandwidth. Routing ELB traffic through both master and worker instances using equal-weight round robin will therefore cause performance problems on the master, or potentially starve the cluster of access to the master when load is high.

A mechanism for excluding master nodes from ELBs has not been established in this thread. Taints seem like a very complicated way to do this, as you would have to look for some very explicit set of taint parameters indicating that no external traffic should ever be routed to the node. Tagging could be a more flexible alternative; however, since all worker nodes join the ELB, all of the provisioned bandwidth of the worker nodes is already available to any one ELB. I can't think of a scenario where always excluding master nodes from ELBs would be problematic. I can envision problems in a cluster with multiple instance groups, each with nodes of a different size: blindly round-robining the ELB across all instances in both instance groups could saturate the network on the nodes in the instance group with smaller instances. IMO:
Based on the comment from @greg-jaunt, maybe another solution could be to tag the node if it should be excluded from the load balancer for a certain service? That way you could potentially take a node out of the LB for a certain service but keep it for another service if so desired.
Tags would be the most flexible. The other alternative that comes to mind is to be able to specify an instance group name in the Service config, since I think nodes in an instance group would always all be the same instance type. That's less flexible, but foolproof. An even better solution, IMO, would be to determine what nodes the service is actually running on and only add those nodes to the ELB.
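A hedged sketch of that last suggestion (only add nodes that actually host the service's pods): read the service's Endpoints object and collect the node names behind its addresses. The function name and overall approach are illustrative, not an existing controller.

```go
// Package endpointnodes: hedged sketch of deriving LB targets from Endpoints.
package endpointnodes

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodesBackingService returns the set of node names that currently host
// ready endpoints of the given service. An LB could then attach only these
// nodes instead of every node in the cluster.
func nodesBackingService(ctx context.Context, cs kubernetes.Interface, namespace, service string) (map[string]bool, error) {
	eps, err := cs.CoreV1().Endpoints(namespace).Get(ctx, service, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	nodes := map[string]bool{}
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses {
			if addr.NodeName != nil {
				nodes[*addr.NodeName] = true
			}
		}
	}
	return nodes, nil
}
```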
The solution of allowing services to have
Having something similar to
I have checked: the servicecontroller documents that masters are excluded, but at the time the only way a master node could be recognized was via the unschedulable field. There's even a TODO to do something better than assuming unschedulable => master. As such I have submitted a PR to fix this regression, which shows up when we use the new 1.6 taints and the new 1.6 labels for the master. The servicecontroller now recognizes 1.6 master nodes and will exclude them, restoring the documented behaviour :-)
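Roughly, the exclusion rule being described looks like the predicate below. This is a sketch, not the actual servicecontroller code, and the specific label/taint key shown is an assumption for the example.

```go
// Package masterexclude: hedged sketch of the LB/NodePort exclusion predicate.
package masterexclude

import v1 "k8s.io/api/core/v1"

// masterKey is the well-known 1.6-era master label/taint key (assumed here).
const masterKey = "node-role.kubernetes.io/master"

// excludeFromLoadBalancer reports whether a node should be skipped when
// attaching nodes to a LoadBalancer / NodePort: either it is unschedulable
// (legacy behaviour), labeled as a master, or tainted as a master.
func excludeFromLoadBalancer(node *v1.Node) bool {
	if node.Spec.Unschedulable {
		return true // legacy rule: unschedulable => not an LB target
	}
	if _, ok := node.Labels[masterKey]; ok {
		return true // 1.6 masters are labeled rather than marked unschedulable
	}
	for _, t := range node.Spec.Taints {
		if t.Key == masterKey && t.Effect == v1.TaintEffectNoSchedule {
			return true // tainted-but-schedulable masters are also excluded
		}
	}
	return false
}
```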
The servicecontroller documents that the master is excluded from the LoadBalancer / NodePort, but this is broken for clusters where we are using taints for the master (as introduced in 1.6), instead of marking the master as unschedulable. This restores the desired documented behaviour, by excluding nodes that are labeled as masters with the new 1.6 labels, even if they use the new 1.6 taints. Fix kubernetes#33884
Automatic merge from submit-queue

Exclude master from LoadBalancer / NodePort

The servicecontroller documents that the master is excluded from the LoadBalancer / NodePort, but this is broken for clusters where we are using taints for the master (as introduced in 1.6), instead of marking the master as unschedulable. This restores the desired documented behaviour, by excluding nodes that are labeled as masters with the new 1.6 labels, even if they use the new 1.6 taints. Fix #33884

```release-note
Exclude nodes labeled as master from LoadBalancer / NodePort; restores documented behaviour
```
Are we sure this is closed? On 1.6.4 (client) / 1.6.2 (server) I've noticed the master node being added to LoadBalancers.
Thanks for the prompt clarification @justinsb |
It seems as though the fix only applies when nodes are added to or removed from the cluster, in which case the nodes in the LB pool are recalculated and only the workers are added. When creating a new LoadBalancer service, master nodes are still being added.
@cricci82 which k8s version? #45773 should have harmonized the two cases.
I'm running 1.6.7.
It wasn't backported to 1.6, it looks like: 976e046
That explains it, thank you!
I'm running a single-node cluster on AWS EC2, i.e. scheduling work on the master as well. Now that I've tried to add an ELB for my service, the EC2 instance is not associated with the ELB. The reason seems to be this. Is my scenario a supported one? @greg-jaunt what would be the second step you were implying? :-) As a workaround, I wonder what would happen if I removed the
The error message is:
EDIT: Imma create a new issue for this
I'm not sure this is correct as we run more things on the master. Consider a service I want to expose, like an API proxy, an aggregated API server that wants its own LB, or something that's not quite control plane but not quite an end-user workload. We shouldn't bias towards the master (i.e. a lot of nodePort services for regular workloads should bypass the master), but this PR as it stands prevents using service load balancers with services that run on masters, which limits options like self-hosting. Now that node load balancing is more nuanced (with health-checking endpoints), do we even need this? Masters should not be marked as healthy when pods aren't run on them, which means no traffic would go to masters?
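For reference, the "health-checking endpoints" point refers to services with ExternalTrafficPolicy set to Local, where nodes that host no backing pod fail the LB health check and so receive no traffic. A minimal sketch, assuming the typed core/v1 API; the function name and port are illustrative.

```go
// Package localservice: hedged sketch of a LoadBalancer service using
// ExternalTrafficPolicy=Local, so only nodes running the pods pass the
// LB health check.
package localservice

import v1 "k8s.io/api/core/v1"

// localOnlyServiceSpec builds a ServiceSpec whose external traffic is only
// routed to nodes with local endpoints of the selected pods.
func localOnlyServiceSpec(selector map[string]string) v1.ServiceSpec {
	return v1.ServiceSpec{
		Type:                  v1.ServiceTypeLoadBalancer,
		Selector:              selector,
		ExternalTrafficPolicy: v1.ServiceExternalTrafficPolicyTypeLocal,
		Ports: []v1.ServicePort{
			{Port: 80}, // example port, not from the thread
		},
	}
}
```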
I noticed that the master node is now added to NodePort and to AWS ELBs, now that the master is registered as a Node.
I think we probably don't want to do this, as we don't want to be bouncing traffic through the master, for performance reasons. Also the master doesn't always run kube-proxy.