Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Master is added to NodePort / AWS ELBs #33884

Closed
justinsb opened this issue Oct 1, 2016 · 56 comments
Closed

Master is added to NodePort / AWS ELBs #33884

justinsb opened this issue Oct 1, 2016 · 56 comments
Assignees
Labels
area/nodecontroller priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@justinsb
Copy link
Member

justinsb commented Oct 1, 2016

I noticed that the master node is now added to NodePort and to AWS ELBs, now that the master is registered as a Node.

I think we probably don't want to do this, as we don't want to be bouncing traffic through the master, for performance reasons. Also the master doesn't always run kube-proxy.

@justinsb
Copy link
Member Author

justinsb commented Oct 1, 2016

cc @girishkalele @thockin

@justinsb
Copy link
Member Author

justinsb commented Oct 1, 2016

Another reason this is a bad idea: this also means we're likely to open up a port on the master

@thockin thockin modified the milestones: rktnetes-v1.1, v1.5 Oct 3, 2016
@thockin
Copy link
Member

thockin commented Oct 3, 2016

@kubernetes/sig-network

@bprashanth
Copy link
Contributor

Can you clarify "get added to nodeport"? If the master is running kube-proxy, it will open up the nodeport on the master and proxy requests to endpoints. If you're not, the master nodePort is left closed. If you're listing all nodes and adding external-ip:nodePort to an lb, you should instead list only schedulable nodes. If the master is not marked unschedulable, that's a bug.

@justinsb
Copy link
Member Author

justinsb commented Oct 4, 2016

Ah, I think this is the problem:

If the master is not marked unschedulable, that's a bug.

The master is not marked unschedulable. But that is correct, because we are supposed to use taints and tolerations instead. We mark it schedulable with a taint so that we can use deployments for our system components.

kops does today where we're (likely) going now that we include the master: it runs kube-proxy, it gets a PodCIDR, it has a taint but is not marked Unschedulable, the master will allow kubectl logs.

I think we both need:

  1. a way to make sure kube-proxy doesn't open NodePort on the master when the master is tainted
  2. a way to exclude the master from the servicecontroller that doesn't rely on Unschedulable, as that is going way

If we agree I can split into two separate issues.

@bprashanth
Copy link
Contributor

I guess the point of making the master just another node that you can now run master components on just any other node. If we need to treat it specially, we need some stigma, right? saying every unschedulable node (be it through taint or explicity field) doesn't get nodePort seems wrong.

@justinsb justinsb self-assigned this Oct 31, 2016
justinsb added a commit to justinsb/kubernetes that referenced this issue Oct 31, 2016
We are moving to mark masters as tainted, and not necessarily
unschedulable.  But this means we are now sending general service
traffic through the master.

Recognize "well-known" master taints and labels, and avoid attaching
these nodes to LoadBalancers.

Issue kubernetes#33884

(Does not completely resolve the issue, because we also should not even
be opening the NodePort)
@justinsb
Copy link
Member Author

As I understand the reasoning behind moving from Unschedulable to taints: taints expose greater control. So we can say "these instances can run etcd", and "these instances can run apiserver". And we allow "userspace" to have the same control.

It also means we can run our system components as e.g. deployments; today they are required to bypass the scheduler (i.e. be in /etc/kubernetes/manifests).

(I actually think we should continue to remove a node that has been kubectl cordon node from load balancers, as it is likely undergoing maintenance. I added a note in the code.)

cc @mikedanese and @davidopp as this is more taints stuff (LMK if I should not be cc-ing you on these!)

Sent PR #35902, which will stop us adding new-master nodes to LoadBalancers, but does not address the concern that we shouldn't even be opening the NodePort on a master.

@caseydavenport
Copy link
Member

I'm not sure I'm convinced yet that we never want to open NodePorts on master nodes (i.e that triggering off of master / non-master is correct).

Seems like there should be a way to say "don't open node-ports on this node", but using the "master" as the switch doesn't feel right. If the old way was to check the unschedulable mark, can we maintain that behavior by looking at the "unchedulable" taint instead?

@justinsb
Copy link
Member Author

@caseydavenport we will continue to honor unschedulable - that'll be important for when a node is cordon-ed e.g. for maintenance. #35902 will say that if a node is marked as a dedicated node in the specific key "master" we will avoid it. Similarly we recognize two labels that are applied by installation tools like kops & kubeadm today. I did it this way for consistency with #35901

I don't mind if we don't key off the name of the taint and simply treat any NoSchedule taint as the equivalent of unschedulable, and/or if we ignore the labels. Is that what you're suggesting instead? If not, can you explain what you are suggesting?

@davidopp
Copy link
Member

Yes, when #29178 is implemented then Unschedulable will be represented as a taint that you can use to decide whether a pod should schedule on the node. (We can also do a special taint just for the master, if that helps.) But this won' t be in 1.5.

@davidopp
Copy link
Member

Hmm, it looks like #31647 has the intention of having the master node register with a taint, and that PR is already merged. It wasn't clear to me if we are actually using it to have master node register with a taint yet, though. @mikedanese can you clarify?

@justinsb
Copy link
Member Author

justinsb commented Nov 1, 2016

@davidopp kops already has tainted masters that are otherwise schedulable - the genie is out of the bottle. It's why I'm pushing to fix all these taint issues :-)

@dims
Copy link
Member

dims commented Nov 16, 2016

This needs to be triaged as a release-blocker or not for 1.5 @justinsb @davidopp

@justinsb justinsb added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 16, 2016
@justinsb
Copy link
Member Author

justinsb commented Nov 16, 2016

It makes a lot of people really uncomfortable on AWS, given the way we've made the master the "secure enclave" and a performance contention point, so for now I'll mark as P1.

@dims
Copy link
Member

dims commented Nov 18, 2016

@justinsb all issues must be labeled either release blocker or non release blocking by end of day 18 November 2016 PST. (or please move it to 1.6) cc @kubernetes/sig-aws

@justinsb
Copy link
Member Author

A lot of discomfort around this; marking as a release-blocker for now.

@chrislovecnm
Copy link
Contributor

Do we have a PR in to address this?

@dims
Copy link
Member

dims commented Nov 23, 2016

@justinsb : @chrislovecnm : How do we unblock the release? What needs to be done here?

@ajohnstone
Copy link
Contributor

Any update on this issue?

@greg-jaunt
Copy link

I'm hitting the problem described in the original post (master nodes are being used by AWS ELBs for routing service traffic, but they should not be). The discussion here has diverged from the original problem.

In AWS, master nodes will typically be run on instance types with fewer vCPUs than the worker node instance types to save cost. In AWS, nodes with fewer vCPUs typically also have lower available virtual network bandwidth. Routing ELB traffic through both master and worker instances using equal weight round robin will therefore result in performance problems on the master or potentially starve the cluster from being able to access the master when the load is high.

A mechanism for excluding master nodes from ELBs has not been established in this thread. Taints seem like a very complicated way to do this as you would have to look for some very explicit set of taint parameters that would indicate that no external traffic should ever be routed to the node. Tagging could be a more-flexible alternative, however since all worker nodes join the ELB all of the provisioned bandwidth of the worker nodes will already be available to any one ELB. I can't think of a scenario where just always excluding master nodes from ELBs would be problematic.

I can envision problems in a cluster with multiple instance groups, each with nodes of different size. Blindly ELB round-robining across all instances both instance groups could result in saturating the network on the nodes in the instance group with smaller instances.

IMO:

  • This issue could be considered a production blocker for anyone on AWS with high traffic.
  • Master nodes should be schedulable so that extra administrative containers can be run on them. Marking master nodes unschedulable is not a good solution.
  • Simply excluding master nodes from ELBs is a simple solution that should not have any undesirable side-effects and should be implemented as a first step.
  • A better solution would enable even more control over assignment of nodes to ELBs to ensure all nodes in an ELB are of equal size.

@dlsniper
Copy link

Based on the comment from @greg-jaunt maybe another solution could be to tag the node if it should be excluded from the load balancer for a certain service? That way you could potentially take a node out of the LB for a certain service but keep it for another service if so desired.

@greg-jaunt
Copy link

Tags would be the most flexible. The other alternative that comes to mind is to be able to specify an instance group name in the Service config, since I think nodes in an instance group would always all be the same instance type. That's less flexible, but foolproof.

An even better solution, IMO, would be to determine what nodes the service is actually running on and only add those nodes to the ELB.

@quentez
Copy link

quentez commented Apr 5, 2017

The solution of allowing services to have node-selector labels (or something similar) to decide what nodes to add to the ELB would be a great solution to this. We're using different instance groups for different workloads, and they're not all equals when it comes to proxying web traffic.

@rushtehrani
Copy link

Having something similar to nodeSelector on services makes sense, especially if they are LoadBalancer services. For example, we have an instance group for GlusterFS nodes and those nodes should not be added to ELB.

@justinsb
Copy link
Member Author

justinsb commented Apr 21, 2017

I have checked, and the servicecontroller documents that masters are excluded, but that the only way a master node can be recognized is via the unschedulable field (at the time). There's even a TODO to do something better than assume that unschedulable => master.

As such I have submitted a PR to fix this, which is a regression for when we are using the new 1.6 taints and the new 1.6 labels for our master. The servicecontroller now recognizes 1.6 master nodes and will exclude them, returning the documented behaviour :-)

justinsb added a commit to justinsb/kubernetes that referenced this issue Apr 21, 2017
The servicecontroller documents that the master is excluded from the
LoadBalancer / NodePort, but this is broken for clusters where we are
using taints for the master (as introduced in 1.6), instead of marking
the master as unschedulable.

This restores the desired documented behaviour, by excluding nodes that
are labeled as masters with the new 1.6 labels, even if they use the new
1.6 taints.

Fix kubernetes#33884
k8s-github-robot pushed a commit that referenced this issue Apr 24, 2017
Automatic merge from submit-queue

Exclude master from LoadBalancer / NodePort

The servicecontroller documents that the master is excluded from the
LoadBalancer / NodePort, but this is broken for clusters where we are
using taints for the master (as introduced in 1.6), instead of marking
the master as unschedulable.

This restores the desired documented behaviour, by excluding nodes that
are labeled as masters with the new 1.6 labels, even if they use the new
1.6 taints.

Fix #33884

```release-note
Exclude nodes labeled as master from LoadBalancer / NodePort; restores documented behaviour
```
jayunit100 pushed a commit to jayunit100/kubernetes that referenced this issue Apr 25, 2017
The servicecontroller documents that the master is excluded from the
LoadBalancer / NodePort, but this is broken for clusters where we are
using taints for the master (as introduced in 1.6), instead of marking
the master as unschedulable.

This restores the desired documented behaviour, by excluding nodes that
are labeled as masters with the new 1.6 labels, even if they use the new
1.6 taints.

Fix kubernetes#33884
justinsb added a commit to justinsb/kubernetes that referenced this issue Apr 25, 2017
The servicecontroller documents that the master is excluded from the
LoadBalancer / NodePort, but this is broken for clusters where we are
using taints for the master (as introduced in 1.6), instead of marking
the master as unschedulable.

This restores the desired documented behaviour, by excluding nodes that
are labeled as masters with the new 1.6 labels, even if they use the new
1.6 taints.

Fix kubernetes#33884
@renewooller
Copy link

renewooller commented Jun 12, 2017

Are we sure this is closed? On 1.6.4 (C) 1.6.2 (S) I've noticed the master node being added to LoadBalancers

@renewooller
Copy link

and kubernetes/kops#853

@justinsb
Copy link
Member Author

#44745 added 82d600b which is on the release-1.7 branch (as of v1.7.0-alpha.3), and the backport #44926 added f542fe1 which is on the 1.6 branch (as of 1.6.3).

But this could also be #45773 .

In this case though @renewooller , the fix is not in server 1.6.2.

@renewooller
Copy link

Thanks for the prompt clarification @justinsb

@chrisricci
Copy link

chrisricci commented Jul 25, 2017

It seems as though the fix only applies to when nodes are added or removed from the cluster, in which case the nodes in the LB pool are recalculated and only the workers are added. When creating a new LoadBalancer service, masters nodes are still being added.

@justinsb
Copy link
Member Author

@cricci82 which k8s version? #45773 should have harmonized the two cases

@chrisricci
Copy link

I'm running 1.6.7

@justinsb
Copy link
Member Author

It wasn't backported to 1.6, it looks like: 976e046

@chrisricci
Copy link

That explains it, thank you!

dshmelev added a commit to dshmelev/aws_kube_tc that referenced this issue Aug 3, 2017
@ljani
Copy link

ljani commented Jun 28, 2018

I'm running a single node cluster on AWS EC2, ie. scheduling work on the master as well. Now that I tried to add an ELB for my service, the EC2 instance is not associated with the ELB. The reason seems to be this. Is my scenario a supported one? @greg-jaunt what would be the second step you were implying? :-)

As a workaround, I wonder what would happend if I removed the node-role.kubernetes.io/master label.

The error message is:

I0628 17:54:48.853175       1 event.go:221] Event(v1.ObjectReference{Kind:"Service", Namespace:"default", Name:"nginx", UID:"59980933-7afc-11e8-9a8c-0621b993d602", APIVersion:"v1", ResourceVersion:"2052", FieldPath:""}): type: 'Warning' reason: 'UnAvailableLoadBalancer' There are no available nodes for LoadBalancer service default/nginx

EDIT: Imma create a new issue for this

@smarterclayton
Copy link
Contributor

I'm not sure this is correct as we run more things on the master. If I have a service that I want to expose like an API proxy, an aggregated API server that wants its own LB, or a not quite control plane but not quite end user workload. We shouldn't bias towards the master (I.e. a lot of node port services for regular workloads should bypass the master), but this PR as it stands prevents using service load balancers with services that run on masters, which limits options like self hosting and use.

Now that node load balancing is more nuanced (with health checking endpoints), do we even need this? Masters should not be marked as healthy when pods aren't run on them, which means no traffic would go to masters?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/nodecontroller priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

No branches or pull requests