Master is added to NodePort / AWS ELBs #33884
Another reason this is a bad idea: this also means we're likely to open up a port on the master.
@kubernetes/sig-network
Can you clarify "get added to nodeport"? If the master is running kube-proxy, it will open up the nodePort on the master and proxy requests to endpoints. If it's not, the nodePort is left closed on the master. If you're listing all nodes and adding external-ip:nodePort to an LB, you should instead list only schedulable nodes. If the master is not marked unschedulable, that's a bug.
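For illustration, here is a minimal Go sketch of the filtering described above: keep only schedulable nodes when building the list of load-balancer targets. The package and function names are assumptions for the example, not the actual cloud-provider code.

```go
// Package lbtargets: hedged sketch of "list only schedulable nodes" for an LB.
package lbtargets

import (
	v1 "k8s.io/api/core/v1"
)

// filterSchedulableNodes drops any node whose spec marks it unschedulable,
// so cordoned (or old-style master) nodes are not registered as
// external-ip:nodePort targets on the load balancer.
func filterSchedulableNodes(nodes []v1.Node) []v1.Node {
	out := make([]v1.Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Spec.Unschedulable {
			continue
		}
		out = append(out, n)
	}
	return out
}
```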
Ah, I think this is the problem:
The master is not marked unschedulable. But that is correct, because we are supposed to use taints and tolerations instead: we keep the master schedulable and use a taint, so that we can run our system components as Deployments. kops today already works the way we're (likely) heading now that we register the master as a node: the master runs kube-proxy, it gets a PodCIDR, and it has a taint but is not marked Unschedulable, so it will accept NodePort traffic. I think we need both fixes: don't add the master to LoadBalancers, and don't open the NodePort on the master.
If we agree I can split this into two separate issues.
I guess the point of making the master just another node is that you can now run master components on just any other node. If we need to treat it specially, we need some stigma, right? Saying every unschedulable node (be it through a taint or the explicit field) doesn't get a nodePort seems wrong.
We are moving to mark masters as tainted, and not necessarily unschedulable. But this means we are now sending general service traffic through the master.

Referenced commit: Recognize "well-known" master taints and labels, and avoid attaching these nodes to LoadBalancers. Issue kubernetes#33884. (Does not completely resolve the issue, because we also should not even be opening the NodePort.)
As I understand the reasoning behind moving from Unschedulable to taints: taints expose greater control. So we can say "these instances can run etcd" and "these instances can run apiserver", and we allow "userspace" to have the same control. It also means we can run our system components as e.g. Deployments; today they are required to bypass the scheduler (i.e. live in /etc/kubernetes/manifests). (I actually think we should continue to exclude a node that has been marked unschedulable, e.g. cordoned.)

cc @mikedanese and @davidopp as this is more taints stuff (LMK if I should not be cc-ing you on these!)

Sent PR #35902, which will stop us adding new-master nodes to LoadBalancers, but does not address the concern that we shouldn't even be opening the NodePort on a master.
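As a rough sketch of the "run system components as Deployments on a tainted master" idea, the toleration below would let such a pod schedule onto a node carrying a dedicated=master:NoSchedule taint. The taint key and value are assumptions based on this discussion; installers may use different well-known keys.

```go
// Package tolerations: hedged sketch of a toleration for a tainted master.
package tolerations

import v1 "k8s.io/api/core/v1"

// masterToleration returns a toleration that a system component's pod spec
// could carry so the scheduler will place it on a master node tainted with
// dedicated=master:NoSchedule (assumed key/value for illustration).
func masterToleration() v1.Toleration {
	return v1.Toleration{
		Key:      "dedicated",
		Operator: v1.TolerationOpEqual,
		Value:    "master",
		Effect:   v1.TaintEffectNoSchedule,
	}
}
```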
I'm not sure I'm convinced yet that we never want to open NodePorts on master nodes (i.e. that triggering off of master / non-master is correct). Seems like there should be a way to say "don't open node-ports on this node", but using "master" as the switch doesn't feel right. If the old way was to check the unschedulable mark, can we maintain that behavior by looking at the "unschedulable" taint instead?
@caseydavenport we will continue to honor unschedulable - that'll be important for when a node is cordoned, e.g. for maintenance. #35902 says that if a node is marked as a dedicated node with the specific key "master", we will avoid it. Similarly we recognize two labels that are applied by installation tools like kops & kubeadm today. I did it this way for consistency with #35901. I don't mind if we don't key off the name of the taint and simply treat any NoSchedule taint as the equivalent of unschedulable, and/or if we ignore the labels. Is that what you're suggesting instead? If not, can you explain what you are suggesting?
Yes, when #29178 is implemented, Unschedulable will be represented as a taint that you can use to decide whether a pod should schedule on the node. (We can also do a special taint just for the master, if that helps.) But this won't be in 1.5.
Hmm, it looks like #31647 has the intention of having the master node register with a taint, and that PR is already merged. It wasn't clear to me whether we are actually using it to have the master node register with a taint yet, though. @mikedanese can you clarify?
@davidopp kops already has tainted masters that are otherwise schedulable - the genie is out of the bottle. It's why I'm pushing to fix all these taint issues :-)
It makes a lot of people really uncomfortable on AWS, given the way we've made the master the "secure enclave" and a performance contention point, so for now I'll mark as P1.
@justinsb all issues must be labeled either release-blocker or non-release-blocking by end of day 18 November 2016 PST (or please move it to 1.6). cc @kubernetes/sig-aws
A lot of discomfort around this; marking as a release-blocker for now.
Do we have a PR in to address this?
@justinsb @chrislovecnm: How do we unblock the release? What needs to be done here?
Any update on this issue?
I'm hitting the problem described in the original post (master nodes are being used by AWS ELBs for routing service traffic, but they should not be). The discussion here has diverged from the original problem.

In AWS, master nodes will typically run on instance types with fewer vCPUs than the worker node instance types, to save cost, and nodes with fewer vCPUs typically also have lower available network bandwidth. Routing ELB traffic through both master and worker instances using equal-weight round robin will therefore cause performance problems on the master, or potentially starve the cluster of access to the master when load is high.

A mechanism for excluding master nodes from ELBs has not been established in this thread. Taints seem like a very complicated way to do this, as you would have to look for some very explicit set of taint parameters indicating that no external traffic should ever be routed to the node. Tagging could be a more flexible alternative; however, since all worker nodes join the ELB, all of the provisioned bandwidth of the worker nodes is already available to any one ELB. I can't think of a scenario where always excluding master nodes from ELBs would be problematic. I can envision problems in a cluster with multiple instance groups, each with nodes of a different size: blindly round-robining the ELB across all instances in both instance groups could saturate the network on the nodes in the instance group with smaller instances. IMO:
Based on the comment from @greg-jaunt, maybe another solution could be to tag the node if it should be excluded from the load balancer for a certain service? That way you could potentially take a node out of the LB for a certain service but keep it for another service if so desired.
Tags would be the most flexible. The other alternative that comes to mind is to be able to specify an instance group name in the Service config, since I think nodes in an instance group would always all be the same instance type. That's less flexible, but foolproof. An even better solution, IMO, would be to determine what nodes the service is actually running on and only add those nodes to the ELB.
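A hedged sketch of that last suggestion (only add nodes that actually host the service's pods): read the service's Endpoints object and collect the node names behind its addresses. The function name and overall approach are illustrative, not an existing controller.

```go
// Package endpointnodes: hedged sketch of deriving LB targets from Endpoints.
package endpointnodes

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodesBackingService returns the set of node names that currently host
// ready endpoints of the given service. An LB could then attach only these
// nodes instead of every node in the cluster.
func nodesBackingService(ctx context.Context, cs kubernetes.Interface, namespace, service string) (map[string]bool, error) {
	eps, err := cs.CoreV1().Endpoints(namespace).Get(ctx, service, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	nodes := map[string]bool{}
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses {
			if addr.NodeName != nil {
				nodes[*addr.NodeName] = true
			}
		}
	}
	return nodes, nil
}
```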
The solution of allowing services to have
Having something similar to
I have checked: the servicecontroller documents that masters are excluded, but at the time the only way a master node could be recognized was via the unschedulable field. There's even a TODO to do something better than assuming unschedulable => master. As such I have submitted a PR to fix this regression, which shows up when we use the new 1.6 taints and the new 1.6 labels for the master. The servicecontroller now recognizes 1.6 master nodes and will exclude them, restoring the documented behaviour :-)
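Roughly, the exclusion rule being described looks like the predicate below. This is a sketch, not the actual servicecontroller code, and the specific label/taint key shown is an assumption for the example.

```go
// Package masterexclude: hedged sketch of the LB/NodePort exclusion predicate.
package masterexclude

import v1 "k8s.io/api/core/v1"

// masterKey is the well-known 1.6-era master label/taint key (assumed here).
const masterKey = "node-role.kubernetes.io/master"

// excludeFromLoadBalancer reports whether a node should be skipped when
// attaching nodes to a LoadBalancer / NodePort: either it is unschedulable
// (legacy behaviour), labeled as a master, or tainted as a master.
func excludeFromLoadBalancer(node *v1.Node) bool {
	if node.Spec.Unschedulable {
		return true // legacy rule: unschedulable => not an LB target
	}
	if _, ok := node.Labels[masterKey]; ok {
		return true // 1.6 masters are labeled rather than marked unschedulable
	}
	for _, t := range node.Spec.Taints {
		if t.Key == masterKey && t.Effect == v1.TaintEffectNoSchedule {
			return true // tainted-but-schedulable masters are also excluded
		}
	}
	return false
}
```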
The servicecontroller documents that the master is excluded from the LoadBalancer / NodePort, but this is broken for clusters where we are using taints for the master (as introduced in 1.6), instead of marking the master as unschedulable. This restores the desired documented behaviour, by excluding nodes that are labeled as masters with the new 1.6 labels, even if they use the new 1.6 taints. Fix kubernetes#33884
Automatic merge from submit-queue

Exclude master from LoadBalancer / NodePort

The servicecontroller documents that the master is excluded from the LoadBalancer / NodePort, but this is broken for clusters where we are using taints for the master (as introduced in 1.6), instead of marking the master as unschedulable. This restores the desired documented behaviour, by excluding nodes that are labeled as masters with the new 1.6 labels, even if they use the new 1.6 taints. Fix #33884

```release-note
Exclude nodes labeled as master from LoadBalancer / NodePort; restores documented behaviour
```
Are we sure this is closed? On 1.6.4 (client) / 1.6.2 (server) I've noticed the master node being added to LoadBalancers.
Thanks for the prompt clarification @justinsb |
It seems as though the fix only applies when nodes are added to or removed from the cluster, in which case the nodes in the LB pool are recalculated and only the workers are added. When creating a new LoadBalancer service, master nodes are still being added.
@cricci82 which k8s version? #45773 should have harmonized the two cases.
I'm running 1.6.7.
It wasn't backported to 1.6, it looks like: 976e046
That explains it, thank you!
I'm running a single-node cluster on AWS EC2, i.e. scheduling work on the master as well. Now that I've tried to add an ELB for my service, the EC2 instance is not associated with the ELB. The reason seems to be this. Is my scenario a supported one? @greg-jaunt what would be the second step you were implying? :-) As a workaround, I wonder what would happen if I removed the
The error message is:
EDIT: Imma create a new issue for this
I'm not sure this is correct as we run more things on the master. Consider a service I want to expose, like an API proxy, an aggregated API server that wants its own LB, or something that's not quite control plane but not quite an end-user workload. We shouldn't bias towards the master (i.e. a lot of nodePort services for regular workloads should bypass the master), but this PR as it stands prevents using service load balancers with services that run on masters, which limits options like self-hosting. Now that node load balancing is more nuanced (with health-checking endpoints), do we even need this? Masters should not be marked as healthy when pods aren't run on them, which means no traffic would go to masters?
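For reference, the "health-checking endpoints" point refers to services with ExternalTrafficPolicy set to Local, where nodes that host no backing pod fail the LB health check and so receive no traffic. A minimal sketch, assuming the typed core/v1 API; the function name and port are illustrative.

```go
// Package localservice: hedged sketch of a LoadBalancer service using
// ExternalTrafficPolicy=Local, so only nodes running the pods pass the
// LB health check.
package localservice

import v1 "k8s.io/api/core/v1"

// localOnlyServiceSpec builds a ServiceSpec whose external traffic is only
// routed to nodes with local endpoints of the selected pods.
func localOnlyServiceSpec(selector map[string]string) v1.ServiceSpec {
	return v1.ServiceSpec{
		Type:                  v1.ServiceTypeLoadBalancer,
		Selector:              selector,
		ExternalTrafficPolicy: v1.ServiceExternalTrafficPolicyTypeLocal,
		Ports: []v1.ServicePort{
			{Port: 80}, // example port, not from the thread
		},
	}
}
```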
I noticed that the master node is now added to NodePort and to AWS ELBs, now that the master is registered as a Node.
I think we probably don't want to do this, as we don't want to be bouncing traffic through the master, for performance reasons. Also the master doesn't always run kube-proxy.