Unable to add second control plane node w 1.17.4 #2072
Comments
hi, just to point out that most of the kubeadm e2e tests are multi-control-plane setups with stacked etcd: we have users using kubeadm HA setups in production, so it must be an issue in your setup.
TLS bootstrap is the same for workers and CP nodes joining the cluster.
it still works, because in stacked etcd the CP node api-server only talks to the etcd member on the local node.
can you show some error logs of what you are seeing from the second control-plane node? /triage needs-information
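A minimal sketch of how one might confirm which etcd endpoint a control-plane node's API server is pointed at, assuming the static pod manifest lists its flags as `- --flag=value` entries. The function name and the sample manifest (including the 10.0.0.11 address) are hypothetical; only the `--etcd-servers` and `--advertise-address` flag names come from kube-apiserver itself.

```python
from typing import Dict


def parse_static_pod_flags(manifest_text: str) -> Dict[str, str]:
    """Collect `--flag=value` entries from the command list of a static pod manifest."""
    flags: Dict[str, str] = {}
    for raw in manifest_text.splitlines():
        line = raw.strip()
        # Only list items that look like command-line flags are of interest.
        if not line.startswith("- --"):
            continue
        name, _, value = line[len("- --"):].partition("=")
        flags[name] = value
    return flags


# Hypothetical excerpt of a kube-apiserver manifest on a stacked-etcd node.
SAMPLE_MANIFEST = """\
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.0.0.11
    - --etcd-servers=https://127.0.0.1:2379
"""

flags = parse_static_pod_flags(SAMPLE_MANIFEST)
print(flags["etcd-servers"])
```

On a real node the text would be read from the kube-apiserver manifest that kubeadm writes; a localhost value for `etcd-servers` matches the stacked etcd behaviour described above.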
Yes, I should have prefaced with that. I'm also clear that there is some edge case that my environment is evoking, just not clear what it is and digging in the code has only produced more questions. I've even gone so far as to change the configuration of
That's all good information, thanks. I have been reading code, but not e2e. I am going to try a few additional things this morning, then start reading the e2e setups to see if I can at least replicate with the same parameters.
I get a lot of these in the
EDIT: This is with me changing the
Yes, I have been running kubeadm with
As an aside, I imagine it must be possible to attach Delve to running processes and have far better clarity on what's happening, could I do this with additional
I have a lot more information to provide over the next few hours, just wanted to fill in what I could now.
/triage needs-information
i think you need to understand why this is happening. you can also try adding a member manually and joining it to the cluster using
that might not be that easy, because we are using the tool you might be able to use
again, the TLS bootstrap should have already finished when you are seeing the etcd failure. it is pre-step to the etcd member join failure and i don't think it's related.
i do not see a kubeadm related bug in this ticket. if you find one please provide the exact reproduction steps. we usually close support tickets after a few days as the kubeadm maintainers don't have the bandwidth...and usually delegate to the support channels and forums: https://github.com/kubernetes/kubernetes/blob/master/SUPPORT.md
Ah, I didn't parse that earlier, thanks.
There's definitely a usability issue. Maybe one could call the improvement of notification in a failure to install a "feature request", but it's still failing to install with unknown cause. Some call that a bug, it's just semantics. I'm not asking for support here and certainly didn't want to waste your time! But I am grateful for your explanation of when
I just got back from a workout, so I haven't started digging yet. I will see if I can pull up a PR that solves the issue and attach it when I have that together or otherwise resolve the issue when it's clear what's broken. Thanks again for your input!
kubeadm exposes the means for customization of a Kubernetes deployment, but this also allows the users to deploy a broken setup in many different ways. piping the error messages from the components to the kubeadm output is a non-goal, so one has to look at what the components are reporting: local etcd server, kubelet, api-server, etc.
The problem that I believe I have found is that the failure occurs when the bind/advertise interfaces are non-default. In that case, the init node must have
When the joining node comes online, it fetches the
I believe if the default interfaces were used, the init node would not require the use of
Note that even if the init node did use the default interface, no join nodes could put the apiserver address on a secondary interface (no
the lack of configuration per joining node is certainly problematic and we have plans to extend that in the future with instance specific configuration. it is not clear when or how this will happen, as it is difficult. kubeadm has its ClusterConfiguration as its source of truth for all control plane members and it partially makes the naive assumption that control-plane nodes are replicas.
this will not work as multi-node solution. bind-address is an instance specific flag for the api-server. you have the following options:
note that the kubelet always picks an IP from a public interface. this is hardcoded and there is no way to customize it.
closing as we are already tracking the instance specific work. let me know if you have further questions.
@neolit123: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Again, if nothing else "there's definitely a usability issue". Many would say that's bad security practice if the primary interface is on the public internet. It's also not documented anywhere.
Where is this being tracked?
i do not disagree that it is a usability issue, for the kubeadm HA support.
in terms of documentation, for the kubelet, the following is not documented and should be logged as a ticket in kubernetes/kubernetes and tagged as related discussion for the bind address of the API server: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
kubeadm does not use --bind-address for the api-server, it uses --advertise-address, and kubeadm documents it as:
you can watch this ticket: not directly for ClusterConfiguration, but this is what we have about instance specific configuration.
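As a rough illustration of what per-node addressing can look like from the joining side, the sketch below renders a control-plane JoinConfiguration in which only the advertise address differs per node. It assumes the kubeadm.k8s.io/v1beta2 config API, which I believe is the one used by kubeadm 1.17. The function name, the IP addresses, the token, and the CA hash are placeholders, and certificate distribution for the control-plane join is not shown.

```python
from typing import List


def render_join_config(
    advertise_address: str,
    api_server_endpoint: str,
    token: str,
    ca_cert_hash: str,
    bind_port: int = 6443,
) -> str:
    """Render a control-plane JoinConfiguration with a node-specific advertise address."""
    lines: List[str] = [
        "apiVersion: kubeadm.k8s.io/v1beta2",
        "kind: JoinConfiguration",
        "discovery:",
        "  bootstrapToken:",
        f'    apiServerEndpoint: "{api_server_endpoint}"',
        f'    token: "{token}"',
        "    caCertHashes:",
        f'    - "{ca_cert_hash}"',
        "controlPlane:",
        "  localAPIEndpoint:",
        "    # Instance-specific: the address this node's API server advertises.",
        f'    advertiseAddress: "{advertise_address}"',
        f"    bindPort: {bind_port}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
config = render_join_config(
    advertise_address="10.0.0.12",
    api_server_endpoint="10.0.0.10:6443",
    token="abcdef.0123456789abcdef",
    ca_cert_hash="sha256:<hash>",
)
print(config)
```

The rendered text would be saved to a file and passed to kubeadm join with its --config flag, with a different advertise_address for each joining control-plane node.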
Thanks @neolit123, that gets me through what I need to know about addressing. I understand now why it's not worth fixing docs when ComponentConfig is nearly complete and going to deprecate everything. I apologize if that was frustrating, it's been a couple of weeks of dead ends here so I get it.
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T21:01:11Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Environment:
kubectl version:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T21:03:42Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
docker info:
kubeadm init configuration:
kubeadm join configuration:
What happened?
Install first node with init configuration, after CNI install and taint removal, node running fine. Move on to second node, use join configuration. Node is created with kube-apiserver set to use etcd on localhost:
This is set here.
In the TLS bootstrap phase, the local etcd instance is contacted, but it isn't created until a later phase.
How did this code ever work?
What you expected to happen?
Since there's no way to start a local stacked etcd without the rest of the control plane being stable, it seems like kube-apiserver needs to be bootstrapped to the first etcd, then the local stacked etcd needs to be bootstrapped. Only then can the local kube-apiserver be set to use etcd on localhost as the above code is set.
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?