Bug: mistakenly running both kinds of join on the same node makes it impossible to delete #621

hmrg-grmh · 2021-04-28T08:12:33Z

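Command

sealos join, sealos clean

Brief description

When a node that has already joined as a worker node is joined again as a master, the command is neither blocked nor fails (exit code 0).
After that, running sealos clean --master <ip> reports that sealos clean --node <ip> must be used, while running sealos clean --node <ip> reports that sealos clean --master <ip> must be used.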
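
The mutual rejection described above can be illustrated with a toy model. This is only a sketch, not the sealos source: the `clean` function, its `role` parameter, and the rule "refuse if the IP also appears in the other list" are assumptions chosen to reproduce the reported behaviour.

```python
def clean(masters, nodes, ip, role):
    """Toy model of `sealos clean --master/--node <ip>` (hypothetical logic).

    Returns a message describing what the command would do.
    """
    if role == "master":
        # Assumed guard: an IP listed as a node must be cleaned with --node.
        if ip in nodes:
            return f"refused: use sealos clean --node {ip}"
        if ip in masters:
            return f"cleaned master {ip}"
    elif role == "node":
        # Assumed guard: an IP listed as a master must be cleaned with --master.
        if ip in masters:
            return f"refused: use sealos clean --master {ip}"
        if ip in nodes:
            return f"cleaned node {ip}"
    else:
        raise ValueError(f"unknown role: {role}")
    return f"{ip} is not in the cluster"


# State after the double join: 192.168.2.102 is recorded in both lists.
masters = ["192.168.2.101", "192.168.2.102"]
nodes = ["192.168.2.102", "192.168.2.103"]

print(clean(masters, nodes, "192.168.2.102", "master"))
# refused: use sealos clean --node 192.168.2.102
print(clean(masters, nodes, "192.168.2.102", "node"))
# refused: use sealos clean --master 192.168.2.102
```

With the IP present in both lists, each guard points at the other option, so neither invocation can ever succeed.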

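Reproduction

Prerequisites

Three nodes (VMware 16 virtual machines) running CentOS 7.9.2009:
node-01 192.168.2.101
node-02 192.168.2.102
node-03 192.168.2.103
The cluster was initialized with:
sealos init --master 192.168.2.101 --node 192.168.2.102 --node 192.168.2.103
(the URL and version arguments that follow are omitted)
sealos version: 3.3.9-rc.3; package versions: both 1.20.6 and 1.19.10 were tried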

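Steps

sealos clean --node 192.168.2.102: runs normally; it first asks for confirmation and deletes the node if confirmed.

sealos join --node 192.168.2.102: runs normally (exit code 0); kubectl get node shows that the node has joined once the command completes.

Now the main part:

Starting from the state above,

sealos join --master 192.168.2.102: runs normally (exit code 0), but kubectl get node shows that node-02 has not become a master!

An INFO-level log line is printed during this step.

At this point ~/.sealos/config.yaml lists the same node under both nodes and masters.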
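
A quick way to confirm this state is to look for IPs that occur in both lists. The sketch below assumes the two lists have already been read out of the config file; the function name `find_dual_role_ips` is hypothetical, and parsing the YAML itself is left out.

```python
def find_dual_role_ips(masters, nodes):
    """Return the IPs that appear in both lists, sorted for stable output."""
    return sorted(set(masters) & set(nodes))


# Values matching the reproduction above (assumed to come from config.yaml).
masters = ["192.168.2.101", "192.168.2.102"]
nodes = ["192.168.2.102", "192.168.2.103"]

print(find_dual_role_ips(masters, nodes))
# ['192.168.2.102']
```

An empty result means no node is recorded under both roles.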

Something has already gone wrong by this stage.

Running sealos clean on this node now fails with either --node or --master (exit code 255).

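Suggested approaches

If a node already has a role, forbid join on it and report something like "it has already joined";
or ask whether to switch it to the other type in one step, skipping the prompt when -y or --force is given.
Separately, add a command such as: sealos clean --any <ip>
That is, the role option in the middle still guards against accidental deletion when someone removes nodes in bulk from a script; only --any would ignore the node type. This gives a way to delete exactly one node without specifying its type (a node is either a master or a node, after all).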
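
The two proposals can be sketched as bookkeeping on the cluster's role lists. Everything here is hypothetical: `join`, `clean_any`, `AlreadyJoinedError`, and the `force` parameter are illustrative names, sealos does not necessarily work this way, and the real kubeadm operations on the host are not modelled.

```python
class AlreadyJoinedError(Exception):
    """Raised when an IP already has a role and force was not requested."""


def roles_of(cluster, ip):
    """Return the list names ("masters", "nodes") that currently contain ip."""
    return [name for name in ("masters", "nodes") if ip in cluster[name]]


def join(cluster, ip, role, force=False):
    """Record ip under role ("masters" or "nodes"), refusing duplicates.

    With force=True an existing membership is replaced (a role switch).
    """
    if role not in ("masters", "nodes"):
        raise ValueError(f"unknown role: {role}")
    existing = roles_of(cluster, ip)
    if existing and not force:
        raise AlreadyJoinedError(f"{ip} has already joined ({', '.join(existing)})")
    for name in existing:
        cluster[name].remove(ip)
    cluster[role].append(ip)


def clean_any(cluster, ip):
    """Remove ip whatever its role; return the roles it was removed from."""
    removed = roles_of(cluster, ip)
    for name in removed:
        cluster[name].remove(ip)
    return removed


cluster = {
    "masters": ["192.168.2.101"],
    "nodes": ["192.168.2.102", "192.168.2.103"],
}
try:
    join(cluster, "192.168.2.102", "masters")  # refused: already a node
except AlreadyJoinedError as err:
    print(err)
join(cluster, "192.168.2.102", "masters", force=True)  # explicit role switch
print(clean_any(cluster, "192.168.2.102"))
```

Because `clean_any` removes the IP from every list that contains it, it would also clear a node left in both lists by an older version.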


oldthreefeng · 2021-04-28T08:44:19Z

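> If a node already has a role, forbid join on it and report something like "it has already joined".

That is the approach I would take if I were to implement it.

Even better if you could submit a PR.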

hmrg-grmh · 2021-04-28T08:49:28Z

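> > If a node already has a role, forbid join on it and report something like "it has already joined".
>
> That is the approach I would take if I were to implement it.
>
> Even better if you could submit a PR.

Honestly, the first time I tried this I just wanted to see whether it could work out what I needed and guess that I wanted to switch this node's role...

(The second point... is meant for cases where an older version has already hit this error and the whole cluster cannot simply be torn down.)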

oldthreefeng · 2021-04-28T09:54:09Z

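Promoting a node to master probably involves a lot more work internally.
The quickest way is to remove the node first and then join it as a master. Many pre-flight checks in sealos are still incomplete.
The restriction in clean was added with #566 in mind, where the master and node IPs were mixed up on the command line.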



oldthreefeng self-assigned this Apr 28, 2021

oldthreefeng added a commit to oldthreefeng/sealos that referenced this issue Apr 29, 2021

fix labring#621. join node or master should not exsit in kubernetes.

195a831

Signed-off-by: hongfeng <[email protected]>

cuisongliu closed this as completed in 86e50cb Apr 29, 2021

cuisongliu added a commit that referenced this issue Apr 29, 2021

Merge pull request #622 from oldthreefeng/develop

f1e1a3c

fix #621. join node or master should not exsit in kubernetes.

cuisongliu mentioned this issue Dec 9, 2021

merge to master (#739) #740

Closed

cuisongliu added the type: question Further information is requested label Dec 28, 2021
