Bug: a node that was mistakenly joined under both roles can no longer be removed #621

Closed
hmrg-grmh opened this issue Apr 28, 2021 · 3 comments
Labels: type: question (Further information is requested)

hmrg-grmh commented Apr 28, 2021

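Command

sealos join, sealos clean

Brief description

  • When a worker node that is already in the cluster is joined again as a master, the command is neither blocked nor does it fail (exit code 0).
  • After that, running sealos clean --master <ip> on the node says to use sealos clean --node <ip> instead, and running sealos clean --node <ip> says to use sealos clean --master <ip>.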

Reproduction

Prerequisites

  1. Three nodes (VMware 16 virtual machines) running CentOS 7.9.2009:
    node-01 192.168.2.101
    node-02 192.168.2.102
    node-03 192.168.2.103

  2. init command:
    sealos init --master 192.168.2.101 --node 192.168.2.102 --node 192.168.2.103
    (the URL and version arguments that follow are omitted)

  3. sealos version: 3.3.9-rc.3; package versions: both 1.20.6 and 1.19.10 were tried

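Steps

[screenshot]

sealos clean --node 192.168.2.102: runs normally; it first asks for confirmation and removes the node if confirmed.

sealos join --node 192.168.2.102: runs normally (exit code 0); kubectl get node shows that the node has joined successfully.

Now for the main event.

Starting from the state above:

sealos join --master 192.168.2.102: also exits normally (exit code 0), but kubectl get node shows that node-02 has not become a master.

The output includes an INFO-level log line like this:

[screenshot]

At this point ~/.sealos/config.yaml lists the same node under both nodes and masters.

Something has in fact already gone wrong here.

Running sealos clean on this node now fails with either --node or --master (exit code 255).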
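
The circular error can be pictured with a small model. This is only a sketch: it assumes that clean rejects a role flag whenever the IP is recorded under the other role, which matches the messages reported above but is a guess, not the actual sealos source. The function names `conflicting_ips` and `check_clean` are hypothetical.

```python
from typing import List, Set


def conflicting_ips(masters: List[str], nodes: List[str]) -> Set[str]:
    """Return the IPs that are recorded under both roles."""
    return set(masters) & set(nodes)


def check_clean(role: str, ip: str, masters: List[str], nodes: List[str]) -> str:
    """Hypothetical model of the role guard in `sealos clean`.

    Returns "ok" when removal may proceed, otherwise the hint shown to the user.
    """
    if role == "master" and ip in nodes:
        return f"use: sealos clean --node {ip}"
    if role == "node" and ip in masters:
        return f"use: sealos clean --master {ip}"
    return "ok"


# Recorded state after joining 192.168.2.102 first as a node, then as a master.
masters = ["192.168.2.101", "192.168.2.102"]
nodes = ["192.168.2.102", "192.168.2.103"]

print(conflicting_ips(masters, nodes))
print(check_clean("master", "192.168.2.102", masters, nodes))
print(check_clean("node", "192.168.2.102", masters, nodes))
```

Under this assumption, each flag redirects to the other once an IP appears in both lists, so neither invocation can succeed.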

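Possible approaches

  1. If a node already has a role, refuse to join it again and print something like "it has already joined";
    or ask whether to switch it to the other role in one step, skipping the prompt when -y or --force is given.
  2. In addition, add a command like: sealos clean --any <ip>
    That is, the role option in the middle keeps protecting against accidental deletion when someone removes nodes in bulk from a script; only with --any would the node's role be ignored. This provides a way to remove exactly one node without specifying its role (a node is either a master or a node, after all).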
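
Both suggestions can be sketched over plain lists of IPs. This is an illustration only: `current_role`, `guard_join`, and `clean_any` are hypothetical names, `--any` is the flag proposed above and not an existing sealos option, and the real tool would also have to act on the cluster rather than only on its recorded configuration.

```python
from typing import List, Optional


def current_role(ip: str, masters: List[str], nodes: List[str]) -> Optional[str]:
    """Return the recorded role of ip, or None if it has not joined."""
    if ip in masters:
        return "master"
    if ip in nodes:
        return "node"
    return None


def guard_join(ip: str, masters: List[str], nodes: List[str]) -> None:
    """Suggestion 1: refuse to join an IP that already has a role."""
    role = current_role(ip, masters, nodes)
    if role is not None:
        raise ValueError(f"{ip} has already joined as {role}")


def clean_any(ip: str, masters: List[str], nodes: List[str]) -> List[str]:
    """Suggestion 2: drop ip from every role list, ignoring its type.

    Returns the roles the IP was removed from; both lists are edited in place.
    """
    removed = []
    if ip in masters:
        masters[:] = [m for m in masters if m != ip]
        removed.append("master")
    if ip in nodes:
        nodes[:] = [n for n in nodes if n != ip]
        removed.append("node")
    return removed
```

With the guard in place, the second join from the reproduction would be rejected up front; `clean_any` covers configurations that are already inconsistent.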
oldthreefeng self-assigned this Apr 28, 2021

oldthreefeng (Collaborator) commented:

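> If a node already has a role, refuse to join it again and print something like "it has already joined".

If I implement this, that is probably the approach I would take.

It would be even better if you could submit a PR.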

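hmrg-grmh (Author) commented:

> > If a node already has a role, refuse to join it again and print something like "it has already joined".
>
> If I implement this, that is probably the approach I would take.
>
> It would be even better if you could submit a PR.

Honestly, the first time I tried this I just wanted to see whether sealos would recognize what I was after and guess that I wanted to switch the node's role...

(The second suggestion is meant to cover clusters on older versions where this error has already happened and the whole cluster cannot simply be torn down.)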

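oldthreefeng (Collaborator) commented:

Promoting a worker node to a master probably involves a lot of additional internal steps.
The quickest way is to remove the node first and then join it as a master. In fact, many of sealos's pre-flight checks are still incomplete.
The restriction in clean was added with #566 in mind, the case where a master IP and a node IP get mixed up on the command line.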
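
The "remove first, then join" workaround can be written down as a tiny planner. This is a sketch under assumptions: `plan_role_switch` is a hypothetical helper that only builds the command strings already used in this thread and does not run them. Note that sealos clean asks for confirmation before deleting, as described in the reproduction.

```python
from typing import List

ROLES = ("master", "node")


def plan_role_switch(ip: str, current: str, target: str) -> List[str]:
    """Return the commands for switching ip from its current role to target."""
    if current not in ROLES or target not in ROLES:
        raise ValueError("role must be 'master' or 'node'")
    if current == target:
        return []  # nothing to do
    # Remove the node under its current role first, then join it under the new one.
    return [
        f"sealos clean --{current} {ip}",
        f"sealos join --{target} {ip}",
    ]


for cmd in plan_role_switch("192.168.2.102", "node", "master"):
    print(cmd)
```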

oldthreefeng added a commit to oldthreefeng/sealos that referenced this issue Apr 29, 2021
cuisongliu added a commit that referenced this issue Apr 29, 2021
fix #621. join node or master should not exsit in kubernetes.
cuisongliu added a commit that referenced this issue Dec 9, 2021
* rewrite install and delete for app, app.tar don't send if already exist
* read byte to remote file by sftp
* add etcd health check
* restore only one node to local
* use subcommad instead Flag
* single node save, healthcheck and restore test ok
* recovery kube system when err happend by restore
* use tar to compress instead of zip.
* slove in docker use with save.
* add --docker bool flag , if true , this will auto add unix timestamp to your snapshot suffix.
* add save snapshot to oss. sealos config add oss-conf
* feat(develop): fix #430
* fix ETCD CaCert or key file is not exist occurs panic,  #427
* add kubernetes cronjob example yaml file
* fix #441
* fix issue #443
* implementaion for sealos exec #429
* fix bug --label & --node will exec cmd twice and add example exec cmd
* support exec scp local dir to remote
* refactor exec command and use get ip  by node name & by labelselector method to avoid for loop
* fix sealos etcd health check for mutil master
* add bash/zsh completion for sealos. from kubectl
* fix kubeadm not found on old sealos package && fix port in exec cmd #469  && fix --service-cidr on old version && delete route cmd
* fix 1.19.1 kube-controller-manager and kube-scheduler use the LocalAPIEndpoint instead of the ControlPlaneEndpoint.
* add sealos route cmd docs
* rm original sealos in old package to aviod some problem.
add multi network install docs stage
add upx in dockerfile, add upx in drone. change image to golang:15.2-alpine
* dockerfile add upx stage build
* fix versiontointall when version like v1.16.14 >= 1191
* add test record for upgrade cmd
* when init , do not send twice, if valid copy md5 success ,do not logger
* sepreate install master0 and other master when send ca and key and kubeconfig
* fix --config when use customer config.
* validate copy kubetarball
* fix #499
* fix #509.
* fix #534 only for 1.19.1 and 1.19.2
* set /root/.kube/config to mode 600; otherwise other groups have read permission by default, which makes helm print warnings
* when kubernetes gt 1.20, use Containerd instead of docker, #540 suport 1.20 containerd
* fix #566, sealos clean --node accidentally given a master IP; add cleanCmd example
* fix #571. handle unexpected error
* fix #577, join node use config file, fix ipformat
comment to oss when push to develop, only to  tag to release
Fix 1.14.x has no kubeadm.k8s.io/v1beta2 by use cli kubeadm join --xxxx.
* fix #586,  drain node is too danger for prod use; do not drain nodes
drain worker node is too danger for prod use; do not drain nodes if worker nodes~
* fix build status (#610)
* fix arm64 tags
* fix #613, delete -i for cp command
* fix #621. join node or master should not exsit in kubernetes.
* feat(develop): fix  ipip param not set false (#653)
* # do not concatenate absolute paths, to prevent errors. (#654)
* feat(develop): fix cni config too long (#655)
* fix version 3.19.1 yaml file lint error. (#656)
* fix calico (#657)
* fix calico version nil yaml file retrun null. (#658)
* Update upgrade.md (#665)
* use new const for kubeletconfig (#589)
Signed-off-by: oldthreefeng
* [WIP]Sealos kubeadm 1.23 v1beta3 (#673)
* fix #671
* feat(develop): fix ci dir for sealos (#735)
* fix  bootstrapToken (#737)
* feat(develop): rc6 release (#738)
* ci(develop) fix golint for code and lic (#736)

Co-authored-by: steven
Co-authored-by: oldthreefeng
Co-authored-by: 中弈
Co-authored-by: Ryan
Co-authored-by: Louis
Co-authored-by: ysicing
Co-authored-by: huizhi.szh
Co-authored-by: aiyijing
Co-authored-by: scott lewis
Co-authored-by: wenshihong
Co-authored-by: wisheen
Co-authored-by: Cluas
Co-authored-by: currycan
Co-authored-by: zhangzhitao
Co-authored-by: rick
Co-authored-by: panda-lab
Co-authored-by: 付亮
Co-authored-by: SorryMaker
cuisongliu added the type: question label Dec 28, 2021
cuisongliu added a commit that referenced this issue Dec 29, 2021
* ci(develop) fix golint for code and lic (#736)

* merge to master (#739)

* Revert "merge to master (#739)" (#741)

This reverts commit c8349b0.

* Update README.md

* ci(master): add dockerfile

* hotfix(master): clean panic fix by lock (#750)

* refactor(ci): add auto invite (#762)

* refactor(ci): add auto invite (#763)

* refactor(ci): add auto invite

* docs: readme align

* refactor(dev): fix docs site (#773)

* refactor(master): cloud,app feature close (#774)

* refactor(master): cloud,app feature close

* Bug: sealos init fails to install k8s with the latest version (#778)

Fixes #691

* refactor(master): release rc.8 (#782)

* refactor(master): changelog (#784)

* update changelog to master (#785)

* refactor(master): changelog

Co-authored-by: steven
Co-authored-by: oldthreefeng
Co-authored-by: 中弈
Co-authored-by: Ryan
Co-authored-by: Louis
Co-authored-by: ysicing
Co-authored-by: huizhi.szh
Co-authored-by: aiyijing
Co-authored-by: scott lewis
Co-authored-by: wenshihong
Co-authored-by: wisheen
Co-authored-by: Cluas
Co-authored-by: currycan
Co-authored-by: zhangzhitao
Co-authored-by: rick
Co-authored-by: panda-lab
Co-authored-by: 付亮
Co-authored-by: SorryMaker
Co-authored-by: jiangyanfei
Co-authored-by: ldseraph