[bitnami/redis] sentinel cluster does not recover automatically if k8s node dies #6320
Comments
Hi @jsalatiel,
Just found out that it only happens to one of my clusters, so apparently something else is the culprit. Still debugging...
Hi @jsalatiel,
I noticed the same issue. When the master pod is killed (and I mean killed, not gracefully deleted), the prestop scripts do not run, and sentinel leader election repeatedly fails, electing the killed pod's IP over and over again.
(Note that 10.42.2.105 is the address of the killed pod; it no longer exists.) The new incarnation of the killed pod comes up with a different IP address, and its sentinel logs do not shed much light.
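In case it is useful while debugging, one way to see which address the surviving sentinels still advertise as master is to query a sentinel directly. This is only a sketch and assumes chart defaults (a sidecar container named sentinel, sentinel port 26379, master set mymaster) plus a hypothetical pod name redis-node-1:

# Ask a surviving sentinel which address it currently advertises as master
kubectl exec redis-node-1 -c sentinel -- \
  redis-cli -p 26379 sentinel get-master-addr-by-name mymaster

# Show what that sentinel knows about its peers for the same master set
kubectl exec redis-node-1 -c sentinel -- \
  redis-cli -p 26379 sentinel sentinels mymaster

If the first command keeps returning the dead pod's IP, that matches the failed leader election described above.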
Hi @mouchar,
My environment:
auth:
  enabled: false
  sentinel: false
sentinel:
  enabled: true
replica:
  replicaCount: 3
metrics:
  enabled: true

Steps to reproduce:
Be careful: --force can make the old and the new pod run simultaneously, which can lead to data corruption. You should only force-delete if you are sure the old pod is really dead (or its node is dead). See https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/ : "Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated. Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee. When you force delete a StatefulSet pod, you are asserting that the Pod in question will never again make contact with other Pods in the StatefulSet and its name can be safely freed up for a replacement to be created."
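For context, a forced deletion looks roughly like the following (the pod name and namespace are placeholders; only do this when you are certain the node hosting the old pod is really gone):

# Force-delete the pod without waiting for confirmation from the kubelet.
# Only safe if the old pod (or its node) is definitely dead.
kubectl delete pod redis-node-0 --namespace redis --grace-period=0 --force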
Hi @mouchar,
Hi @miguelaeh
Hi guys,
Bump. I am seeing the same issue in my cluster. It's made worse by the fact that I am attempting to run the statefulset on spot instances (trying to get node draining and failover to work quickly enough not to cause a major service disruption).
A colleague is already working on it, and he will update this thread once it is solved.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
Which chart:
bitnami/redis
Describe the bug
I have a redis cluster (sentinel) with 3 replicas, spread across 3 different nodes. If I kill the pod with the role master by running kubectl delete on the current master, another node is promoted to master as expected, although sometimes it takes almost a minute while other times it takes just a few seconds (related to the leader lease?).
The problem is when one of the worker nodes where the master is running dies (powering off the VM, for example). A new master is never elected, and there is absolutely nothing in the logs of the remaining sentinels.
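If the occasional one-minute delay comes from Sentinel's failure detection rather than the leader lease, it can likely be tightened through the chart. A sketch, assuming the chart exposes sentinel.downAfterMilliseconds and sentinel.failoverTimeout (mapping to Sentinel's down-after-milliseconds and failover-timeout) and a release named redis:

# Lower how long Sentinel waits before declaring the master down and
# how long a failover attempt may run; keys and values here are assumptions.
helm upgrade redis bitnami/redis \
  --reuse-values \
  --set sentinel.downAfterMilliseconds=5000 \
  --set sentinel.failoverTimeout=10000

This does not explain the node-death case, though, where no failover happens at all.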
To Reproduce
You can also reproduce this easily on a single node by simply creating a netpolicy that blocks all traffic to/from the current master.
This is my values.yaml
This is the netpolicy you can use; just change the label selector to match the current master.
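The original policy is not reproduced here; a minimal sketch of a deny-all policy targeting a single pod, assuming the StatefulSet-applied label statefulset.kubernetes.io/pod-name and a hypothetical master pod named redis-node-0 in the default namespace, could look like this (it also requires a CNI that enforces NetworkPolicy):

# Select only the current master pod and allow no ingress or egress,
# isolating it from the sentinels and replicas.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-redis-master
spec:
  podSelector:
    matchLabels:
      statefulset.kubernetes.io/pod-name: redis-node-0
  policyTypes:
    - Ingress
    - Egress
EOF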
Expected behavior
Sentinel should detect the master is down and promote a new one
Version of Helm and Kubernetes:
helm 3.3.4
k8s 1.19.9