Redis Master Node restart issue on Sentinel Mode #6971

Closed
tolgatuna opened this issue Jul 15, 2021 · 15 comments
Labels
stale 15 days without activity

Comments

@tolgatuna

Which chart:
redis

Describe the bug
The bug is related to sentinel mode, which I enabled with 'sentinel.enabled: true'. If you kill the master node in that mode, it tries to restart itself again and again in a loop.

To Reproduce
Steps to reproduce the behavior:

  1. Enable sentinel mode ('sentinel.enabled: true').
  2. Do not configure anything else.
  3. Install the chart.
  4. Kill the master Redis pod (node 0) manually (a sketch of the commands follows below).
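
A minimal sketch of those steps (the release name 'caphelmchart' and namespace 'siscap-dev' are the ones from my cluster; adjust them for yours):

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install caphelmchart bitnami/redis -n siscap-dev --set sentinel.enabled=true
# wait for all the redis-node pods to become ready, then kill the initial master (node 0)
$ kubectl delete pod caphelmchart-redis-node-0 -n siscap-dev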

Expected behavior
The master pod will try to come back up, but instead it fails with the error "0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims."

Version of Helm and Kubernetes:

  • Output of helm version:
version.BuildInfo{Version:"v3.6.2", GitCommit:"ee407bdf364942bcb8e8c665f82e15aa28009b71", GitTreeState:"dirty", GoVersion:"go1.16.5"}
  • Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
@miguelaeh
Contributor

Hi @tolgatuna ,

1 pod has unbound immediate PersistentVolumeClaims.

This error is usually not related to the Helm chart but to your cluster's dynamic volume provisioning. I would recommend re-checking the storage class you set (or the default one, in case you did not change it) and finding out why that PVC is not being bound.
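
A quick way to inspect that could look like the following (the claim name and namespace are only examples, use the ones from your deployment):

$ kubectl get storageclass
$ kubectl describe pvc redis-data-caphelmchart-redis-node-0 -n siscap-dev
# recent PVC-related events often show why a claim could not be bound
$ kubectl get events -n siscap-dev --field-selector involvedObject.kind=PersistentVolumeClaim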

@tolgatuna
Author

Hi @miguelaeh

Name:            pvc-be51fd93-4b37-4f03-857d-753888789730
Labels:          <none>
Annotations:     docker.io/hostpath: /var/lib/k8s-pvs/redis-data-caphelmchart-redis-node-0/pvc-be51fd93-4b37-4f03-857d-753888789730
                 pv.kubernetes.io/provisioned-by: docker.io/hostpath
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    hostpath
Status:          Bound
Claim:           siscap-dev/redis-data-caphelmchart-redis-node-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        8Gi
Node Affinity:   <none>
Message:         
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/k8s-pvs/redis-data-caphelmchart-redis-node-0/pvc-be51fd93-4b37-4f03-857d-753888789730
    HostPathType:  
Events:            <none>

Here are the details for my PV. By the way, last time it was giving the error "0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.", but I realised that it sometimes gives these errors instead:

Readiness probe errored: rpc error: code = Unknown desc = container not running (0250f8a13618e031841839217c348b8fe4c17b44c6f77a9cf712621f0cedfb9f)
Liveness probe errored: rpc error: code = Unknown desc = container not running (0250f8a13618e031841839217c348b8fe4c17b44c6f77a9cf712621f0cedfb9f)

And inside the pod the only log is:

Could not connect to Redis at 10.1.5.230:26379: No route to host
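
A sketch of how that sentinel address can be checked from a throwaway pod inside the cluster (the debug pod name and image tag are illustrative):

# start a temporary pod that has redis-cli available
$ kubectl run redis-debug -n siscap-dev --rm -it --restart=Never --image=docker.io/bitnami/redis:6.2.3-debian-10-r0 --command -- bash
# from inside that pod, try the sentinel port reported in the log
$ redis-cli -h 10.1.5.230 -p 26379 ping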

@rafariossaa
Contributor

Hi,
I deployed it in minikube and these are the details:

$ kubectl describe pvc redis-data-myredis-node-0 
Name:          redis-data-myredis-node-0
Namespace:     default
StorageClass:  standard
Status:        Bound
Volume:        pvc-69f01c32-75bd-44e2-84ca-caca508328d3
Labels:        app.kubernetes.io/component=node
             app.kubernetes.io/instance=myredis
             app.kubernetes.io/name=redis
Annotations:   pv.kubernetes.io/bind-completed: yes
             pv.kubernetes.io/bound-by-controller: yes
             volume.beta.kubernetes.io/storage-provisioner: k8s.io/minikube-hostpath
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      8Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       myredis-node-0

What caught my attention in yours is that you don't have any labels. Could you delete the PVCs and try again?
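
Something along these lines (names taken from your PVC output above, adjust to your release and namespace):

$ kubectl delete pvc redis-data-caphelmchart-redis-node-0 -n siscap-dev
# the claim is only released once the pod stops using it, so delete the pod too and let the StatefulSet recreate both
$ kubectl delete pod caphelmchart-redis-node-0 -n siscap-dev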

@anessi
Contributor

anessi commented Aug 5, 2021

We have also noticed this problem, even with persistence disabled.

master:
  persistence:
    enabled: false
  
replica:
  persistence:
    enabled: false

It seems this appeared only in a recent version, as older versions (e.g. chart version 14.1.1, which uses redis:6.2.3-debian-10-r0 and redis-sentinel:6.2.2-debian-10-r12) work fine.

Sentinel is not able to elect a new master and loops forever:

1:X 04 Aug 2021 13:43:45.360 # +new-epoch 36
1:X 04 Aug 2021 13:43:45.360 # +try-failover master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:43:45.370 # +vote-for-leader a39bd657b098ce60ae7f34875ebc53c582b05d05 36
1:X 04 Aug 2021 13:43:45.400 # fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 voted for a39bd657b098ce60ae7f34875ebc53c582b05d05 36
1:X 04 Aug 2021 13:43:45.432 # +elected-leader master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:43:45.432 # +failover-state-select-slave master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:43:45.503 # -failover-abort-no-good-slave master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:43:45.588 # Next failover delay: I will not start a failover before Wed Aug  4 13:44:21 2021
1:X 04 Aug 2021 13:44:21.576 # +new-epoch 37
1:X 04 Aug 2021 13:44:21.576 # +try-failover master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:44:21.591 # +vote-for-leader a39bd657b098ce60ae7f34875ebc53c582b05d05 37
1:X 04 Aug 2021 13:44:21.647 # fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 voted for a39bd657b098ce60ae7f34875ebc53c582b05d05 37
1:X 04 Aug 2021 13:44:21.663 # +elected-leader master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:44:21.663 # +failover-state-select-slave master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:44:21.734 # -failover-abort-no-good-slave master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:44:21.825 # Next failover delay: I will not start a failover before Wed Aug  4 13:44:58 2021
1:X 04 Aug 2021 13:44:58.185 # +new-epoch 38
1:X 04 Aug 2021 13:44:58.204 # +vote-for-leader fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 38
1:X 04 Aug 2021 13:44:58.257 # Next failover delay: I will not start a failover before Wed Aug  4 13:45:34 2021
1:X 04 Aug 2021 13:45:34.329 # +new-epoch 39
1:X 04 Aug 2021 13:45:34.350 # +vote-for-leader fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 39
1:X 04 Aug 2021 13:45:34.350 # Next failover delay: I will not start a failover before Wed Aug  4 13:46:11 2021
1:X 04 Aug 2021 13:46:11.247 # +new-epoch 40
1:X 04 Aug 2021 13:46:11.247 # +try-failover master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:46:11.256 # +vote-for-leader a39bd657b098ce60ae7f34875ebc53c582b05d05 40
1:X 04 Aug 2021 13:46:11.257 # fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 voted for fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 40
1:X 04 Aug 2021 13:46:21.501 # -failover-abort-not-elected master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:46:21.560 # Next failover delay: I will not start a failover before Wed Aug  4 13:46:47 2021
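
The "-failover-abort-no-good-slave" lines mean the elected sentinel leader does not see any replica it considers healthy enough to promote. A sketch of how to inspect what the sentinels see (assuming a release named 'myredis' and the chart's sentinel container; names may differ in your setup):

# ask one of the remaining sentinels what it knows about the master and its replicas
$ kubectl exec -it myredis-node-1 -c sentinel -- redis-cli -p 26379 sentinel master mymaster
$ kubectl exec -it myredis-node-1 -c sentinel -- redis-cli -p 26379 sentinel replicas mymaster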

@rafariossaa
Contributor

Could you try with the latest version, 14.8.8, which uses Redis 6.2.5?
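
For example (assuming a release named 'myredis'; --reuse-values keeps your existing overrides):

$ helm repo update
$ helm upgrade myredis bitnami/redis --version 14.8.8 --reuse-values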

@anessi
Contributor

anessi commented Aug 10, 2021

With version 14.8.8 of the Redis Helm chart the behavior is the same as with 14.7.2, so the problem is still there.

@pablogalegoc
Contributor

This should be the same issue reported in #7181, and there's a proposed fix in #7182.

@ShineSmile

ShineSmile commented Aug 17, 2021

The comment from @bluecrabs007 on Jun 10 in #6165 helped me solve my problem.

@github-actions

github-actions bot commented Sep 2, 2021

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@anessi
Contributor

anessi commented Sep 2, 2021

For us, the changes done in #7278 (version 15.0.1, plus an additional fix in 15.0.4) solve the issues. It works with or without the changes mentioned above by @ShineSmile (#6165 (comment)).

@ShineSmile

I found that if we enable the Istio sidecar with the label istio-injection=enabled, the master node won't restart successfully after deleting it manually.
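
A quick way to confirm whether the sidecar is actually injected into the Redis pods (pod and namespace names are illustrative):

# namespaces with automatic sidecar injection enabled show "enabled" in the ISTIO-INJECTION column
$ kubectl get namespace -L istio-injection
# list the containers in the master pod; an injected pod will also show istio-proxy
$ kubectl get pod myredis-node-0 -o jsonpath='{.spec.containers[*].name}'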

@anessi
Contributor

anessi commented Sep 6, 2021

In our setup the Redis instance is also running in an Istio-enabled namespace and works just fine with version 15.0.4. After deleting the master node manually it recovers without problems; I just tested it again. We have storage disabled, but that should not make a difference.

@pablogalegoc
Contributor

@ShineSmile for the sentinel-with-Istio issue I'd recommend opening another issue. Just so you know, we don't test our charts with Istio, so we can't officially support it, but we'll be glad to help if needed.

@github-actions

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
