Redis Master Node restart issue on Sentinel Mode #6971

Closed
tolgatuna opened this issue Jul 15, 2021 · 15 comments
Labels
stale 15 days without activity

Comments

@tolgatuna

Which chart:
redis

Describe the bug
The bug is related to sentinel mode, which I enabled with 'sentinel.enabled: true'. If you kill the master node in that mode, it tries to restart itself again and again in a loop.

To Reproduce
Steps to reproduce the behavior:

  1. Enable sentinel mode ('sentinel.enabled: true').
  2. Do not configure anything else.
  3. Install the chart.
  4. Kill the master Redis pod (node 0) manually (a sketch of the commands follows below).
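
A minimal sketch of those steps (the release name 'caphelmchart' and namespace 'siscap-dev' are the ones from my cluster; adjust them for yours):

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install caphelmchart bitnami/redis -n siscap-dev --set sentinel.enabled=true
# wait for all the redis-node pods to become ready, then kill the initial master (node 0)
$ kubectl delete pod caphelmchart-redis-node-0 -n siscap-dev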

Expected behavior
The master pod will try to come back up, but instead it fails with the error "0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims."

Version of Helm and Kubernetes:

  • Output of helm version:
version.BuildInfo{Version:"v3.6.2", GitCommit:"ee407bdf364942bcb8e8c665f82e15aa28009b71", GitTreeState:"dirty", GoVersion:"go1.16.5"}
  • Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
@miguelaeh
Contributor

Hi @tolgatuna ,

1 pod has unbound immediate PersistentVolumeClaims.

This error is usually not related to the Helm chart but to your cluster's dynamic volume provisioning. I would recommend re-checking the storage class you set (or the default one, in case you did not change it) and finding out why that PVC is not being bound.
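
A quick way to inspect that could look like the following (the claim name and namespace are only examples, use the ones from your deployment):

$ kubectl get storageclass
$ kubectl describe pvc redis-data-caphelmchart-redis-node-0 -n siscap-dev
# recent PVC-related events often show why a claim could not be bound
$ kubectl get events -n siscap-dev --field-selector involvedObject.kind=PersistentVolumeClaim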

@tolgatuna
Author

Hi @miguelaeh

Name:            pvc-be51fd93-4b37-4f03-857d-753888789730
Labels:          <none>
Annotations:     docker.io/hostpath: /var/lib/k8s-pvs/redis-data-caphelmchart-redis-node-0/pvc-be51fd93-4b37-4f03-857d-753888789730
                 pv.kubernetes.io/provisioned-by: docker.io/hostpath
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    hostpath
Status:          Bound
Claim:           siscap-dev/redis-data-caphelmchart-redis-node-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        8Gi
Node Affinity:   <none>
Message:         
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/k8s-pvs/redis-data-caphelmchart-redis-node-0/pvc-be51fd93-4b37-4f03-857d-753888789730
    HostPathType:  
Events:            <none>

Here are the details for my PV. By the way, last time it was giving the error "0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.", but I realised that it sometimes gives these errors instead:

Readiness probe errored: rpc error: code = Unknown desc = container not running (0250f8a13618e031841839217c348b8fe4c17b44c6f77a9cf712621f0cedfb9f)
Liveness probe errored: rpc error: code = Unknown desc = container not running (0250f8a13618e031841839217c348b8fe4c17b44c6f77a9cf712621f0cedfb9f)

And inside the pod the only log is:

Could not connect to Redis at 10.1.5.230:26379: No route to host
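
A sketch of how that sentinel address can be checked from a throwaway pod inside the cluster (the debug pod name and image tag are illustrative):

# start a temporary pod that has redis-cli available
$ kubectl run redis-debug -n siscap-dev --rm -it --restart=Never --image=docker.io/bitnami/redis:6.2.3-debian-10-r0 --command -- bash
# from inside that pod, try the sentinel port reported in the log
$ redis-cli -h 10.1.5.230 -p 26379 ping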

@rafariossaa
Contributor

Hi,
I deployed it in minikube and these are the details:

$ kubectl describe pvc redis-data-myredis-node-0 
Name:          redis-data-myredis-node-0
Namespace:     default
StorageClass:  standard
Status:        Bound
Volume:        pvc-69f01c32-75bd-44e2-84ca-caca508328d3
Labels:        app.kubernetes.io/component=node
             app.kubernetes.io/instance=myredis
             app.kubernetes.io/name=redis
Annotations:   pv.kubernetes.io/bind-completed: yes
             pv.kubernetes.io/bound-by-controller: yes
             volume.beta.kubernetes.io/storage-provisioner: k8s.io/minikube-hostpath
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      8Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       myredis-node-0

What caught my attention in yours is that you don't have any labels. Could you delete the PVCs and try again?
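
Something along these lines (names taken from your PVC output above, adjust to your release and namespace):

$ kubectl delete pvc redis-data-caphelmchart-redis-node-0 -n siscap-dev
# the claim is only released once the pod stops using it, so delete the pod too and let the StatefulSet recreate both
$ kubectl delete pod caphelmchart-redis-node-0 -n siscap-dev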

@anessi
Contributor

anessi commented Aug 5, 2021

We have also noticed this problem, even with persistence disabled.

master:
  persistence:
    enabled: false
  
replica:
  persistence:
    enabled: false

It seems this appeared only in a recent version, as older versions (e.g. chart version 14.1.1, which uses redis:6.2.3-debian-10-r0 and redis-sentinel:6.2.2-debian-10-r12) work fine.

Sentinel is not able to elect a new master and loops forever:

1:X 04 Aug 2021 13:43:45.360 # +new-epoch 36
1:X 04 Aug 2021 13:43:45.360 # +try-failover master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:43:45.370 # +vote-for-leader a39bd657b098ce60ae7f34875ebc53c582b05d05 36
1:X 04 Aug 2021 13:43:45.400 # fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 voted for a39bd657b098ce60ae7f34875ebc53c582b05d05 36
1:X 04 Aug 2021 13:43:45.432 # +elected-leader master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:43:45.432 # +failover-state-select-slave master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:43:45.503 # -failover-abort-no-good-slave master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:43:45.588 # Next failover delay: I will not start a failover before Wed Aug  4 13:44:21 2021
1:X 04 Aug 2021 13:44:21.576 # +new-epoch 37
1:X 04 Aug 2021 13:44:21.576 # +try-failover master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:44:21.591 # +vote-for-leader a39bd657b098ce60ae7f34875ebc53c582b05d05 37
1:X 04 Aug 2021 13:44:21.647 # fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 voted for a39bd657b098ce60ae7f34875ebc53c582b05d05 37
1:X 04 Aug 2021 13:44:21.663 # +elected-leader master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:44:21.663 # +failover-state-select-slave master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:44:21.734 # -failover-abort-no-good-slave master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:44:21.825 # Next failover delay: I will not start a failover before Wed Aug  4 13:44:58 2021
1:X 04 Aug 2021 13:44:58.185 # +new-epoch 38
1:X 04 Aug 2021 13:44:58.204 # +vote-for-leader fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 38
1:X 04 Aug 2021 13:44:58.257 # Next failover delay: I will not start a failover before Wed Aug  4 13:45:34 2021
1:X 04 Aug 2021 13:45:34.329 # +new-epoch 39
1:X 04 Aug 2021 13:45:34.350 # +vote-for-leader fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 39
1:X 04 Aug 2021 13:45:34.350 # Next failover delay: I will not start a failover before Wed Aug  4 13:46:11 2021
1:X 04 Aug 2021 13:46:11.247 # +new-epoch 40
1:X 04 Aug 2021 13:46:11.247 # +try-failover master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:46:11.256 # +vote-for-leader a39bd657b098ce60ae7f34875ebc53c582b05d05 40
1:X 04 Aug 2021 13:46:11.257 # fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 voted for fb5d1f32fa4fa1d6a01a5622bdc570b60ebbeac0 40
1:X 04 Aug 2021 13:46:21.501 # -failover-abort-not-elected master mymaster 10.42.5.249 6379
1:X 04 Aug 2021 13:46:21.560 # Next failover delay: I will not start a failover before Wed Aug  4 13:46:47 2021
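
The "-failover-abort-no-good-slave" lines mean the elected sentinel leader does not see any replica it considers healthy enough to promote. A sketch of how to inspect what the sentinels see (assuming a release named 'myredis' and the chart's sentinel container; names may differ in your setup):

# ask one of the remaining sentinels what it knows about the master and its replicas
$ kubectl exec -it myredis-node-1 -c sentinel -- redis-cli -p 26379 sentinel master mymaster
$ kubectl exec -it myredis-node-1 -c sentinel -- redis-cli -p 26379 sentinel replicas mymaster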

@rafariossaa
Contributor

Could you try with the latest version, 14.8.8, which uses Redis 6.2.5?
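
For example (assuming a release named 'myredis'; --reuse-values keeps your existing overrides):

$ helm repo update
$ helm upgrade myredis bitnami/redis --version 14.8.8 --reuse-values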

@anessi
Contributor

anessi commented Aug 10, 2021

With version 14.8.8 of the Redis Helm chart the behavior is the same as with 14.7.2, so the problem is still there.

@pablogalegoc
Contributor

This should be the same issue reported in #7181, and there's a proposed fix in #7182.

@ShineSmile

ShineSmile commented Aug 17, 2021

The comment from @bluecrabs007 on Jun 10 in #6165 helped me solve my problem.

@github-actions

github-actions bot commented Sep 2, 2021

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@anessi
Contributor

anessi commented Sep 2, 2021

For us, the changes done in #7278 (version 15.0.1, plus an additional fix in 15.0.4) solve the issues. It works with or without the changes mentioned above by @ShineSmile (#6165 (comment)).

@ShineSmile

I found that if we enable the Istio sidecar with the label istio-injection=enabled, the master node won't restart successfully after deleting it manually.
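
A quick way to confirm whether the sidecar is actually injected into the Redis pods (pod and namespace names are illustrative):

# namespaces with automatic sidecar injection enabled show "enabled" in the ISTIO-INJECTION column
$ kubectl get namespace -L istio-injection
# list the containers in the master pod; an injected pod will also show istio-proxy
$ kubectl get pod myredis-node-0 -o jsonpath='{.spec.containers[*].name}'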

@anessi
Contributor

anessi commented Sep 6, 2021

In our setup the Redis instance is also running in an Istio-enabled namespace and works just fine with version 15.0.4. After deleting the master node manually it recovers without problems; I just tested it again. We have storage disabled, but that should not make a difference.

@pablogalegoc
Contributor

@ShineSmile for the sentinel-with-Istio issue I'd recommend opening another issue. Just so you know, we don't test our charts with Istio, so we can't officially support it, but we'll be glad to help if needed.

@github-actions

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
