
[redis] Sentinel from master does not recover itself #3562

Closed
EduardoFLima opened this issue Aug 31, 2020 · 4 comments
Labels: on-hold

Comments

@EduardoFLima

Which chart:
bitnami/redis, 10.7.16

Describe the bug
We're using the Redis Helm chart in a topology with one master and two slaves.
When the sentinel container inside the pod that contains the master crashes, it is not able to recover.

In the logs, we can see the error below:

sentinel known-sentinel mymaster 172.17.0.10 26379 2156b04daef7355ec8796e6f493a1d0285f1adc5
*** FATAL CONFIG FILE ERROR (Redis 6.0.6) ***
Reading the configuration file, at line 22
>>> 'sentinel known-sentinel mymaster 172.17.0.10 26379 2156b04daef7355ec8796e6f493a1d0285f1adc5'
Wrong hostname or port for sentinel.
sentinel known-sentinel mymaster 172.17.0.9 26379 101ee1d442b7f7db844587d150c4accfe371e529
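
For context, a quick check (a sketch on our side; the config path below is the Bitnami default we assume for this chart version and may differ) is to compare the known-sentinel entries persisted by a still-healthy sentinel with the current pod IPs:

# List the known-sentinel lines persisted by a healthy peer sentinel
# (assumed config path; adjust it if your chart version places the file elsewhere).
kubectl exec -it my-redis-slave-0 -c sentinel -- \
  grep known-sentinel /opt/bitnami/redis-sentinel/etc/sentinel.conf

# List the current pod IPs to compare against the addresses above.
kubectl get pods -o wide | grep my-redis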

To Reproduce
Steps to reproduce the behavior:

  1. Install the chart with:
helm install my-redis bitnami/redis --namespace default -f values.yaml

Where values.yaml is:

image:
  tag: 6.0.6

sentinel:
  enabled: true
  image:
    tag: 6.0.6
  downAfterMilliseconds: 5000
  failoverTimeout: 10000

usePassword: true
password: password

master:
  resources:
    requests:
      memory: 200Mi
      cpu: 100m
    limits:
      cpu: 1000m
      memory: 1Gi
  persistence:
    enabled: false

slave:
  resources:
    requests:
      memory: 200Mi
      cpu: 100m
    limits:
      cpu: 1000m
      memory: 1Gi

  persistence:
    enabled: false
  2. Wait until the pods are up and running and test Redis.
    Redis container in the master pod:
kubectl exec -it my-redis-master-0 -- redis-cli -a password info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.17.0.3,port=6379,state=online,offset=14904,lag=0
slave1:ip=172.17.0.10,port=6379,state=online,offset=14769,lag=1
master_replid:22b4a8cc55fd70d1896519429f46affb9718ba95
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:14904
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:14904

Sentinel container in the master pod:

kubectl exec -it my-redis-master-0 -c sentinel -- redis-cli -p 26379 -a password info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.17.0.9:6379,slaves=2,sentinels=3

Master pod ip:

kubectl describe pod my-redis-master-0 | grep IP:
IP:           172.17.0.9
  3. Shut down the sentinel inside the pod that contains one of the slaves:
    kubectl exec -it my-redis-slave-0 -c sentinel -- redis-cli -p 26379 -a password shutdown
  4. Note that the my-redis-slave-0 sentinel crashes and is able to recover itself.

  5. Shut down the sentinel inside the pod that contains the master:

    kubectl exec -it my-redis-master-0 -c sentinel -- redis-cli -p 26379 -a password shutdown
  6. Note that the my-redis-master-0 sentinel crashes and keeps trying to recover itself, but is not able to (the pod status keeps alternating between CrashLoopBackOff and Error).

In the log of the sentinel container of the my-redis-master-0 pod:

sentinel known-sentinel mymaster 172.17.0.10 26379 2156b04daef7355ec8796e6f493a1d0285f1adc5

*** FATAL CONFIG FILE ERROR (Redis 6.0.6) ***
Reading the configuration file, at line 22
>>> 'sentinel known-sentinel mymaster 172.17.0.10 26379 2156b04daef7355ec8796e6f493a1d0285f1adc5'
Wrong hostname or port for sentinel.
sentinel known-sentinel mymaster 172.17.0.9 26379 101ee1d442b7f7db844587d150c4accfe371e529

In the log of the sentinel container of the my-redis-slave-0 pod:

1:X 28 Aug 2020 06:57:03.971 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 28 Aug 2020 06:57:03.971 # Redis version=6.0.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 28 Aug 2020 06:57:03.971 # Configuration loaded
1:X 28 Aug 2020 06:57:03.972 * Running mode=sentinel, port=26379.
1:X 28 Aug 2020 06:57:03.973 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 28 Aug 2020 06:57:03.973 # Sentinel ID is 757f43b7ef7152735bb37b5b6f7aceed2bdc6bf5
1:X 28 Aug 2020 06:57:03.973 # +monitor master mymaster 172.17.0.9 6379 quorum 2
1:X 28 Aug 2020 06:58:00.966 # +sdown sentinel 101ee1d442b7f7db844587d150c4accfe371e529 172.17.0.9 26379 @ mymaster 172.17.0.9 6379
  7. From the logs, we understand that the sentinel of the master pod is starting and trying to connect to one of the slave sentinels (172.17.0.10) but is unable to for some reason (see the check sketched right after this list).
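
To narrow this down, a check we find useful (a sketch using the same pod names and password as above) is to ask a surviving sentinel which peers it still knows about, and optionally drop the stale entries so discovery starts over:

# Ask a healthy sentinel which peer sentinels it currently knows about;
# stale entries still list the old master-pod address.
kubectl exec -it my-redis-slave-0 -c sentinel -- \
  redis-cli -p 26379 -a password sentinel sentinels mymaster

# Optionally discard stale state and restart discovery: SENTINEL RESET clears
# the known sentinels and slaves for the given master name.
kubectl exec -it my-redis-slave-0 -c sentinel -- \
  redis-cli -p 26379 -a password sentinel reset mymaster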

Expected behavior
The sentinel container inside the master pod is expected to recover itself, just as the one in the slave pod was able to (step 4 above).
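
As a temporary workaround on our side (an assumption, not a documented fix from the chart), deleting the crash-looping pod should let the StatefulSet recreate it; since persistence is disabled in the values above, the generated sentinel configuration does not survive the pod and is rebuilt from scratch:

# Delete the pod whose sentinel container is crash-looping; the StatefulSet
# recreates it with a freshly generated configuration (persistence is disabled).
kubectl delete pod my-redis-master-0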

Version of Helm and Kubernetes:

  • Output of helm version:
version.BuildInfo{Version:"v3.3.0", GitCommit:"8a4aeec08d67a7b84472007529e8097ec3742105", GitTreeState:"dirty", GoVersion:"go1.14.7"}
  • Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:20:10Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
@javsalgar
Contributor

Hi,

We have plans to improve our current Redis Sentinel configuration, as we detected some failover issues. We will update this ticket as soon as we have more news. Thank you very much for reporting!

@javsalgar added the on-hold label on Aug 31, 2020
@pascalrimann

Is there any solution on this topic yet? We are currently facing the same issue.

@javsalgar
Contributor

Hi,

Could you share the logs of the issue so we can check whether there is any difference from the ones shared by the OP? Is it something you can easily reproduce by removing the pods? We've been making improvements to the failover mechanism and would like to understand what caused the issue this time.
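
For reference, something along these lines (using the pod and container names from the report above; adjust them to your release) should capture the relevant output, including the previously crashed sentinel container:

# Current and previous logs of the crash-looping sentinel container.
kubectl logs my-redis-master-0 -c sentinel
kubectl logs my-redis-master-0 -c sentinel --previous

# Logs of a healthy peer sentinel, for comparison.
kubectl logs my-redis-slave-0 -c sentinel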

@carrodher
Member

Unfortunately, this issue was created a long time ago and, although there is an internal task to fix it, it was not prioritized as something to address in the short or mid term. This is not for a technical reason but rather a matter of capacity, since we're a small team.

That being said, contributions via PRs are more than welcome in both repositories (containers and charts), in case you would like to contribute.

In the meantime, there have been several releases of this asset, and it's possible the issue has been resolved as part of other changes. If that's not the case and you are still experiencing it, please feel free to reopen the issue and we will re-evaluate it.
