[bitnami/redis] Sentinel return wrong IP #1682
Comments
Hi, this is strange because the generated config map uses the domain name, not the IP.
Could you show the generated config map using kubectl?
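For instance, something like the following shows it (a sketch: the release name redis2 is taken from a pod name mentioned later in this thread, and the exact ConfigMap name may differ in your deployment):
# list the ConfigMaps created by the release, then dump the one holding the Redis/Sentinel configuration
kubectl get configmaps | grep redis2
kubectl get configmap <configmap-name> -o yaml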
Hmmm, domain name here... Very strange.
I have not reloaded Redis yet; are there any tests I can do to track down the issue?
Maybe deploy a new one and see if the address gets changed to an IP. Maybe it's something that Redis does automatically.
I deployed a new Redis with
Same result, IP in the config:
Another point: if I restart pod/redis2-master-0, its config gets updated. However, the slaves' sentinels do not:
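For reference, one way to inspect what the slaves' sentinels are pointing at (a sketch: the slave pod name is assumed to follow the same naming pattern as the master pod above; the container name and file path are the ones used by the Bitnami sentinel image, as seen later in this thread):
# inspect the sentinel configuration inside a slave pod and look at the monitored master address
kubectl exec redis2-slave-0 -c sentinel -- cat /opt/bitnami/redis-sentinel/etc/sentinel.conf | grep "sentinel monitor"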
Hi, thanks for letting us know. I think this will require further investigation. Let me open an internal task. I will let you know when we have more details.
This issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
@javsalgar Would this issue suggest it's not wise to rely on using Sentinel mode for HA in a production situation?
Hi, we still need to investigate how to properly deal with Sentinel and the ephemerality of IP addresses. For the time being, until this issue is fixed, I would recommend sticking to a regular master-slave configuration. We are also working on a redis-cluster chart which has a different failover mechanism and could be better suited to this kind of scenario. We will let you know when we have more updates on this.
Hi @baznikin ,
As you can see in the sentinel configuration of one of the slaves, it is pointing to the master pod, which is correct.
Then, I killed the master pod.
Now there is an unstable period where a slave should be promoted to master; checking the logs of one of the slaves, you will see it repeatedly trying to reconnect to the old master IP.
The new master pod is created automatically and its IP will be different from the previous one.
If you exec into the new pod just at the moment it is created, you will see that the sentinel configuration is pointing to itself, to its new IP.
After some time, if we go to the old master, we will see that the sentinel configuration is now pointing to the new master (that is sentinel-redis-slave-1).
And now the cluster is stable again. I guess this is the behaviour you were expecting, but you didn't give it enough time for the sentinel to update the IPs.
Maybe! But at the time the report was created, I watched the wrong configuration persist for a few hours. I don't use this chart at the moment; I'll give it a try next time.
Wed, 8 Apr 2020, 17:53, Miguel Ángel Cabrera Miñagorri <[email protected]>:
… Hi @baznikin <https://github.com/baznikin> ,
I have been testing what you explained here and it seems to be a temporary issue. Once you kill the master, there is a period during which one of the slaves needs to be promoted to master. During that time, the sentinel at both slaves will be pointing to the old master, and if the new master pod has already been created it will point to itself, because the hostname in the configmap points to the pod called master. One thing to clarify: at this moment, the actual master will be one of the pods called slave, and the pod called master will be a slave.
Once the cluster reaches a stable state, the sentinel pods start an auto-reconfiguration process, and after some time they all point to the new master (which is actually a pod called slave).
Let me illustrate this:
- On the first deploy of the chart, you will have the cluster in a stable state:
10:43:56 › kgp -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sentinel-redis-master-0 3/3 Running 0 3m50s 10.244.2.86 aks-agentpool-38805687-vmss000002 <none> <none>
sentinel-redis-slave-0 3/3 Running 2 3m50s 10.244.3.89 aks-agentpool-38805687-vmss000003 <none> <none>
sentinel-redis-slave-1 3/3 Running 0 2m15s 10.244.1.83 aks-agentpool-38805687-vmss000001 <none> <none>
As you can see in the sentinel configuration of one of the slaves, it is pointing to the master pod, which is correct:
10:44:03 › k exec -it sentinel-redis-master-0 -c sentinel bash
I have no name!@sentinel-redis-master-0:/$ cat /opt/bitnami/redis-sentinel/etc/sentinel.conf
dir "/tmp"
bind 0.0.0.0
port 26379
sentinel myid 3b9bba815cc15706f7b66f7ef85eefe215cb4c1b
sentinel deny-scripts-reconfig yes
sentinel monitor spt-redis 10.244.2.86 6379 2
.
.
.
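At this point you can also ask the sentinel directly which address it would hand out to clients (a sketch: it assumes redis-cli is available inside the sentinel container and uses the master set name spt-redis from the configuration above):
# from inside the sentinel container, query the address of the monitored master
redis-cli -p 26379 sentinel get-master-addr-by-name spt-redis
# while the cluster is stable this should return the master IP shown above:
# 1) "10.244.2.86"
# 2) "6379"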
Then, I killed the master pod:
10:45:51 › k delete pod sentinel-redis-master-0
pod "sentinel-redis-master-0" deleted
And now there is an unstable period where a slave should be promoted to master; checking the logs of one of the slaves, you will see the following:
1:S 08 Apr 2020 10:41:54.951 # CONFIG REWRITE executed with success.
1:S 08 Apr 2020 10:41:55.265 * Connecting to MASTER 10.244.2.86:6379
1:S 08 Apr 2020 10:41:55.265 * MASTER <-> REPLICA sync started
1:S 08 Apr 2020 10:41:55.266 * Non blocking connect for SYNC fired the event.
1:S 08 Apr 2020 10:41:55.266 * Master replied to PING, replication can continue...
1:S 08 Apr 2020 10:41:55.268 * Trying a partial resynchronization (request e071e2ecae29225177b980a90e8afea809390681:2550).
1:S 08 Apr 2020 10:41:55.269 * Successful partial resynchronization with master.
1:S 08 Apr 2020 10:41:55.269 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
1:S 08 Apr 2020 10:45:55.030 # Connection with master lost.
1:S 08 Apr 2020 10:45:55.030 * Caching the disconnected master state.
1:S 08 Apr 2020 10:45:55.078 * Connecting to MASTER 10.244.2.86:6379
1:S 08 Apr 2020 10:45:55.078 * MASTER <-> REPLICA sync started
1:S 08 Apr 2020 10:45:55.079 # Error condition on socket for SYNC: Connection refused
1:S 08 Apr 2020 10:45:56.081 * Connecting to MASTER 10.244.2.86:6379
1:S 08 Apr 2020 10:45:56.081 * MASTER <-> REPLICA sync started
1:S 08 Apr 2020 10:46:14.230 # Error condition on socket for SYNC: No route to host
1:S 08 Apr 2020 10:46:15.159 * Connecting to MASTER 10.244.2.86:6379
1:S 08 Apr 2020 10:46:15.160 * MASTER <-> REPLICA sync started
1:S 08 Apr 2020 10:46:23.454 # Error condition on socket for SYNC: No route to host
1:S 08 Apr 2020 10:46:24.190 * Connecting to MASTER 10.244.2.86:6379
1:S 08 Apr 2020 10:46:24.191 * MASTER <-> REPLICA sync started
1:S 08 Apr 2020 10:46:27.254 # Error condition on socket for SYNC: No route to host
1:S 08 Apr 2020 10:46:28.208 * Connecting to MASTER 10.244.2.86:6379
1:S 08 Apr 2020 10:46:28.208 * MASTER <-> REPLICA sync started
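While this is happening, the failover can also be followed from the sentinel container logs of one of the surviving pods (a sketch: pod and container names are the ones from the listing above):
# follow the sentinel logs and filter for the usual failover events
kubectl logs -f sentinel-redis-slave-0 -c sentinel | grep -E "\+sdown|\+odown|\+failover|\+switch-master"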
The new master pod is created automatically and its IP will be different from the previous one:
10:48:08 › kgp -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sentinel-redis-master-0 3/3 Running 0 2m15s 10.244.2.87 aks-agentpool-38805687-vmss000002 <none> <none>
sentinel-redis-slave-0 3/3 Running 2 8m 10.244.3.89 aks-agentpool-38805687-vmss000003 <none> <none>
sentinel-redis-slave-1 3/3 Running 0 6m25s 10.244.1.83 aks-agentpool-38805687-vmss000001 <none> <none>
If you exec into the new pod just at the moment it is created, you will see that the sentinel configuration is pointing to itself, to its new IP.
Now, in one of the slaves, the following will appear, indicating it is now the master:
1:M 08 Apr 2020 10:48:07.934 * Discarding previously cached master state.
1:M 08 Apr 2020 10:48:07.934 * MASTER MODE enabled (user request from 'id=5 addr=10.244.3.89:39847 fd=10 name=sentinel-ce6ec014-cmd age=357 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=140 qbuf-free=32628 obl=36 oll=0 omem=0 events=r cmd=exec')
1:M 08 Apr 2020 10:48:07.934 # CONFIG REWRITE executed with success.
1:M 08 Apr 2020 10:48:09.553 * Replica 10.244.3.89:6379 asks for synchronization
1:M 08 Apr 2020 10:48:09.554 * Partial resynchronization request from 10.244.3.89:6379 accepted. Sending 437 bytes of backlog starting from offset 50657.
1:M 08 Apr 2020 10:48:21.036 * Replica 10.244.2.87:6379 asks for synchronization
1:M 08 Apr 2020 10:48:21.036 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '567f79774695fcba15dc30133c6a398d59c74b4d', my replication IDs are '38c06c6daab43ea958f6fb2f0d615e8e02376c14' and 'e071e2ecae29225177b980a90e8afea809390681')
1:M 08 Apr 2020 10:48:21.036 * Starting BGSAVE for SYNC with target: disk
1:M 08 Apr 2020 10:48:21.037 * Background saving started by pid 741
741:C 08 Apr 2020 10:48:21.051 * DB saved on disk
741:C 08 Apr 2020 10:48:21.052 * RDB: 10 MB of memory used by copy-on-write
1:M 08 Apr 2020 10:48:21.085 * Background saving terminated with success
1:M 08 Apr 2020 10:48:21.086 * Synchronization with replica 10.244.2.87:6379 succeeded
And after some time, if we go to the old master, we will see that the sentinel configuration is now pointing to the new master (that is sentinel-redis-slave-1):
10:49:11 › k exec -it sentinel-redis-master-0 -c sentinel bash
I have no name!@sentinel-redis-master-0:/$ cat /opt/bitnami/redis-sentinel/etc/sentinel.conf
dir "/tmp"
bind 0.0.0.0
port 26379
sentinel myid d25053c91626dbfabd456c4cdeab9bed39ea33fc
sentinel deny-scripts-reconfig yes
sentinel monitor spt-redis 10.244.1.83 6379 2
And now the cluster is stable again. I guess this is the behaviour you were expecting, but you didn't give it enough time for the sentinel to update the IPs.
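To double-check that all three sentinels now agree on the new master, you could query each of them (a sketch: it assumes redis-cli is available inside the sentinel container and uses the pod names and master set name shown above):
# ask every sentinel which master address it announces; all of them should return 10.244.1.83
for pod in sentinel-redis-master-0 sentinel-redis-slave-0 sentinel-redis-slave-1; do
  echo -n "$pod -> "
  kubectl exec "$pod" -c sentinel -- redis-cli -p 26379 sentinel get-master-addr-by-name spt-redis
done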
Thank you for the confirmation @baznikin!!
I've also experienced the issue described by @baznikin here when killing the pod named
When in this state, querying each sentinel instance with after a
Even though not all sentinel instances are aware of all other instances (?), all of them are aware of a majority (majority == 2 in the 1Master+2Replicas cluster I'm running) at this point, the
Unfortunately, I have yet to make this reproducible 100% of the time. For reference, here is the output of
Would be happy to provide more info / logs.
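For reference, the per-sentinel checks described here can be run like this (a sketch: replace the pod name and master set name with the ones from your release, and it assumes redis-cli is available inside the sentinel container):
# list the other sentinels this instance currently knows about
kubectl exec <sentinel-pod> -c sentinel -- redis-cli -p 26379 sentinel sentinels <master-set-name>
# check whether this instance believes the current configuration can reach quorum
kubectl exec <sentinel-pod> -c sentinel -- redis-cli -p 26379 sentinel ckquorum <master-set-name>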
Hi @albertocsm,
Regards.
I experienced the same problem last night. One node of my cluster went down and a failover started. One sentinel returned the old master IP instead of the new one, causing problems for other services that use Redis.
Hi @rtriveurbana,
I have noticed that old IPs are not removed from
And here are the problems with these IPs:
It looks like the sentinel configuration is not reconfigured properly when I (or my cluster, e.g. the autoscaler) delete pods. It is even funnier! 🙈 I spawned a new Redis release (standalone, only one pod, in another namespace) and k8s gave it
I think that we need something which is able to remove old IPs from
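One documented way to drop such stale entries by hand is SENTINEL RESET, which makes a sentinel forget the replicas and sentinels it knows for a master and rediscover them (a sketch: run it against each sentinel pod, one at a time, replacing the names with the ones from your release):
# flush the stale replica/sentinel entries for the monitored master and trigger rediscovery
kubectl exec <sentinel-pod> -c sentinel -- redis-cli -p 26379 sentinel reset <master-set-name>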
Hi @tomislater,
@miguelaeh hey, I have added a comment here: #5418 (comment). It looks like we should use:
I am going to debug this further.
I have a solution and it seems to be working 🤔 I will propose a PR soon.
Hi @tomislater,
Unfortunately, this issue was created a long time ago and, although there is an internal task to fix it, it was not prioritized as something to address in the short/mid term. It's not a technical reason but something related to capacity, since we're a small team. That being said, contributions via PRs are more than welcome in both repositories (containers and charts), in case you would like to contribute. During this time there have been several releases of this asset, and it's possible the issue has been resolved as part of other changes. If that's not the case and you are still experiencing this issue, please feel free to reopen it and we will re-evaluate it.
Which chart:
bitnami/redis version 9.5.5
Description
Sentinel does not update its config, so we have stale IP addresses.
Steps to reproduce the issue:
I set up Redis and Stolon in the same namespace and played with them for a while. When I tried to actually use Redis, I found that Sentinel gave me the address of a Postgres pod! As far as I understand, Sentinel compiles its config upon pod start and believes that addresses do not change.
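A rough way to observe the symptom (a sketch: the release name redis2 comes from earlier in this thread, and mymaster is a placeholder for whatever master set name your values configure):
# 1. ask a sentinel which address it currently announces for the monitored master
kubectl exec redis2-master-0 -c sentinel -- redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
# 2. compare it with the IPs of the pods that actually exist right now
kubectl get pods -o wide
# if the address from step 1 no longer belongs to a Redis pod (for example because it was reassigned
# to another workload after pod restarts), clients that follow Sentinel will reach the wrong service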