Several slave instances crashed, and the log shows:
[3243] 29 Feb 17:56:11 # MASTER time out: no data nor PING received...
[3243] 29 Feb 17:56:11 * Connecting to MASTER...
[3243] 29 Feb 17:56:11 * MASTER <-> SLAVE sync started
[3243] 29 Feb 17:56:11 * Non blocking connect for SYNC fired the event.
[3243] 29 Feb 17:56:25 * MASTER <-> SLAVE sync: receiving 448824372 bytes from master
[3243] 29 Feb 18:04:33 * MASTER <-> SLAVE sync: Loading DB in memory
[3243] 29 Feb 18:04:40 # === REDIS BUG REPORT START: Cut & paste starting from here ===
[3243] 29 Feb 18:04:40 # !!! Software Failure. Press left mouse button to continue
[3243] 29 Feb 18:04:40 # Guru Meditation: "Unknown RDB encoding type" #rdb.c:648
[3243] 29 Feb 18:04:40 # (forcing SIGSEGV in order to print the stack trace)
[3243] 29 Feb 18:04:40 # Redis 2.4.6 crashed by signal: 11
[3243] 29 Feb 18:04:40 # Failed assertion: (:0)
[3243] 29 Feb 18:04:40 # --- STACK TRACE
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(_redisPanic+0x62) [0x7f58ea55c952]
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(_redisPanic+0x62) [0x7f58ea55c952]
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(rdbGenericLoadStringObject+0x7d) [0x7f58ea547b4d]
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(rdbLoadObject+0x3b9) [0x7f58ea5491c9]
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(rdbLoad+0x15e) [0x7f58ea5496ee]
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(readSyncBulkPayload+0x1cc) [0x7f58ea54662c]
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(aeProcessEvents+0x168) [0x7f58ea534af8]
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(aeMain+0x2e) [0x7f58ea534d0e]
[3243] 29 Feb 18:04:40 # /usr/local/bin/redis-server(main+0x1e8) [0x7f58ea538e95]
....
This happened while someone was adjusting the network, which made the link unstable. As
you can see, transferring the 440MB dump took more than 8 minutes.
After the network became stable and we restarted the slave, the transfer took less than 8 seconds and nothing went wrong:
[18680] 29 Feb 18:33:39 * SLAVE OF 10.110.24.15:7701 enabled (user request)
[18680] 29 Feb 18:33:41 * Connecting to MASTER...
[18680] 29 Feb 18:33:41 * MASTER <-> SLAVE sync started
[18680] 29 Feb 18:33:41 * Non blocking connect for SYNC fired the event.
[18680] 29 Feb 18:33:52 * MASTER <-> SLAVE sync: receiving 448826470 bytes from master
[18680] 29 Feb 18:33:59 * MASTER <-> SLAVE sync: Loading DB in memory
[18680] 29 Feb 18:34:09 * MASTER <-> SLAVE sync: Finished with success
...
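To put numbers on the two transfers, here is a rough calculation from the log timestamps. It assumes the interval between the "receiving ... bytes from master" line and the "Loading DB in memory" line approximates the transfer time; the helper name `throughput_mb_s` is made up for this example.

```python
from datetime import datetime

def throughput_mb_s(start, end, size_bytes):
    """Average rate in MB/s (1 MB = 10**6 bytes) between two HH:MM:SS log timestamps."""
    fmt = "%H:%M:%S"
    seconds = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    return size_bytes / seconds / 1e6

# Timestamps and sizes copied from the two slave logs above.
slow = throughput_mb_s("17:56:25", "18:04:33", 448824372)  # unstable link
fast = throughput_mb_s("18:33:52", "18:33:59", 448826470)  # stable link
print(f"unstable link: {slow:.2f} MB/s, stable link: {fast:.2f} MB/s")
```

That works out to roughly 0.92 MB/s during the network adjustment versus roughly 64 MB/s afterwards, for dumps of nearly identical size.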
I was thinking of appending an MD5 sum to dump.rdb after it is created, which won't break compatibility.
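A minimal sketch of that idea, not Redis code: the functions `append_md5` and `verify_md5` are hypothetical names, the 16-byte raw-digest trailer layout is an assumption, and the payload is a stand-in rather than a real RDB file. Whether an older loader would tolerate the extra trailing bytes is not verified here.

```python
import hashlib
import os
import tempfile

DIGEST_SIZE = hashlib.md5().digest_size  # 16 bytes
CHUNK = 1 << 20  # read in 1 MiB pieces so large dumps are not held in memory

def append_md5(path):
    """Append the MD5 digest of the file's current contents as a 16-byte trailer."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            md5.update(chunk)
    with open(path, "ab") as f:
        f.write(md5.digest())

def verify_md5(path):
    """Return True if the last 16 bytes equal the MD5 of everything before them."""
    size = os.path.getsize(path)
    if size < DIGEST_SIZE:
        return False
    md5 = hashlib.md5()
    remaining = size - DIGEST_SIZE
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(CHUNK, remaining))
            if not chunk:
                return False  # file shrank while reading
            md5.update(chunk)
            remaining -= len(chunk)
        return f.read(DIGEST_SIZE) == md5.digest()

def demo():
    """Write a stand-in dump, check it, flip one byte, check it again."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dump.rdb")
        with open(path, "wb") as f:
            f.write(b"stand-in payload, not a real RDB file" * 1000)
        append_md5(path)
        intact = verify_md5(path)
        with open(path, "r+b") as f:  # simulate corruption in transit
            f.seek(10)
            original = f.read(1)
            f.seek(10)
            f.write(bytes([original[0] ^ 0xFF]))
        corrupted = verify_md5(path)
    return intact, corrupted

if __name__ == "__main__":
    print(demo())
```

With a check like this, the slave could reject a damaged dump and retry the sync instead of crashing while loading it.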
Hello Jokea, the checksum does look like a good idea, but I wonder whether we can be sure that the file was corrupted during the transfer rather than generated incorrectly for some reason.
Btw, I guess you could say that having the checksum would already answer that question...
eh... I should've made a backup of the dump file generated by the master, then we could see where the problem is. Now this is really hard to reproduce.
There are also 3 slaves that stopped replicating while the master <-> slave link was up. We noticed the desync because their number of keys stopped changing along with the master's:
Since we already fixed the protocol desync bug in issue #141, I think this was caused by the network transfer too. We have 7 slaves attached, and these 3 desynced slaves are in a different location from the master; the network adjustment only affected the link between the two locations.
Hi @jokea, I went ahead and added a checksum directly to the RDB format itself, so this will protect replication but also any other use of the RDB format, especially just loading an RDB file after a restart. For now the changes are in the 'rdbcksum' branch but will be merged into unstable and 2.6 soon. Thanks! Closing.