Infinite re-election loop on leader node rejoin #2837

Closed
ybubnov opened this issue Mar 28, 2017 · 3 comments
Labels
theme/internal-cleanup, type/bug

ybubnov commented Mar 28, 2017

consul version for Server

Consul v0.7.0
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

consul info for Server

Server:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 1
	services = 1
build:
	prerelease = 
	revision = 'a189091
	version = 0.7.0
consul:
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 10.2.2.1:8300
	server = true
raft:
	applied_index = 334
	commit_index = 334
	fsm_pending = 0
	last_contact = 90.3972ms
	last_log_index = 334
	last_log_term = 23
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:10.2.2.1:8300 Address:10.2.2.1:8300} {Suffrage:Voter ID:10.2.3.1:8300 Address:10.2.3.1:8300} {Suffrage:Voter ID:10.2.1.1:8300 Address:10.2.1.1:8300}]
	latest_configuration_index = 331
	num_peers = 2
	protocol_version = 1
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 23
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 66
	max_procs = 4
	os = linux
	version = go1.6.3
serf_lan:
	encrypted = false
	event_queue = 0
	event_time = 5
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 1
	member_time = 6
	members = 4
	query_queue = 0
	query_time = 1
serf_wan:
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

  • Linux node1 4.4.19-29.55.amzn1.x86_64 #1 SMP Mon Aug 29 23:29:40 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Each Consul server is running within a Docker container with the following parameters:
docker run \
  --memory=128m \
  --cpu-shares=128 \
  --net=custom-net-1 \
  -e GOMAXPROCS=4 \
  --publish=8500:8500 \
  --publish=10.2.2.1:8400:8400 \
  --publish=10.2.2.1:8300-8302:8300-8302 \
  --publish=10.2.2.1:8300-8302:8300-8302/udp \
  custom-consul-image:0.7.0 \
  # Consul server parameters ...

Note: the Docker container does not mount /var/consul/data, so leaving the cluster purges the Raft database and all data.
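
For comparison, a minimal sketch of how the data directory could be persisted across container restarts; the host path /opt/consul/data is hypothetical, and everything else mirrors the invocation above:

docker run \
  --memory=128m \
  --cpu-shares=128 \
  --net=custom-net-1 \
  -e GOMAXPROCS=4 \
  -v /opt/consul/data:/var/consul/data \
  --publish=8500:8500 \
  --publish=10.2.2.1:8400:8400 \
  --publish=10.2.2.1:8300-8302:8300-8302 \
  --publish=10.2.2.1:8300-8302:8300-8302/udp \
  custom-consul-image:0.7.0 \
  # Consul server parameters ...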

  • Consul server configuration. The server configuration file /etc/consul.d/config.json and the health-check configuration /etc/consul.d/node.json (the check command has been modified) are shown below, respectively:
{
  "data_dir": "/var/consul/data",
  "ui_dir": "/var/consul/ui",
  "disable_update_check": true
}
{
  "checks": [
    {
      "id": "node-health",
      "name": "Node health checks",
      "docker_container_id": "agent",
      "shell": "/bin/sh",
      "script": "/usr/bin/healthcheck",
      "interval":"30s",
      "timeout":"15s"
    }
  ]
}
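
For reference, Docker checks like the one above are executed by the agent inside the named container through the Docker Exec API; conceptually this amounts to something close to the following command (an illustration, not the agent's literal invocation):

docker exec agent /bin/sh -c /usr/bin/healthcheck
# "agent" is the docker_container_id and /bin/sh the configured shell from node.json above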

Description of the Issue (and unexpected/desired result)

After the Consul leader leaves the cluster and subsequently rejoins with the same IP address, replicated logs keep getting rejected by the rejoined server. The expected result is that replicated logs are committed on the rejoined server.

Reproduction steps

1. Create a three-node cluster with an identical configuration on each node:

consul agent \
  -config-dir=/etc/consul.d \
  -client=0.0.0.0 \
  -node="${ipsubnet}/24" \
  -log-level=debug \
  -retry-interval 5s \
  -advertise "${ipaddress}" \
  -dc dc1-3-node \
  -retry-join 10.2.1.1 -retry-join 10.2.2.1 -retry-join 10.2.3.1 \
  -server \
  -bootstrap-expect 3

Where (one way to set these two variables per node is sketched after this list):

  • Variable ${ipsubnet} takes one of: 10.2.1.0/24, 10.2.2.0/24, 10.2.3.0/24.
  • Variable ${ipaddress} takes one of: 10.2.1.1, 10.2.2.1, 10.2.3.1 respectively.
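
A minimal sketch of setting these per node, assuming hypothetical hostnames node1, node2, and node3:

# Pick the address pair that matches the current host (hostnames and mapping are assumptions)
case "$(hostname -s)" in
  node1) ipaddress=10.2.1.1; ipsubnet=10.2.1.0/24 ;;
  node2) ipaddress=10.2.2.1; ipsubnet=10.2.2.0/24 ;;
  node3) ipaddress=10.2.3.1; ipsubnet=10.2.3.0/24 ;;
esac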

2. Wait until the cluster finishes the bootstrapping stage (in my case the node with IP 10.2.2.1 takes leadership of the cluster and starts the replication routines):

2017/03/28 10:34:21 [INFO] raft: Node at 10.2.1.1:8300 [Leader] entering Leader state
2017/03/28 10:34:21 [INFO] raft: Added peer 10.2.2.1:8300, starting replication
2017/03/28 10:34:21 [INFO] raft: Added peer 10.2.3.1:8300, starting replication
2017/03/28 10:34:21 [INFO] consul: cluster leadership acquired
2017/03/28 10:34:21 [INFO] consul: New leader elected: 10.2.1.0/24,172.28.5.71,c2bd8b49b4ae

3. Gracefully remove the leader node from the Consul cluster:

$ consul leave

Logs from the former Consul leader:

2017/03/28 10:39:37 [INFO] raft: Removed ourself, transitioning to follower
2017/03/28 10:39:37 [INFO] raft: Node at 10.2.1.1:8300 [Follower] entering Follower state (Leader: "")
2017/03/28 10:39:37 [INFO] consul: cluster leadership lost
...
2017/03/28 10:39:37 [INFO] agent: shutdown complete
2017/03/28 10:39:37 [DEBUG] http: Shutting down http server (0.0.0.0:8500)

4. Wait until a new leader is reelected:

2017/03/28 10:39:48 [INFO] raft: Election won. Tally: 2
2017/03/28 10:39:48 [INFO] raft: Node at 10.2.3.1:8300 [Leader] entering Leader state
2017/03/28 10:39:48 [INFO] raft: Added peer 10.2.2.1:8300, starting replication
2017/03/28 10:39:48 [INFO] consul: cluster leadership acquired
2017/03/28 10:39:48 [INFO] consul: New leader elected: 10.2.3.0/24,172.28.5.211,808195027ade
2017/03/28 10:39:48 [INFO] raft: pipelining replication to peer {Voter 10.2.2.1:8300 10.2.2.1:8300}

5. Rejoin the detached member to the Consul cluster (with the same IP address and -bootstrap-expect set to 3):

2017/03/28 11:06:06 [INFO] serf: EventMemberJoin: 10.2.2.0/24,172.28.5.16,19883068ab3c 10.2.2.1
2017/03/28 11:06:06 [INFO] serf: EventMemberJoin: 10.2.3.0/24,172.28.5.211,808195027ade 10.2.3.1
2017/03/28 11:06:06 [INFO] consul: Adding LAN server 10.2.2.0/24,172.28.5.16,19883068ab3c (Addr: tcp/10.2.2.1:8300) (DC: dc1-3-santa_clara)
2017/03/28 11:06:06 [DEBUG] memberlist: Initiating push/pull sync with: 10.2.3.1:8301
2017/03/28 11:06:06 [INFO] agent: (LAN) joined: 3 Err: <nil>
2017/03/28 11:06:06 [INFO] agent: Join completed. Synced with 3 initial agents
2017/03/28 11:06:06 [INFO] consul: Existing Raft peers reported by 10.2.2.0/24,172.28.5.16,19883068ab3c, disabling bootstrap mode

6. Notice that leader re-election never ends, because the heartbeat timeout keeps expiring due to the accumulation of uncommitted logs on the rejoined member:

$ grep -i 'log not found' 10-2-1-1-consul-after.log
2017/03/28 11:06:06 [WARN] raft: Failed to get previous log: 247 log not found (last: 0)
2017/03/28 11:06:48 [WARN] raft: Failed to get previous log: 261 log not found (last: 260)
2017/03/28 11:07:48 [WARN] raft: Failed to get previous log: 269 log not found (last: 268)
2017/03/28 11:08:48 [WARN] raft: Failed to get previous log: 278 log not found (last: 277)
...
2017/03/28 11:14:07 [WARN] raft: Failed to get previous log: 331 log not found (last: 330)
2017/03/28 11:15:07 [WARN] raft: Failed to get previous log: 342 log not found (last: 341)
2017/03/28 11:16:07 [WARN] raft: Failed to get previous log: 353 log not found (last: 352)
...
2017/03/28 11:28:44 [WARN] raft: Failed to get previous log: 472 log not found (last: 465)
2017/03/28 11:28:44 [WARN] raft: Failed to get previous log: 476 log not found (last: 475)
$ grep -i heartbeat 10-2-1-1-consul-after.log
2017/03/28 11:11:59 [WARN] raft: Heartbeat timeout from "10.2.3.1:8300" reached, starting election
2017/03/28 11:13:08 [WARN] raft: Heartbeat timeout from "10.2.2.1:8300" reached, starting election
2017/03/28 11:19:15 [WARN] raft: Heartbeat timeout from "10.2.2.1:8300" reached, starting election
2017/03/28 11:21:27 [WARN] raft: Heartbeat timeout from "10.2.3.1:8300" reached, starting election
2017/03/28 11:23:32 [WARN] raft: Heartbeat timeout from "10.2.3.1:8300" reached, starting election
2017/03/28 11:27:41 [WARN] raft: Heartbeat timeout from "10.2.3.1:8300" reached, starting election

According to my limited understanding, only a log-commit event resets the leader heartbeat timer on the follower nodes, so an expired timer triggers leader re-election over and over.
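
One way to watch the cluster in this state is to inspect the Raft peer set on each server, assuming a Consul build that ships the operator command (the CLI syntax differs slightly between releases):

consul operator raft -list-peers
# Flag-style syntax from the 0.7.x era; newer releases use `consul operator raft list-peers`.
# All three servers, including the rejoined one, stay listed as voters while the term keeps climbing.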

Log Fragments

Logs from the server before it left the cluster: 10-2-1-1-consul-before.log.
Logs from the server after it rejoined the cluster: 10-2-1-1-consul-after.log.
Logs from another server node: 10-2-3-1-consul.log.

Note: I validated this scenario on versions 0.6.4 and 0.7.1 as well and was always able to reproduce it.

@ybubnov ybubnov changed the title Infinite reelection loop on leader node rejoin Infinite re-election loop on leader node rejoin Mar 28, 2017

slackpad commented May 1, 2017

Hi @ybubnov, the log warnings are actually OK (the leader will send older logs or a snapshot). I think what's happening here is that the server that's coming back is replaying the logs and removing itself, getting stuck in a loop. If you don't do the leave, it should work OK just to restart the server, but we will have to think about the right way to fix this.
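
For reference, the restart-without-leave path mentioned above could look like this with the Docker setup from the report, assuming a hypothetical container name consul-server:

docker restart consul-server
# Restarting the existing container keeps its writable layer (and therefore /var/consul/data),
# so the server comes back with its Raft state instead of rejoining as an empty peer.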

@slackpad slackpad added the type/bug label May 1, 2017
@slackpad slackpad added the theme/internal-cleanup label May 25, 2017

Mike-Dunton commented Sep 28, 2017

Is there any solution to this? I can't get the server that was reset to join back to the cluster.


ChipV223 commented Apr 9, 2021

Hi all!

I just tried running the same repro steps that @ybubnov listed in the original report using Consul v1.9.4, and I was not able to reproduce the behavior. Now, when the original leader rejoins the cluster, it does so successfully, with no leader re-election loop.

I'll go ahead and close this for now, but do reach out if you are able to reproduce this using one of the later releases of Consul!

@ChipV223 ChipV223 closed this as completed Apr 9, 2021