Infinite re-election loop on leader node rejoin #2837

Closed
ybubnov opened this issue Mar 28, 2017 · 3 comments
Labels
theme/internal-cleanup, type/bug

ybubnov commented Mar 28, 2017

consul version for Server

Consul v0.7.0
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

consul info for Server

Server:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 1
	services = 1
build:
	prerelease = 
	revision = 'a189091
	version = 0.7.0
consul:
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 10.2.2.1:8300
	server = true
raft:
	applied_index = 334
	commit_index = 334
	fsm_pending = 0
	last_contact = 90.3972ms
	last_log_index = 334
	last_log_term = 23
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:10.2.2.1:8300 Address:10.2.2.1:8300} {Suffrage:Voter ID:10.2.3.1:8300 Address:10.2.3.1:8300} {Suffrage:Voter ID:10.2.1.1:8300 Address:10.2.1.1:8300}]
	latest_configuration_index = 331
	num_peers = 2
	protocol_version = 1
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 23
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 66
	max_procs = 4
	os = linux
	version = go1.6.3
serf_lan:
	encrypted = false
	event_queue = 0
	event_time = 5
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 1
	member_time = 6
	members = 4
	query_queue = 0
	query_time = 1
serf_wan:
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

  • Linux node1 4.4.19-29.55.amzn1.x86_64 #1 SMP Mon Aug 29 23:29:40 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Each Consul server is running within a Docker container with the following parameters:
docker run \
  --memory=128m \
  --cpu-shares=128 \
  --net=custom-net-1 \
  -e GOMAXPROCS=4 \
  --publish=8500:8500 \
  --publish=10.2.2.1:8400:8400 \
  --publish=10.2.2.1:8300-8302:8300-8302 \
  --publish=10.2.2.1:8300-8302:8300-8302/udp \
  custom-consul-image:0.7.0 \
  # Consul server parameters ...

Note: the Docker container does not mount /var/consul/data, so leaving the cluster purges the Raft database and all data.
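
For comparison, a minimal sketch of how the data directory could be persisted across container restarts; the host path /opt/consul/data is hypothetical, and everything else mirrors the invocation above:

docker run \
  --memory=128m \
  --cpu-shares=128 \
  --net=custom-net-1 \
  -e GOMAXPROCS=4 \
  -v /opt/consul/data:/var/consul/data \
  --publish=8500:8500 \
  --publish=10.2.2.1:8400:8400 \
  --publish=10.2.2.1:8300-8302:8300-8302 \
  --publish=10.2.2.1:8300-8302:8300-8302/udp \
  custom-consul-image:0.7.0 \
  # Consul server parameters ...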

  • Consul server configuration. The server configuration file /etc/consul.d/config.json and the health-check configuration /etc/consul.d/node.json (the check command has been modified) are shown below, respectively:
{
  "data_dir": "/var/consul/data",
  "ui_dir": "/var/consul/ui",
  "disable_update_check": true
}
{
  "checks": [
    {
      "id": "node-health",
      "name": "Node health checks",
      "docker_container_id": "agent",
      "shell": "/bin/sh",
      "script": "/usr/bin/healthcheck",
      "interval":"30s",
      "timeout":"15s"
    }
  ]
}
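
For reference, Docker checks like the one above are executed by the agent inside the named container through the Docker Exec API; conceptually this amounts to something close to the following command (an illustration, not the agent's literal invocation):

docker exec agent /bin/sh -c /usr/bin/healthcheck
# "agent" is the docker_container_id and /bin/sh the configured shell from node.json above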

Description of the Issue (and unexpected/desired result)

After the Consul leader leaves the cluster and subsequently rejoins with the same IP address, replicated logs keep getting rejected by the rejoined server. The expected result is that replicated logs are committed on the rejoined server.

Reproduction steps

1. Create a three-node cluster with an identical configuration on each node:

consul agent \
  -config-dir=/etc/consul.d \
  -client=0.0.0.0 \
  -node="${ipsubnet}/24" \
  -log-level=debug \
  -retry-interval 5s \
  -advertise "${ipaddress}" \
  -dc dc1-3-node \
  -retry-join 10.2.1.1 -retry-join 10.2.2.1 -retry-join 10.2.3.1 \
  -server \
  -bootstrap-expect 3

Where (one way to set these two variables per node is sketched after this list):

  • Variable ${ipsubnet} takes one of: 10.2.1.0/24, 10.2.2.0/24, 10.2.3.0/24.
  • Variable ${ipaddress} takes one of: 10.2.1.1, 10.2.2.1, 10.2.3.1 respectively.
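
A minimal sketch of setting these per node, assuming hypothetical hostnames node1, node2, and node3:

# Pick the address pair that matches the current host (hostnames and mapping are assumptions)
case "$(hostname -s)" in
  node1) ipaddress=10.2.1.1; ipsubnet=10.2.1.0/24 ;;
  node2) ipaddress=10.2.2.1; ipsubnet=10.2.2.0/24 ;;
  node3) ipaddress=10.2.3.1; ipsubnet=10.2.3.0/24 ;;
esac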

2. Wait until the cluster finishes the bootstrapping stage (in my case the node with IP 10.2.2.1 takes leadership of the cluster and starts the replication routines):

2017/03/28 10:34:21 [INFO] raft: Node at 10.2.1.1:8300 [Leader] entering Leader state
2017/03/28 10:34:21 [INFO] raft: Added peer 10.2.2.1:8300, starting replication
2017/03/28 10:34:21 [INFO] raft: Added peer 10.2.3.1:8300, starting replication
2017/03/28 10:34:21 [INFO] consul: cluster leadership acquired
2017/03/28 10:34:21 [INFO] consul: New leader elected: 10.2.1.0/24,172.28.5.71,c2bd8b49b4ae

3. Gracefully remove the leader node from the Consul cluster:

$ consul leave

Logs from the former Consul leader:

2017/03/28 10:39:37 [INFO] raft: Removed ourself, transitioning to follower
2017/03/28 10:39:37 [INFO] raft: Node at 10.2.1.1:8300 [Follower] entering Follower state (Leader: "")
2017/03/28 10:39:37 [INFO] consul: cluster leadership lost
...
2017/03/28 10:39:37 [INFO] agent: shutdown complete
2017/03/28 10:39:37 [DEBUG] http: Shutting down http server (0.0.0.0:8500)

4. Wait until a new leader is reelected:

2017/03/28 10:39:48 [INFO] raft: Election won. Tally: 2
2017/03/28 10:39:48 [INFO] raft: Node at 10.2.3.1:8300 [Leader] entering Leader state
2017/03/28 10:39:48 [INFO] raft: Added peer 10.2.2.1:8300, starting replication
2017/03/28 10:39:48 [INFO] consul: cluster leadership acquired
2017/03/28 10:39:48 [INFO] consul: New leader elected: 10.2.3.0/24,172.28.5.211,808195027ade
2017/03/28 10:39:48 [INFO] raft: pipelining replication to peer {Voter 10.2.2.1:8300 10.2.2.1:8300}

5. Rejoin the detached member to the Consul cluster (with the same IP address and -bootstrap-expect set to 3):

2017/03/28 11:06:06 [INFO] serf: EventMemberJoin: 10.2.2.0/24,172.28.5.16,19883068ab3c 10.2.2.1
2017/03/28 11:06:06 [INFO] serf: EventMemberJoin: 10.2.3.0/24,172.28.5.211,808195027ade 10.2.3.1
2017/03/28 11:06:06 [INFO] consul: Adding LAN server 10.2.2.0/24,172.28.5.16,19883068ab3c (Addr: tcp/10.2.2.1:8300) (DC: dc1-3-santa_clara)
2017/03/28 11:06:06 [DEBUG] memberlist: Initiating push/pull sync with: 10.2.3.1:8301
2017/03/28 11:06:06 [INFO] agent: (LAN) joined: 3 Err: <nil>
2017/03/28 11:06:06 [INFO] agent: Join completed. Synced with 3 initial agents
2017/03/28 11:06:06 [INFO] consul: Existing Raft peers reported by 10.2.2.0/24,172.28.5.16,19883068ab3c, disabling bootstrap mode

6. Notice that leader re-election never ends, because the heartbeat timeout keeps expiring due to the accumulation of uncommitted logs on the rejoined member:

$ grep -i 'log not found' 10-2-1-1-consul-after.log
2017/03/28 11:06:06 [WARN] raft: Failed to get previous log: 247 log not found (last: 0)
2017/03/28 11:06:48 [WARN] raft: Failed to get previous log: 261 log not found (last: 260)
2017/03/28 11:07:48 [WARN] raft: Failed to get previous log: 269 log not found (last: 268)
2017/03/28 11:08:48 [WARN] raft: Failed to get previous log: 278 log not found (last: 277)
...
2017/03/28 11:14:07 [WARN] raft: Failed to get previous log: 331 log not found (last: 330)
2017/03/28 11:15:07 [WARN] raft: Failed to get previous log: 342 log not found (last: 341)
2017/03/28 11:16:07 [WARN] raft: Failed to get previous log: 353 log not found (last: 352)
...
2017/03/28 11:28:44 [WARN] raft: Failed to get previous log: 472 log not found (last: 465)
2017/03/28 11:28:44 [WARN] raft: Failed to get previous log: 476 log not found (last: 475)
$ grep -i heartbeat 10-2-1-1-consul-after.log
2017/03/28 11:11:59 [WARN] raft: Heartbeat timeout from "10.2.3.1:8300" reached, starting election
2017/03/28 11:13:08 [WARN] raft: Heartbeat timeout from "10.2.2.1:8300" reached, starting election
2017/03/28 11:19:15 [WARN] raft: Heartbeat timeout from "10.2.2.1:8300" reached, starting election
2017/03/28 11:21:27 [WARN] raft: Heartbeat timeout from "10.2.3.1:8300" reached, starting election
2017/03/28 11:23:32 [WARN] raft: Heartbeat timeout from "10.2.3.1:8300" reached, starting election
2017/03/28 11:27:41 [WARN] raft: Heartbeat timeout from "10.2.3.1:8300" reached, starting election

According to my limited understanding, only a log-commit event resets the leader heartbeat timer on the follower nodes, so an expired timer triggers leader re-election over and over.
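
One way to watch the cluster in this state is to inspect the Raft peer set on each server, assuming a Consul build that ships the operator command (the CLI syntax differs slightly between releases):

consul operator raft -list-peers
# Flag-style syntax from the 0.7.x era; newer releases use `consul operator raft list-peers`.
# All three servers, including the rejoined one, stay listed as voters while the term keeps climbing.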

Log Fragments

Logs from the server before it left the cluster: 10-2-1-1-consul-before.log.
Logs from the server after it rejoined the cluster: 10-2-1-1-consul-after.log.
Logs from another server node: 10-2-3-1-consul.log.

Note: I validated this scenario on versions 0.6.4 and 0.7.1 as well and was always able to reproduce it.

@ybubnov ybubnov changed the title Infinite reelection loop on leader node rejoin Infinite re-election loop on leader node rejoin Mar 28, 2017

slackpad commented May 1, 2017

Hi @ybubnov, the log warnings are actually OK (the leader will send older logs or a snapshot). I think what's happening here is that the server that's coming back is replaying the logs and removing itself, getting stuck in a loop. If you don't do the leave, it should work OK just to restart the server, but we will have to think about the right way to fix this.
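
For reference, the restart-without-leave path mentioned above could look like this with the Docker setup from the report, assuming a hypothetical container name consul-server:

docker restart consul-server
# Restarting the existing container keeps its writable layer (and therefore /var/consul/data),
# so the server comes back with its Raft state instead of rejoining as an empty peer.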

@slackpad slackpad added the type/bug label May 1, 2017
@slackpad slackpad added the theme/internal-cleanup label May 25, 2017

Mike-Dunton commented Sep 28, 2017

Is there any solution to this? I can't get the server that was reset to join back to the cluster.


ChipV223 commented Apr 9, 2021

Hi all!

I just tried running the same repro steps that @ybubnov listed in the original report using Consul v1.9.4, and I was not able to reproduce the behavior. Now, when the original leader rejoins the cluster, it does so successfully, with no leader re-election loop.

I'll go ahead and close this for now, but do reach out if you are able to reproduce this using one of the later releases of Consul!

@ChipV223 ChipV223 closed this as completed Apr 9, 2021