Service interruption during leader election #21978

Peter2121 · 2024-11-28T11:22:31Z

Overview of the Issue

We have a Consul v1.19.2 cluster of 5 servers in 2 datacenters connected by VPN. The overall stability of the cluster is good, but when the connection between the datacenters is lost and then restored, we temporarily lose service discovery at the moment the connection comes back. The following record is present in the Consul log:

<133>1 2024-11-28T00:09:46.231877+01:00 consul6.cloud.local consul 12196 - - 2024-11-28T00:09:46.231+0100 [ERROR] agent.http: Request error: method=GET url="/v1/health/service/bderp?filter=%28not+%28Checks.Status%3D%3Dcritical%29+and+%28Checks.CheckID%21%3DserfHealth%29%29" from=10.192.8.140:38318 error="No cluster leader"

As I understand, the sequence is as follows:

1. The cluster works correctly, and the leader is on the 'main' side (where we have 3 Consul servers installed).
2. The connection with the 'secondary' datacenter is lost. The 2 servers from the 'secondary' datacenter are removed from the configuration on the 'main' side, and everything keeps working correctly on the 'main' side.
3. The connection is restored, the two servers from the 'secondary' side reconnect to the cluster, the current leader steps down, and an election starts.
4. At this moment a client tries to discover a service by sending a "/v1/health/service/..." request to a server. The request fails because the cluster has no leader.
5. Everything comes back once a new leader is elected.

Maybe this behavior is 'by design' and we need to tweak our configuration to avoid failures in service discovery during leader election. Any advice is welcome.
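For illustration, here is a minimal sketch of one client-side workaround we could try. It is not a confirmed fix. It assumes that the server answers with a non-200 status whose body contains "No cluster leader" (the text from the log above), and that a read using Consul's `stale` consistency mode can be served by a server without a leader. The function names, retry count and delay are hypothetical.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Same filter expression as in the logged request above.
FILTER = "(not (Checks.Status==critical) and (Checks.CheckID!=serfHealth))"


def health_url(base, service, stale=False):
    """Build the /v1/health/service URL, optionally asking for a stale read."""
    query = urllib.parse.urlencode({"filter": FILTER})
    url = f"{base}/v1/health/service/{service}?{query}"
    return url + "&stale" if stale else url


def http_get(url, timeout=2.0):
    """Return (status, body); HTTP error responses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()


def discover(base, service, get=http_get, retries=5, delay=0.5, sleep=time.sleep):
    """Retry while the cluster has no leader, then fall back to a stale read.

    Assumption: a leaderless cluster is recognised by "No cluster leader"
    appearing in the response body.
    """
    for _ in range(retries):
        status, body = get(health_url(base, service))
        if status == 200:
            return body
        if "No cluster leader" not in body:
            raise RuntimeError(f"unexpected response {status}: {body}")
        sleep(delay)
    # Last resort: accept possibly outdated data instead of failing.
    status, body = get(health_url(base, service, stale=True))
    if status != 200:
        raise RuntimeError(f"stale read failed {status}: {body}")
    return body
```

Against a local agent this would be called as `discover("http://127.0.0.1:8500", "bderp")`, where the address is an assumption about the setup. Connection-level errors (for example a refused connection) are not handled and propagate to the caller.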

Reproduction Steps

1. Install a Consul cluster with at least 3 servers
2. Add at least one service
3. Cut the network connection of one server
4. Restore the network connection of the disconnected server
5. Immediately run curl to get the list of healthy services
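Because the failure window is short, a single curl is easy to mistime. The sketch below polls the health endpoint in a loop instead and reports the intervals during which requests failed. It is only an illustration: the helper names, polling interval and duration are hypothetical, and any non-200 response or connection error is counted as a failure.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=1.0):
    """Return True if the request succeeds with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error responses, refused connections and timeouts.
        return False


def poll(url, duration=30.0, interval=0.2, check=probe,
         clock=time.monotonic, sleep=time.sleep):
    """Collect (timestamp, ok) samples for `duration` seconds."""
    samples = []
    deadline = clock() + duration
    while clock() < deadline:
        samples.append((clock(), check(url)))
        sleep(interval)
    return samples


def failure_windows(samples):
    """Turn (timestamp, ok) samples into a list of (start, end) failure spans."""
    windows = []
    start = None
    last = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                   # first failed sample of a new span
        elif ok and start is not None:
            windows.append((start, ts))  # span ends at the first success
            start = None
        last = ts
    if start is not None:
        windows.append((start, last))    # still failing when polling stopped
    return windows
```

Start `failure_windows(poll("http://127.0.0.1:8500/v1/health/service/bderp"))` before restoring the network connection of the disconnected server. The address is an assumption (a local agent on the default HTTP port), and the service name is taken from the log above.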

Operating system and Environment details

Consul v1.19.2 on FreeBSD 14.0 x64
