Service interruption during leader election #21978

Open
Peter2121 opened this issue Nov 28, 2024 · 0 comments

Overview of the Issue

We have a Consul v1.19.2 cluster of 5 servers in 2 datacenters connected by VPN. The overall stability of the cluster is good, but when the connection between the datacenters is lost and then restored, we temporarily lose service discovery at the moment the connection comes back. The following record is present in the Consul log:

<133>1 2024-11-28T00:09:46.231877+01:00 consul6.cloud.local consul 12196 - - 2024-11-28T00:09:46.231+0100 [ERROR] agent.http: Request error: method=GET url="/v1/health/service/bderp?filter=%28not+%28Checks.Status%3D%3Dcritical%29+and+%28Checks.CheckID%21%3DserfHealth%29%29" from=10.192.8.140:38318 error="No cluster leader"

As I understand, the sequence is as follows:

  • The cluster works correctly, and the leader is on the 'main' side (where we have 3 Consul servers installed);
  • The connection to the 'secondary' datacenter is lost, the 2 servers from the 'secondary' datacenter are removed from the configuration on the 'main' side, and everything keeps working correctly on the 'main' side;
  • The connection is restored, the two servers from the 'secondary' side reconnect to the cluster, the current leader steps down, and an election starts;
  • At this moment a client tries to discover a service by sending a "/v1/health/service/..." request to a server, and this request fails because the cluster has no leader;
  • Everything returns to normal once a new leader is elected.

Maybe this behavior is 'by design' and we need to tweak our configuration to avoid failures in service discovery during leader election. Any advice is welcome.
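
For discussion, here is a minimal client-side sketch of two possible mitigations. It is not a confirmed fix. It rebuilds the same health query as in the log line above and can (a) add the `stale` query parameter, which, as far as I understand Consul's documented consistency modes, lets any server answer a read even when there is no leader, and (b) retry for a short time when the reply contains "No cluster leader". The base URL (default HTTP port 8500), the function names, and the retry counts are assumptions made for illustration only.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Filter expression from the logged request, URL-decoded.
FILTER = "(not (Checks.Status==critical) and (Checks.CheckID!=serfHealth))"


def health_url(base, service, stale=False):
    """Build a /v1/health/service/<service> URL, optionally in stale mode."""
    query = urllib.parse.urlencode({"filter": FILTER})
    if stale:
        # Valueless flag; assumed to allow reads without a leader.
        query += "&stale"
    return f"{base}/v1/health/service/{urllib.parse.quote(service)}?{query}"


def _http_get(url):
    """Return (status, body) for a GET, including for HTTP error replies."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()


def fetch_with_retry(url, attempts=5, delay=0.5, fetch=None, sleep=time.sleep):
    """Retry only while the server reports that the cluster has no leader."""
    fetch = fetch or _http_get
    for attempt in range(attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if "No cluster leader" not in body or attempt == attempts - 1:
            raise RuntimeError(f"HTTP {status}: {body}")
        sleep(delay)
```

For example, `fetch_with_retry(health_url("http://127.0.0.1:8500", "bderp", stale=True))` would issue the stale variant of the logged request. Whether a stale read is acceptable depends on how much lag the consumers of the service list can tolerate.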

Reproduction Steps

  1. Install a Consul cluster with at least 3 servers
  2. Add at least one service
  3. Cut the network connection of one server
  4. Restore the network connection of the disconnected server
  5. Immediately run curl to get the list of healthy services
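
Because the failure window in step 5 is short and easy to miss with a single curl, a small polling script may make the reproduction more reliable. The sketch below is only an illustration: it assumes the agent's HTTP API is reachable at http://127.0.0.1:8500 and that `/v1/status/leader` returns an empty string while no leader is known. The function names and the polling interval are hypothetical.

```python
import json
import time
import urllib.request


def current_leader(base="http://127.0.0.1:8500"):
    """Return the reported Raft leader address, or "" if none or on any error."""
    try:
        with urllib.request.urlopen(f"{base}/v1/status/leader", timeout=2) as resp:
            return json.loads(resp.read().decode())
    except (OSError, ValueError):
        return ""


def leaderless_windows(samples):
    """Collapse (timestamp, leader) samples into (start, end) spans with no leader."""
    windows, start = [], None
    for ts, leader in samples:
        if not leader and start is None:
            start = ts  # first sample without a leader opens a window
        elif leader and start is not None:
            windows.append((start, ts))  # leader is back, close the window
            start = None
    if start is not None:
        # Still leaderless at the last sample.
        windows.append((start, samples[-1][0]))
    return windows


def watch(base="http://127.0.0.1:8500", duration=60.0, interval=0.2):
    """Poll the leader endpoint for `duration` seconds and report leaderless spans."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), current_leader(base)))
        time.sleep(interval)
    return leaderless_windows(samples)
```

Starting `watch()` just before step 4 and printing its result should show roughly how long the cluster stays without a leader after the connection is restored.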

Operating system and Environment details

Consul v1.19.2 on FreeBSD 14.0 x64
