Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consul Service meta wrongly computes and exposes non_voter meta #8731

Merged

Conversation

pierresouchay
Copy link
Contributor

In Serf Tags, entreprise members being non-voters use the tag
nonvoter=1, not non_voter = false, so non-voters in members
were wrongly displayed as voter.

Demonstration:

consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301   alive   acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302

In Serf Tags, entreprise members being non-voters use the tag
`nonvoter=1`, not `non_voter = false`, so non-voters in members
were wrongly displayed as voter.

Demonstration:

```
consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301   alive   acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
```
@mkeeler
Copy link
Member

mkeeler commented Sep 29, 2020

@pierresouchay IIUC the bug is that the node registration performed by the leader on behalf of a newly joined server does not correctly indicate the non voting status.

One issue I see is that when that new node eventually gets around to making a Catalog.Register RPC call to register its own node, the metadata pushed up there will not contain any of the values injected by the leader.

So I am wondering if we should do something to inject the node metadata for server agents in the agent local state as well?

@pierresouchay
Copy link
Contributor Author

@mkeeler This registration is done only by the leader if I am not mistaken: this is the only service in Consul doing that, so no-anti-entropy stuff there if I understand correcly

@mkeeler
Copy link
Member

mkeeler commented Sep 29, 2020

The registration done by the leader is just the first registration. Later on during anti-entropy this function may be called or we may update the node info while syncing services or while syncing checks.

This is why when you query for the consul service via the HTTP API the NodeMeta field of the results you get back are missing all the Raft protocol information as well as the non voter status.

@mkeeler
Copy link
Member

mkeeler commented Sep 29, 2020

This PR is definitely correct behavior as far as the leader routines are concerned. I was just thinking that although correct these meta values are going to be blown away anyways as soon as the agent's anti-entropy kicks in so its probably not the fix you were hoping for.

@pierresouchay
Copy link
Contributor Author

pierresouchay commented Oct 6, 2020

Hello @mkeeler,

This call is to me called by the leader when the status of any agent is changing from anything to Alive. Thus, any left > Alive or failed > alive would trigger it.

To me, for the "consul" service, only the leader does publish it, the other servers do not (otherwise the meta would not be published as the version on raft* protocol info would be emptied at each anti-entropy sync. Which is not what we see at all, we see consul service as stable and never overwritten by other consul servers => so I did not check in dept, but I suspect only the leader does publish this list and followers (or non-voters), do not publish themselves the fact they publish the "consul" service. There is even a specific case for de-registration of service "consul" in the same file when member is left/reaped: https://github.com/hashicorp/consul/blob/master/agent/consul/leader.go#L1365

The only case where it might failed would be if consul servers do update their non-voting flag without restarting (aka on a reload). I think this case does not exists (at least in my tests with 1.8.4) for now as the non-voting flag in Serf requires a leave + join now (There is an issue regarding this #6979 and in your internal ticketing system Internal-#34373. (Edit: it is in none of those issues, but still, that's another problem we have sometimes)

But yes, I see you point that in the future, this value might not be properly into account if reload of non_voter configuration would be updated dynamically (for instance with consul agent reload), so to me, we should either remove this attribute (but I would argue it would help a lot during operations in UIs) or maybe use this patch to have an accurate value (at least until the non_voter flag is not updated dynamically).

@mkeeler
Copy link
Member

mkeeler commented Oct 9, 2020

@pierresouchay Sorry for the rabbit trail here. There is node-meta for the consul agent and service meta for the consul service. I see now that this change was specifically for updating the service meta for the consul service and the node meta sync done during anti-entropy does not change the service meta.

@@ -0,0 +1,3 @@
```release-note:bug
entreprise: properly update consul server meta non_voter for non-voting Entreprise Consul servers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
entreprise: properly update consul server meta non_voter for non-voting Entreprise Consul servers
raft: (Enterprise only) properly update consul server meta non_voter for non-voting Enterprise Consul servers

Copy link
Member

@mkeeler mkeeler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the changelog entry is okay I can approve/merge this.

@pierresouchay
Copy link
Contributor Author

@mkeeler Done, thank you!

@pierresouchay pierresouchay force-pushed the non_voter_tag_not_properly_computed branch from 5a185e1 to 244da70 Compare October 9, 2020 20:18
@5325833510
Copy link

Fatma

2 similar comments
@5325833510
Copy link

Fatma

@5325833510
Copy link

Fatma

@mkeeler mkeeler merged commit 9b7ed75 into hashicorp:master Oct 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants