`forwardRPC` and `globalRPC` do not take LAN `ServerLookup` into account when the target datacenter is the local datacenter #8403
Comments
This is the globalRPC function, it uses Lines 641 to 660 in d1c879e
We were unsure whether #8406 fixes this issue. After that fix, single-node DCs are healthy and a single server marks itself as healthy. That means that even a call through the router should be successful. @mkeeler suggested to
I agree that a server shouldn't RPC itself, but apart from this optimization, there is no bug anymore.
So it turns out that the server does need to RPC itself. The first server that gets the keyring request will update the WAN keyring and then issue Do we need an RPC to do this: No.
consul/agent/consul/rpc.go
Lines 604 to 669 in 4c8a15b
When performing a keyring listing we use the `globalRPC` function to issue the RPC against all datacenters and collect the results. `globalRPC` in turn just calls `forwardDC` on all known datacenters, including the local DC.

This doesn't sound too bad, but when coupled with #8401 it can cause some issues. Mainly, in a single-node cluster (or a multi-node cluster where the servers' WAN gossip ports are firewalled from each other), the router used by `forwardRPC` to look up a server in the local DC will report that there are no reachable servers in the datacenter, and the RPC will fail.

We should either 1) exclude the local DC from the `globalRPC` call, 2) ensure that `forwardRPC` uses the `ServerLookup` instead of the `Router` for finding a server to use in the local datacenter, or 3) avoid all the server lookups for the local DC and just field the RPC locally.

Right now I am leaning toward the second option as it seems most robust.
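
To make the second option concrete, here is a minimal sketch of the selection logic, not Consul's actual code. The `serverFinder` interface, the `staticFinder` type, the `pickServer` function, and `ErrNoServers` are all hypothetical names invented for illustration. The sketch assumes the LAN `ServerLookup` and the WAN `Router` can both be modeled as "find a server address for a datacenter".

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoServers is a hypothetical sentinel error for "no server found".
var ErrNoServers = errors.New("no reachable servers in datacenter")

// serverFinder is a hypothetical stand-in for both the LAN ServerLookup
// and the WAN Router. It is not an interface from the Consul code base.
type serverFinder interface {
	FindServer(dc string) (addr string, ok bool)
}

// staticFinder is a toy serverFinder backed by a map of datacenter -> address.
type staticFinder map[string]string

func (f staticFinder) FindServer(dc string) (string, bool) {
	addr, ok := f[dc]
	return addr, ok
}

// pickServer picks the server an RPC for dc should be sent to.
// For the local datacenter it consults only the LAN view, so a missing or
// firewalled WAN entry cannot make the local DC look unreachable.
// For every other datacenter it consults the WAN view.
func pickServer(localDC, dc string, lan, wan serverFinder) (string, error) {
	if dc == localDC {
		if addr, ok := lan.FindServer(dc); ok {
			return addr, nil
		}
		return "", fmt.Errorf("local datacenter %q: %w", dc, ErrNoServers)
	}
	if addr, ok := wan.FindServer(dc); ok {
		return addr, nil
	}
	return "", fmt.Errorf("remote datacenter %q: %w", dc, ErrNoServers)
}
```

With a LAN view that knows the local server and a WAN view that does not list the local DC at all (the single-node case described above), `pickServer("dc1", "dc1", lan, wan)` still resolves, while remote datacenters continue to go through the WAN view.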