-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: initialize mgw-wanfed to use local gateways more on startup #9528
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The atomic alignment thing seems worth a quick fix although it's in test code unless I'm mistaken there.
Overall I think I understand what's going on here and the tests look thorough. I'm not super confident I understand the intricacies of the WAN Fed process enough to be sure this is correct or complete overall but it appears to do something shaped like a solution to the issue you filed!
I think if you have specific concerns or uncertainties about it then I'd be happy to spend more time digging through and trying things out to give a more confident approval, but I trust your judgement if you have high confidence this is correct now!
@@ -93,17 +106,25 @@ func TestGatewayLocator(t *testing.T) { | |||
) | |||
g.SetUseReplicationSignal(isLeader) | |||
|
|||
t.Run("before first run", func(t *testing.T) { | |||
assert.True(t, g.DialPrimaryThroughLocalGateway()) // defaults to sure! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As someone not super familiar with this code. It's not very clear to me why this test case should expect to dial through gateways when it has no data?
Doesn't that mean it doesn't know where primary gateways are yet and so it should try servers direct?
If this is correct because we just opportunistically dial gateways and fail if there are none a comment here might help!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So DialPrimaryThroughLocalGateway
just keeps track of the simple failure detector logic and does not include information about federation state catalog data.
It primarily gets used this way:
if g.datacenter == g.primaryDatacenter {
addrs = g.primaryGateways
} else if g.DialPrimaryThroughLocalGateway() && len(g.localGateways) > 0 {
addrs = g.localGateways
} else {
// Note calling StringSliceMergeSorted only works because both
// inputs are pre-sorted. If for some reason one of the lists has
// *duplicates* (which shouldn't happen) it's not great but it
// won't break anything other than biasing our eventual random
// choice a little bit.
addrs = stringslice.MergeSorted(g.primaryGateways, g.PrimaryGatewayFallbackAddresses())
}
In this PR we explicitly initialize this failure detector to the success state on startup:
@@ -283,6 +289,8 @@ func NewGatewayLocator(
primaryGatewaysReadyCh: make(chan struct{}),
}
g.logPrimaryDialingMessage(g.DialPrimaryThroughLocalGateway())
+ // initialize
+ g.SetLastFederationStateReplicationError(nil, false)
return g
}
before actually making an observation. That's the main change. This means that the gateway locator will just assume that the local gateways are the better option UNTIL they fail, to encourage the system to stick with the local gateways even on startup. If there aren't actually any local gateways then the calling code will ignore this decision and do something else.
cb7e943
to
eb177ab
Compare
eb177ab
to
b7ef360
Compare
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/317129. |
🍒✅ Cherry pick of commit c608dc0 onto |
🍒✅ Cherry pick of commit c608dc0 onto |
Fixes #9342