-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added PeerManagement.Type = 'member-list' #492
Conversation
fa241b9
to
f6b266c
Compare
add92a1
to
9bd09b2
Compare
9bd09b2
to
de757d5
Compare
I haven't reviewed this yet, but the preferred fix to the race condition problem would be the Start() method approach. In fact, to keep this more manageable, any chance you could make that fix and the change to the PeerManagement API as a separate PR from adding the memberlist support? |
Remove `computePublicAddr()`. Use configuration's bind address verbatim.
I've found an unexpected behavior in the peering functionality that pertains to both Redis and this new Memberlist modes. When in Redis mode, it will bind a peer listener on the public IP, overriding whatever was set for a bind IP in config For memberlist mode, I removed the public IP logic to fix unit tests because they wanted to bind to localhost, not public IP. But this creates a disparity with Redis mode and I don't think it's my place to change the behavior of pre-existing logic. My recommendation is to remove the configuration override that binds to public IP only. But, you may also provide a cc: @kentquirk |
@thrawn01, I'm having some trouble getting this to work. There's some bugs in the peer list update once memberlist discovers a peer. But the real problem is that in Nomad, we can't effectively point it to other peers to prime the memberlist mesh. The consul service alias doesn't get created until after the service is up and running and health check passes. But, it's failing on startup because it can't find the alias and the health check fails. I'm going to give etcd a try in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requires `PeerListenAddr` to be unique in the cluster. So, "0.0.0.0:8081" default value will not work.
d575ea9
to
75f080b
Compare
127.0.0.1 can be bound, but not any other 4th octet value. Would be better if custom ports on common IP worked. Peering currently requires unique IP addresses for each peer, regardless of port.
After discussion with @MikeGoldsmith, we decided that Memberlist probably isn't a good solution for our use case. It seems to work well when you have a dedicated cluster master with a static host that you can point peers to. But that isn't our environment. Under our Nomad environment all nodes are dynamic and may come and go with every new deployment. Instead, I feel it would be best to focus on integrating using our existing etcd service as we do with our other services. Additionally, there are two behaviors that don't impact production, but prevent end-to-end integration testing. First, the peer listener binds to the detected public IP, overriding the IP set in Second, peering logic identifies peers by IP address only, without the port. As a result, peering would not support multiple peers on same host but different ports. This design requires each peer to have different IPs and bind on the same port. As before, this also impacts testability. We agreed to leave these two behaviors unchanged in our peering changes. They may be addressed in later efforts. |
I'm going to close this for now. If someone wants to continue working on this, with mind of the points @Baliedge highlighted above, please re-open :) |
Which problem is this PR solving?
This PR removes the need to run Redis in the deployment environment. All
member-list
requires is a DNS entrypointing to a single active instance of Refinery to discover and monitor the health of the rest of the cluster. See https://github.com/hashicorp/memberlist
This allows customers who are using nomad, k8s or consul to deploy Refinery without the need for Redis.
Short description of the changes
MemberListListenAddr
,MemberListKnownMembers
andMemberListAdvertiseAddr
Close(ctx context.Context) error
to thePeers
interface to allow for clean shutdown and de-registration of an instance in themember-list
cluster.Race condition in
Peers
interface designSince
peer.NewPeers()
launches async go routines for bothredis
andmember-list
. it is possible that a callback could occur before we callRegisterUpdatedPeersCallback()
to register the call back, therefore missing the initial callback.For example, in
memberlist_test.go
we should be able to assert that p0 will receive 3 callbacks, one for itself, and then 2 for each additional member. However, because of the race condition, we might only get 2 callbacks.We could change the
Peers
interface to instead register a callback duringpeer.NewPeers(conf, cb)
call OR we could add aStart()
method to thePeers
interface, such that users wouldRegisterUpdatedPeersCallback()
and then callStart()
when ready to participate in receiving updates.