k8s peer discovery v2 #13050
base: main
Conversation
This came up when testing whether all nodes would wait when node 0 could not start. To test this, I was killing node 0 every few seconds, so it would attempt to start and then get killed. This led to other nodes sometimes discovering it and attempting to sync, but then crashing with `function_clause` when it went away.
Rather than querying the Kubernetes API, just check the local node name and try to connect to the pod with ID `0` (`-0` suffix). Only the pod with ID 0 can form a new cluster - all other pods will wait forever. This should prevent any race conditions and incorrectly formed clusters.
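The "check the local node name" step can be sketched as follows. This is an illustrative Python sketch, not the actual Erlang code from the PR; `seed_node` is a hypothetical helper name. It derives the name of pod 0 purely from the local node name, with no Kubernetes API call:

```python
import re

def seed_node(local_node: str) -> str:
    """Hypothetical helper: derive the seed (pod 0) node name from the
    local node name by replacing the StatefulSet ordinal suffix with 0,
    e.g. rabbit@rmq-server-3.rmq-nodes.ns -> rabbit@rmq-server-0.rmq-nodes.ns.
    """
    user, at, hostpart = local_node.partition("@")
    # the pod name is the first DNS label of the host part
    pod, dot, rest = hostpart.partition(".")
    pod = re.sub(r"-\d+$", "-0", pod)  # swap the ordinal for 0
    return f"{user}{at}{pod}{dot}{rest}"
```

A node whose name equals `seed_node(...)` of itself is pod 0 and may form a new cluster; every other node keeps trying to connect to that derived name.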
I think the general idea is good. It's simple and works with both parallel and sequential startup of nodes. I'll leave a couple of comments/thoughts/questions here:
I didn't know Kubernetes added configurability. I was thinking about adding some configurability just in case there were some crazy configurations where the defaults didn't work, but this makes such configurability a must (why would anyone set a custom ordinal start, though? why?! ;) ). There's currently no fallback - it will keep trying to contact server-0 forever. I haven't tried whether it's possible to use a manual
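The configurability being discussed is the StatefulSet start ordinal (`.spec.ordinals.start`), which means the lowest ordinal is not necessarily `0`. A minimal sketch of making the seed check configurable, assuming a hypothetical `is_seed` helper (not part of the actual plugin):

```python
import re

def is_seed(pod_name: str, ordinal_start: int = 0) -> bool:
    """Sketch: only the pod whose ordinal equals the (configurable)
    start ordinal may form a new cluster. With a custom
    .spec.ordinals.start, the seed pod is not the -0 pod."""
    m = re.search(r"-(\d+)$", pod_name)
    if m is None:
        raise ValueError(f"no StatefulSet ordinal suffix in {pod_name!r}")
    return int(m.group(1)) == ordinal_start
```

With the default `ordinal_start=0` this reduces to the `-0` suffix check described above.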
In David's example, node 4 could join node 1 after checking that it is already clustered with node 0.
Proposed Changes
Completely different implementation of a peer discovery mechanism for Kubernetes. Rather than querying the Kubernetes API, assume the deployment uses a StatefulSet (as it should) and just check the local node name. Then:

- if the local node name has the `-0` suffix, form a new cluster
- otherwise, try to join the `-0` node; keep trying forever - only the `-0` node can form a new cluster

While it works differently internally, it's completely backwards compatible - existing configuration options are accepted but ignored. Cluster Operator can deploy a cluster with these changes and everything works with no changes on the Operator side.
Benefits of this approach:
Drawbacks:
TODO: