k8s peer discovery v2 #13050

Open · mkuratczyk wants to merge 3 commits into main from k8s-peer-discovery-simplified
Conversation

@mkuratczyk (Contributor) commented on Jan 10, 2025

Proposed Changes

A completely different implementation of the peer discovery mechanism for Kubernetes. Rather than querying the Kubernetes API, it assumes the deployment uses a StatefulSet (as it should) and just checks the local node name. Then:

  • if the node name's suffix is -0, form a new cluster
  • in any other case, try to join the -0 node and keep trying forever; only the -0 node can form a new cluster (a sketch follows this list)
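
As a rough illustration, here is a minimal Erlang sketch of that decision, assuming the conventional StatefulSet node name format `rabbit@<statefulset>-<ordinal>.<service>...`. The module and function names are hypothetical, not the PR's actual code:

```erlang
-module(k8s_seed_sketch).
-export([role/1]).

%% Returns 'seed' for the pod with ordinal 0, or {join, SeedNode} otherwise.
%% Assumes a node name like 'rabbit@myrabbit-server-1.myrabbit-nodes.ns'.
role(Node) when is_atom(Node) ->
    [Name, Host] = string:split(atom_to_list(Node), "@"),
    %% Split the host into the pod name and the rest of the FQDN (if any).
    [PodName | HostRest] = string:split(Host, "."),
    Parts = string:split(PodName, "-", all),
    case lists:last(Parts) of
        "0" ->
            %% Only this pod may form a new cluster.
            seed;
        _Ordinal ->
            %% Rebuild the seed node name by replacing the ordinal with 0.
            Prefix = lists:flatten(lists:join("-", lists:droplast(Parts))),
            SeedHost = lists:flatten(lists:join(".", [Prefix ++ "-0" | HostRest])),
            {join, list_to_atom(Name ++ "@" ++ SeedHost)}
    end.
```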

While it works differently internally, it's completely backwards compatible: existing configuration options are accepted but ignored. The Cluster Operator can deploy clusters with these changes and everything works with no changes on the Operator side.

Benefits of this approach:

  • no queries to the Kubernetes API during boot - the decision is based purely on the local node name
  • only the -0 node can ever form a new cluster, which should prevent race conditions and incorrectly formed clusters

Drawbacks:

  • the existing plugin works perfectly fine for the vast majority of users, so introducing significant changes carries some risk

TODO:

  • discuss this approach
  • consider adding some configurability just in case (allow overriding the node name / seed node name?)
  • test in different Kubernetes environments

@mkuratczyk force-pushed the k8s-peer-discovery-simplified branch from 7aadc44 to 54eecc1 on January 10, 2025
@mkuratczyk force-pushed the k8s-peer-discovery-simplified branch 2 times, most recently from 872519e to d153aaf on January 20, 2025
Resolved review threads on:
  • deps/rabbit/src/rabbit_peer_discovery.erl
  • deps/rabbitmq_peer_discovery_k8s/README.md (3 threads)
@mkuratczyk force-pushed the k8s-peer-discovery-simplified branch from d153aaf to 5a8865b on January 22, 2025
This came up when testing whether all nodes would wait when node 0 could not start. To test this, I was killing node 0 every few seconds, so it would attempt to start and then get killed. This led to other nodes sometimes discovering it and attempting to sync, but then crashing with a function_clause error when it went away.
Rather than querying the Kubernetes API, just check the local node name
and try to connect to the pod with ID `0` (`-0` suffix). Only the pod
with ID 0 can form a new cluster - all other pods will wait forever.
This should prevent any race conditions and incorrectly formed clusters.
@mkuratczyk force-pushed the k8s-peer-discovery-simplified branch from 5a8865b to b8d3186 on January 22, 2025
@mkuratczyk marked this pull request as ready for review on January 22, 2025
@ansd (Member) commented on Jan 23, 2025

I think the general idea is good. It's simple and works with both parallel and sequential startup of nodes.

I'll leave a couple of comments/thoughts/questions here:

  1. Yet another benefit (not listed above) is that the nodes do not need permission to query the K8s Endpoints object, which simplifies things further in security-restrictive environments.
  2. The ordinal is hardcoded to 0 here. StatefulSets can be configured to use a different .spec.ordinals.start. Should this configurability be ignored or addressed in this PR? (A sketch of one option follows this list.)
  3. I haven't fully studied the code path, but what happens if the seed node 0 is down? For initial cluster formation, it's fine to make the whole deployment fail, but what happens if node 0 goes down in my 3-node prod cluster, I can't get it back up, I panic, and want to scale out the cluster to a 4th node? The 4th node won't be able to contact node 0. Does the 4th node then contact node 1 or 2? Does the boot process completely fail? Or does it succeed with the node unclustered, so that I can run join_cluster manually?
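
A hedged sketch of how point 2 might be addressed: read the seed ordinal from a configurable application environment key instead of hardcoding 0. The `seed_ordinal` key is hypothetical; this PR does not currently expose such a setting:

```erlang
%% Hypothetical setting mirroring a StatefulSet's .spec.ordinals.start;
%% defaults to 0, matching the behavior described in this PR.
seed_ordinal() ->
    application:get_env(rabbitmq_peer_discovery_k8s, seed_ordinal, 0).

%% A pod may form a new cluster only if its ordinal matches the seed ordinal.
is_seed(PodOrdinal) when is_integer(PodOrdinal) ->
    PodOrdinal =:= seed_ordinal().
```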

@mkuratczyk (Contributor, Author) commented

I didn't know Kubernetes had added that configurability. I was thinking about adding some configurability just in case there were some crazy configurations where the defaults didn't work, but this makes such configurability a must (why would anyone set a custom ordinal start though? why?! ;) )

There's currently no fallback - the node will keep trying to contact server-0 forever. I haven't checked whether a manual join_cluster operation can resolve a cluster in such a state; I will. Adding a fallback to another node would go against the whole idea of how this approach works, so I'd rather not do it. However, the aforementioned configurability could perhaps be used to make the 4th node contact a node other than -0.

@dumbbell (Member) commented

In David’s example, node 4 could join node 1 after checking that node 1 is already clustered with node 0.
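
A rough sketch of that check, assuming `rabbit_nodes:list_members/0` (available in recent RabbitMQ versions) returns the cluster members as seen by the remote node; `can_join_via` is a hypothetical helper, not code from this PR:

```erlang
%% Only treat Candidate as a valid join target if the seed node already
%% belongs to the cluster that Candidate is a member of.
can_join_via(Candidate, SeedNode) ->
    case rpc:call(Candidate, rabbit_nodes, list_members, []) of
        Members when is_list(Members) ->
            lists:member(SeedNode, Members);
        {badrpc, _Reason} ->
            %% Candidate is unreachable or errored; do not join via it.
            false
    end.
```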
