Introduce cluster instance identity other than IP addresses #1835

mxinden · 2019-04-12T11:03:57Z

Scenario: Two Alertmanager clusters (A, B) are running in the same Kubernetes cluster.

If Alertmanager cluster A is scaled down by one instance with IP address X and, shortly afterwards, Alertmanager cluster B is scaled up by one instance that receives the recycled IP address X, the two clusters will merge into one big cluster.

We are hitting this problem in the Prometheus Operator end-to-end test suite (prometheus-operator/prometheus-operator#2544) on a single node (small CIDR space) Kubernetes cluster.

The probability of this happening in production systems is questionable. Most setups probably include only a single Alertmanager cluster. In addition, CIDR ranges might be a lot bigger, and IP address recycling might not happen as frequently.

This could be prevented via a unique identifier per Alertmanager cluster, so that instances with a different identifier are not allowed to join. In addition, #1819, which introduces mutual TLS, could stop accidental cluster merges if trust chains are scoped per cluster.
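
A minimal sketch of the identifier idea, not Alertmanager's actual implementation: the `Peer` and `Cluster` classes, the `join` method, and the example IP address are all hypothetical names chosen for illustration. It only shows that a join check keyed on a per-cluster label rejects a peer from another cluster even when that peer shows up on a recycled IP address.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Peer:
    """Hypothetical cluster member: its address plus the label of its cluster."""
    address: str
    cluster_label: str


@dataclass
class Cluster:
    """Hypothetical cluster that admits only peers carrying its own label."""
    label: str
    members: dict = field(default_factory=dict)

    def join(self, peer: Peer) -> bool:
        # The address alone says nothing about membership: a recycled IP
        # may now belong to an instance of a different cluster.
        if peer.cluster_label != self.label:
            return False
        self.members[peer.address] = peer
        return True


cluster_a = Cluster(label="a")
cluster_b = Cluster(label="b")

# Cluster A used to have a member at this address; after a scale-down the
# address is recycled for a new instance of cluster B.
recycled = Peer(address="10.0.0.7", cluster_label="b")

joined_a = cluster_a.join(recycled)  # rejected: label mismatch
joined_b = cluster_b.join(recycled)  # accepted: label matches
```

With an address-only identity, the first `join` call would have succeeded and bridged the two clusters.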

The purpose of this issue is to document the failure for the future and give anyone hitting the same issue a central place to discuss how to proceed.

brancz · 2019-04-12T12:49:17Z

While a TLS chain of trust could accidentally solve this, I don’t think it is the appropriate solution. As you proposed, a separate mechanism sounds reasonable.

As for the probability, it is actually not all that low. I recall Kubernetes IP recycling having caused various problems across the board.

simonpasquier · 2023-09-26T09:33:10Z

Closed by #3354

simonpasquier added component/high availability kind/enhancement labels Apr 19, 2019

simonpasquier closed this as completed Sep 26, 2023

simonpasquier mentioned this issue Sep 26, 2023

fix: add --cluster.label to alertmanager prometheus-operator/prometheus-operator#5945

Merged
