
[stable/rabbitmq] Persistent storage inconsistency #7806

Closed
dene14 opened this issue Sep 19, 2018 · 3 comments · Fixed by #7807 or #12677

Comments

@dene14
Contributor

dene14 commented Sep 19, 2018

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Version of Helm and Kubernetes:
1.10

Which chart:
stable/rabbitmq

What happened:
The mnesia storage directory changes every time the pod gets a new IP address, and the same happens for the logs. If you check the persistent storage of any cluster that has been running long enough (months), you will notice many log files and mnesia directories named after former IPs of the pod.
There is also a high chance of actually losing data in the dire situation where all RabbitMQ pods get terminated at once.

What you expected to happen:
Since we're using a StatefulSet for RabbitMQ, persistent storage should be in line with that, so node naming should use POD_NAME rather than POD_IP.
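
A minimal sketch of what POD_NAME-based naming could look like in the StatefulSet template, assuming a headless service called `rabbitmq-headless` in the `default` namespace (both are placeholders, not the chart's actual names):

```yaml
# Derive a stable node name from the pod name via the downward API.
env:
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: RABBITMQ_USE_LONGNAME
    value: "true"
  # rabbitmq-headless / default are placeholder service and namespace names.
  - name: RABBITMQ_NODENAME
    value: "rabbit@$(MY_POD_NAME).rabbitmq-headless.default.svc.cluster.local"
```

With a stable node name, the mnesia directory (e.g. `mnesia/rabbit@rabbitmq-0...`) no longer changes when the pod is rescheduled.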

How to reproduce it (as minimally and precisely as possible):

  1. If you don't have any vHost other than the default one defined in the Helm configuration, create an additional vHost
  2. Delete all RabbitMQ pods at once
  3. The additional vHost from step 1 will disappear when the cluster is back online (a rough command-line version of these steps is sketched below)
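
A minimal reproduction sketch, assuming a release named `rabbitmq` with the default pod and label names (adjust these placeholders to your install):

```sh
kubectl exec rabbitmq-0 -- rabbitmqctl add_vhost test-vhost
kubectl delete pod -l app=rabbitmq                    # terminate all pods at once
# once the pods are back up:
kubectl exec rabbitmq-0 -- rabbitmqctl list_vhosts    # test-vhost is gone
```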

Anything else we need to know:
The chart uses IP-based clustering, and with the current RabbitMQ configuration this can lead to cluster inconsistency or even data/schema loss if all pods go down at once.
Over a deployment's lifetime, many mnesia storage directories are also created, each named after an IP that was once allocated to the pod, and it's quite inconvenient to locate a RabbitMQ node by IP address.
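
For reference, with the rabbitmq_peer_discovery_k8s plugin the change boils down to something like the following rabbitmq.conf sketch (not the chart's rendered config; the hostname_suffix value is a placeholder):

```
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host               = kubernetes.default.svc.cluster.local

## current default: nodes are named after pod IPs
# cluster_formation.k8s.address_type     = ip

## proposed default: nodes are named after stable pod hostnames
cluster_formation.k8s.address_type       = hostname
cluster_formation.k8s.hostname_suffix    = .rabbitmq-headless.default.svc.cluster.local
```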

@azhi

azhi commented Mar 13, 2019

Just got bitten by this as well. The default mode really needs to be hostname-based.
If the decision on changing the default mode is stuck (it's been almost 5 months), maybe we should add a note to the README that in order to properly use persistence, you need to switch to hostname-based clustering?
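
Something along these lines in the README would already help; the values keys below are illustrative (the exact names depend on the chart version, so check values.yaml):

```sh
helm install stable/rabbitmq \
  --set rabbitmq.clustering.address_type=hostname \
  --set persistence.enabled=true
```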

@dene14
Contributor Author

dene14 commented Mar 13, 2019

I rather agree, but we're hesitant to change the default to avoid breaking existing installs.
Though I can barely imagine any existing production setup that has been working fine for a long time.
People are most probably affected by one of 2 scenarios:

  1. they lost their data already (and switched to hostname-based clustering mode)
  2. they are experiencing excessive storage saturation (which normally should attract any ops person's attention)

Summon maintainers of the chart... @carrodher @desaintmartin @juan131 @prydonius @sameersbn @tompizmor

@tompizmor
Collaborator

tompizmor commented Mar 29, 2019

Hi @dene14 @azhi!

I also believe that the default clustering method should be hostname. As @dene14 said, right now the chart is not suitable to be run in production.

I will prepare a PR changing the clustering method. As it is a breaking change, I will bump the major version of the chart and put a notice in the README. I will also try to find out whether there is any workaround to upgrade to the new version without losing the current data.
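
One candidate workaround (a sketch, not an endorsed migration path) is to export the broker definitions through the management API before the upgrade and re-import them afterwards; this preserves vhosts, users, queues and bindings, but not message contents:

```sh
# export definitions from the old (IP-named) cluster
# "user:password" is a placeholder for the management credentials
kubectl port-forward rabbitmq-0 15672:15672 &
curl -u user:password http://localhost:15672/api/definitions > definitions.json

# ... upgrade the chart to hostname-based clustering ...

# re-import into the new cluster
curl -u user:password -H "Content-Type: application/json" \
  -X POST -d @definitions.json http://localhost:15672/api/definitions
```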
