
[stable/rabbitmq] Persistent storage inconsistency #7806

Closed
dene14 opened this issue Sep 19, 2018 · 3 comments · Fixed by #7807 or #12677

Comments

@dene14
Contributor

dene14 commented Sep 19, 2018

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Version of Helm and Kubernetes:
1.10

Which chart:
stable/rabbitmq

What happened:
The mnesia storage directory changes every time the pod gets a new IP address, and the same happens for the logs. If you check the persistent storage of any cluster that has been running long enough (months), you will notice many log files and mnesia directories named after former IPs of the pod.
There is also a high chance of actually losing data in the dire situation where all RabbitMQ pods get terminated at once.

What you expected to happen:
Since we're using a StatefulSet for RabbitMQ, persistent storage should be in line with that, so node naming should use POD_NAME rather than POD_IP.
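
A minimal sketch of what POD_NAME-based naming could look like in the StatefulSet template, assuming a headless service called `rabbitmq-headless` in the `default` namespace (both are placeholders, not the chart's actual names):

```yaml
# Derive a stable node name from the pod name via the downward API.
env:
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: RABBITMQ_USE_LONGNAME
    value: "true"
  # rabbitmq-headless / default are placeholder service and namespace names.
  - name: RABBITMQ_NODENAME
    value: "rabbit@$(MY_POD_NAME).rabbitmq-headless.default.svc.cluster.local"
```

With a stable node name, the mnesia directory (e.g. `mnesia/rabbit@rabbitmq-0...`) no longer changes when the pod is rescheduled.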

How to reproduce it (as minimally and precisely as possible):

  1. If you don't have any vHost other than the default one defined in the Helm configuration, create an additional vHost
  2. Delete all RabbitMQ pods at once
  3. The additional vHost from step 1 will disappear when the cluster is back online (a rough command-line version of these steps is sketched below)
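
A minimal reproduction sketch, assuming a release named `rabbitmq` with the default pod and label names (adjust these placeholders to your install):

```sh
kubectl exec rabbitmq-0 -- rabbitmqctl add_vhost test-vhost
kubectl delete pod -l app=rabbitmq                    # terminate all pods at once
# once the pods are back up:
kubectl exec rabbitmq-0 -- rabbitmqctl list_vhosts    # test-vhost is gone
```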

Anything else we need to know:
The chart uses IP-based clustering, and with the current RabbitMQ configuration this can lead to cluster inconsistency or even data/schema loss if all pods go down at once.
Over a deployment's lifetime, many mnesia storage directories are also created, each named after an IP that was once allocated to the pod, and it's quite inconvenient to locate a RabbitMQ node by IP address.
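
For reference, with the rabbitmq_peer_discovery_k8s plugin the change boils down to something like the following rabbitmq.conf sketch (not the chart's rendered config; the hostname_suffix value is a placeholder):

```
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host               = kubernetes.default.svc.cluster.local

## current default: nodes are named after pod IPs
# cluster_formation.k8s.address_type     = ip

## proposed default: nodes are named after stable pod hostnames
cluster_formation.k8s.address_type       = hostname
cluster_formation.k8s.hostname_suffix    = .rabbitmq-headless.default.svc.cluster.local
```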

@azhi

azhi commented Mar 13, 2019

Just got bitten by this as well. The default mode really needs to be hostname-based.
If the decision on changing the default mode is stuck (it's been almost 5 months), maybe we should add a note to the README that in order to properly use persistence, you need to switch to hostname-based clustering?
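
Something along these lines in the README would already help; the values keys below are illustrative (the exact names depend on the chart version, so check values.yaml):

```sh
helm install stable/rabbitmq \
  --set rabbitmq.clustering.address_type=hostname \
  --set persistence.enabled=true
```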

@dene14
Contributor Author

dene14 commented Mar 13, 2019

I rather agree, but we're hesitant to change the default to avoid breaking existing installs.
Though I can barely imagine any existing production setup that has been working fine for a long time.
People are most probably affected by one of 2 scenarios:

  1. they lost their data already (and switched to hostname-based clustering mode)
  2. they are experiencing excessive storage saturation (which normally should attract any ops person's attention)

Summon maintainers of the chart... @carrodher @desaintmartin @juan131 @prydonius @sameersbn @tompizmor

@tompizmor
Collaborator

tompizmor commented Mar 29, 2019

Hi @dene14 @azhi!

I also believe that the default clustering method should be hostname. As @dene14 said, right now the chart is not suitable to be run in production.

I will prepare a PR changing the clustering method. As it is a breaking change, I will bump the major version of the chart and put a notice in the README. I will also try to find out whether there is any workaround to upgrade to the new version without losing the current data.
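
One candidate workaround (a sketch, not an endorsed migration path) is to export the broker definitions through the management API before the upgrade and re-import them afterwards; this preserves vhosts, users, queues and bindings, but not message contents:

```sh
# export definitions from the old (IP-named) cluster
# "user:password" is a placeholder for the management credentials
kubectl port-forward rabbitmq-0 15672:15672 &
curl -u user:password http://localhost:15672/api/definitions > definitions.json

# ... upgrade the chart to hostname-based clustering ...

# re-import into the new cluster
curl -u user:password -H "Content-Type: application/json" \
  -X POST -d @definitions.json http://localhost:15672/api/definitions
```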
