Reconcile (repair or expand) quorum queue membership periodically #8218
Conversation
If a branch isn't ready, please do not submit PRs without a title here. You can submit it in your own fork, where everyone should be able to review it.
@michaelklishin sorry, I created it for comments from Karl and thought it was fine to create it in a draft state. I will update the PR today as requested by Karl. But yeah, the title could have been better, even for a draft.
Looks really good. Some smaller nitpicks and suggestions - time to add some tests! :)
Tests added!
Hi Simon, I'm away this week but will take a look when I get back.
— Karl Nilsson
Force-pushed from 381df01 to 41405a1
Force-pushed from f2dbb36 to 8d80817
I wonder whether node maintenance should be taken into account, such that no quorum queue replicas are changed while a node upgrade takes a long time.
Good idea. Should 'all' nodes be checked - that is, the leader node, the current members, and the potential new nodes - or would it be enough to just check the leader and potential new nodes? filter_out_drained_nodes_local_read/1 seems to be a good function to use.
A check for nodes in maintenance mode has been added - the logic now checks the leader node and the new potential members.
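For reference, a minimal sketch of how such a drained-node filter could be applied when picking candidate members, assuming filter_out_drained_nodes_local_read/1 is exported by rabbit_maintenance; the function and variable names around it are illustrative, not this PR's code:

```erlang
%% Sketch only: exclude nodes under maintenance before choosing a new member.
%% Assumes filter_out_drained_nodes_local_read/1 lives in rabbit_maintenance;
%% candidate_nodes/1 is a hypothetical helper, not part of this PR.
candidate_nodes(CurrentMembers) ->
    Running = rabbit_nodes:list_running(),
    %% Drop any running node that is currently drained for maintenance.
    NotDrained = rabbit_maintenance:filter_out_drained_nodes_local_read(Running),
    %% Only nodes that are not already members are candidates for growth.
    NotDrained -- CurrentMembers.
```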
Force-pushed from f005c57 to 00c3fb7
deps/rabbit/src/rabbit.erl (outdated)
@@ -175,6 +175,12 @@
                    {requires, [rabbit_alarm, guid_generator]},
                    {enables, core_initialized}]}).

-rabbit_boot_step({rabbit_queue_member_eval,
Could we not just have the process as static configuration?
I tried to 'follow suit' from other supervised processes and creation of supervisors, so piggybacked on that code.
But no objections to changing it to something else?
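For context, the boot step partially shown in the diff above might look roughly like this. A sketch only: the step name comes from the diff, but the description text, the requires/enables ordering, and the use of rabbit_sup:start_restartable_child/1 are assumptions, not the PR's actual code:

```erlang
%% Sketch of a boot step for the periodic membership process. Description,
%% ordering and the child-start MFA are assumptions for illustration only.
-rabbit_boot_step({rabbit_queue_member_eval,
                   [{description, "quorum queue membership evaluation"},
                    {mfa,         {rabbit_sup, start_restartable_child,
                                   [rabbit_queue_member_eval]}},
                    {requires,    core_initialized},
                    {enables,     routing_ready}]}).
```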
Force-pushed from 65629bd to f0e6de2
This definitely needs naming work, both for user-facing and internal (e.g. module and function) names. It also has some overlap with queue leader location (even though the algorithm that picks an initial leader does not have to be identical to those adding extra replicas). Hopefully this will be organized into a similar system of pluggable modules before 3.13.0.
I'd rename this feature to "member reconciliation" or "member repair" because it's not obvious to me what "member evaluation" means. "Reconciliation" is the term used by Akka, and "repair" by Riak and Cassandra.
Apparently there is no way for me to "clear" my earlier review with a "just comment"; I have to explicitly approve, or the PR will keep saying I've requested changes. Oh well. This is an "approval in anticipation", I guess :)
I have resolved a conflict with #8790.
I found a bug in the implementation. It could be specific to a node reset but resets happen. With this config file:
Given these steps:
Node rabbit-2 starts and looks like it has the queues but it doesn't:
TL;DR: rabbit-2 is running a member of a queue with the same name, but it's actually not clustered with the other two members.
Can I also ask for another name change:
Looking into it. This config is wrong, though - it will not trigger the auto-growth code.
@mkuratczyk I do see the same 'issue' on the main branch, i.e. without the reconcile code. Or rather, I see that issue on the reset node, i.e.
Since you reset the node, it 'should' be automatically 'cleaned up' by my code. But it takes 10 seconds after the node was stopped to trigger the check. I would think you would normally run 'forget_cluster_node' when you remove a node like you did, though, so this issue should not happen. Not sure what a good solution would be here, though? @kjnilsson any ideas?
Thanks, I checked on |
Yeah, the +1 I guess happened because some queues did start to get the auto-clean triggered. But since you did not have target-group-size specified, it did not grow back to 3.
How about |
I am wondering if we should perhaps not (at least not by default) remove members automatically. It feels like removals are the most likely to cause unexpected issues.
@kjnilsson Yes, I'll make it optional, off by default. The flexibility does not hurt. That said, what is the expected behavior for @mkuratczyk's example above, when you do 'stop_app, reset, join_cluster, start_app' (regardless of whether you have this feature or not)? Is it basically a case of 'Doctor, it hurts when I do this' and the response is 'Then don't do that'? I.e., don't just reset; run forget_cluster_node first?
I've played with this a bit more. If I have the target size configured, this gets fixed through reconciliation. I think the main conclusion is that we should probably make
Force-pushed from 9886443 to 6b15c57
Force-pushed from 6b15c57 to 559a83d
@michaelklishin I think most of the name changes are done, auto-remove is now optional (default off), and the branch has been rebased into one commit.
We would still like to address one (fairly degenerate) scenario described above. There are several options, some of which are not mutually exclusive. We will file a new issue for that.
Yes, I'd assume the same scenario would happen if you did the following operations on 'node 2':
node 2 would then still have the queue, but the cluster's version of the queue would have removed that member from its membership (via forget_cluster_node). Auto-remove is now optional, default off. But if we feel it's unsafe, we could remove it altogether and re-add it once the duplicated-queue scenario is figured out?
An important follow-up for environments with a lot of quorum queues: #10193 |
Proposed Changes
This PR is a subset of issue #7209.
Add a process that periodically evaluates queue node membership. Currently only Quorum Queues are covered, but the idea is that this process would also evaluate Stream Queues.
The high level algorithm is:

- Periodically, compare each quorum queue's current members against rabbit_nodes:list_members() to find 'dead' members. This should be rare, but could happen if the forget_cluster_node command fails before it 'shrinks' a queue. If there are 'dead' members, the shrink command is executed for those dead members.
- Compare the queue's members against rabbit_nodes:list_running(). If the following criteria are met, the process will add one new node as a member to the quorum queue.
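As a rough illustration of one pass of that loop (a sketch, not this PR's code: only rabbit_nodes:list_members/0 and rabbit_nodes:list_running/0 come from the description above, and every helper marked "hypothetical" is invented for illustration):

```erlang
%% One pass of the periodic check, as a sketch. Helpers suffixed _hypothetical
%% do not exist in this PR; they stand in for the real shrink/grow logic.
reconcile_pass(QuorumQueues) ->
    ClusterMembers = rabbit_nodes:list_members(),
    RunningNodes   = rabbit_nodes:list_running(),
    lists:foreach(
      fun(Q) ->
              Members = members_hypothetical(Q),
              %% 1. Shrink away 'dead' members, i.e. replicas on nodes that are
              %%    no longer cluster members (e.g. a failed forget_cluster_node).
              Dead = Members -- ClusterMembers,
              [shrink_hypothetical(Q, Node) || Node <- Dead],
              %% 2. If the growth criteria are met (target group size not reached,
              %%    candidate node running and not under maintenance, ...),
              %%    add one new node as a member.
              case pick_new_member_hypothetical(Q, Members, RunningNodes) of
                  {ok, Node} -> add_member_hypothetical(Q, Node);
                  none       -> ok
              end
      end, QuorumQueues).
```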
Configuration and policies

Static configuration

Policy and argument
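The concrete keys under these headings did not survive into this page, so the following is only an illustration. An advanced.config-style sketch, assuming application-environment keys along the lines discussed in this thread; the key names are assumptions, not necessarily what this PR ships with:

```erlang
%% advanced.config sketch -- key names are assumptions for illustration only.
[
 {rabbit,
  [
   %% enable the periodic membership reconciliation process
   {quorum_membership_reconciliation_enabled, true},
   %% grow queues back towards this replica count when nodes are available
   {quorum_membership_reconciliation_target_group_size, 3},
   %% removal of members is optional and off by default (see discussion above)
   {quorum_membership_reconciliation_auto_remove, false}
  ]}
].
```

On the policy/argument side, the per-queue counterpart would be something along the lines of the target-group-size key mentioned earlier in the thread.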
Tasks
Future work
Types of Changes

What types of changes does your code introduce to this project? Put an x in the boxes that apply.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

- I have read the CONTRIBUTING.md document

Further Comments
The design was in collaboration with @kjnilsson.
An alternative solution, investigated and partly implemented, was to add functionality to the RA library: a new callback triggered once RA noticed a node joining or leaving the cluster, asking the 'client' to evaluate a queue's membership. That solution was 'per queue', whereas this PR is per node. There were worries that RA was doing too many non-Raft tasks, so the idea to update RA was dropped.