Add new static_primary config option #2318
Conversation
Hello @thedodd.
I agree that the static_primary feature is useful and there is a certain demand for it from the community - I've been asked many times about the possibility of running clusters with Patroni but without automatic failover, because people often want to simplify configuration management.
Right now the feature is very limited and supports only single-node clusters, but I don't see any problem with extending it to support clusters of arbitrary size (a rough sketch follows the list):
- A node that thinks it is configured as the static_primary should continue running as primary (or promote) even if the DCS is down/not accessible or update_lock()/acquire_lock() calls have failed.
- Other nodes should always use the static_primary instead of whatever is stored in the leader key in DCS.
- We can also support a switchover via Patroni REST API/patronictl. It will simply change the static_primary field in the config.
- If the static_primary value is changed in DCS, the new node should not promote until the old one has shut down cleanly. We can promote, though, if the old one isn't accessible.
- We need to document that in case of a network split it is the responsibility of the user to do STONITH on the isolated old primary. Such a situation could happen, for example, with Etcd, where we don't use quorum reads - the old static_primary will continue to see stale data.
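To make the intent of the list above concrete, here is a minimal per-cycle decision sketch. All names are illustrative assumptions; this is not actual Patroni code from this PR.

```python
# Illustrative sketch only: how a member could pick its role when a
# static_primary is configured (hypothetical helper, not Patroni internals).

def decide_role(node_name, static_primary):
    if static_primary is None:
        # No static primary configured: normal leader election via the DCS.
        return 'follow the leader key in the DCS'
    if node_name == static_primary:
        # Keep running as primary (or promote) even if the DCS is unreachable
        # or update_lock()/acquire_lock() have failed.
        return 'run as primary'
    # Every other member follows the static primary, ignoring the leader key.
    return 'replicate from {0}'.format(static_primary)
```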
features/static_primary.feature
Outdated
Given I start postgres0 as static primary
Then postgres0 is a leader after 10 seconds
And there is a non empty initialize key in DCS after 15 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"ttl": 20, "loop_wait": 2, "synchronous_mode": true}
We don't really need to enable synchronous_mode.
Was mostly just a copy/paste of other test code. I will remove that bit.
It is possible that removing that setting has actually caused these tests to start failing. Added it back, and testing again.
Nope, tests shouldn't fail because of that.
My point is that in a single-node cluster it doesn't make any sense to enable synchronous_mode.
Agreed, however it only starts as a single-node cluster. We then add two additional replicas immediately after it starts, and their behavior after startup is important in this case.
EDIT: as it turns out, toggling this value does impact the success/failure of the test. When this value is off, the tests fail. I'll dig into the test failure again with that value removed to see if I can pin down exactly why that is happening.
I still believe enabling synchronous_mode in a single-node cluster doesn't make sense.
> I still believe enabling synchronous_mode in a single-node cluster doesn't make sense.
I agree, but the point I am trying to make is that it does not stay a single-node cluster. Subsequent scenarios add additional members to the cluster.
I will continue to experiment with this to see why toggling this value is causing the test to fail.
Force-pushed from e77735b to 07e950e
@CyberDem0n thanks for taking the time to review these changes! I appreciate it.
The points that you've enumerated are all solid, and I agree with many of them. One difficulty I've run into with this PR so far is that there are quite a few differing opinions on what this should look like and on what would and would not be safe to implement. For example, making any assumptions about cluster roles outside of what the DCS state indicates seems to be unsafe; however, as you've stated above, as long as all nodes are configured with the correct topology, then things will be fine. Others have mentioned that this project's overall intent is to help folks avoid footguns, which is definitely a great standard, and have thus argued against doing things outside of the authority of the DCS.
All in all, my point is just that there are multiple good ways to implement something like this for Patroni clusters with > 1 member, but they all seem to have this minimal implementation pattern in common, which can be used as a building block for perhaps what we could call ... My hope is that:
Thoughts?
Force-pushed from 07e950e to 9fdaf1b
Force-pushed from bfd62f7 to 4a10968
Pull Request Test Coverage Report for Build 2756616454
💛 - Coveralls
features/steps/static_primary.py
Outdated
json.loads(context.dcs_ctl.query(name))
assert False, "found value under DCS key {} after {} seconds".format(name, time_limit)
Meh, where is the actual check?
The check is the call to json.loads. If it succeeds, then this indicates that we found a member (name) and it was not supposed to be there.
The thing is that the running Patroni continues to call touch_member() even if Postgres isn't running.
Therefore, the member key never expires.
The assert raises the exception, which is just eaten on the next line. Or in other words, we wasted time_limit just for nothing.
Yea, you are correct. Not sure why I constructed this in this format. Will update.
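For clarity on the point above, here is a sketch of the distinction being made (the step wording and the dcs_ctl helper come from this PR's test harness; the rest is illustrative): with the assert inside the try block the bare except swallows the AssertionError, while moving the assertion outside lets the failure surface.

```python
import json
from time import sleep

from behave import step


@step('"{name}" key not in DCS after waiting {time_limit:d} seconds')
def check_member_not_present(context, name, time_limit):
    sleep(time_limit)

    # Record whether the key was found instead of asserting inside the try
    # block, where `except Exception: pass` would also eat the AssertionError.
    found_value = False
    try:
        json.loads(context.dcs_ctl.query(name))
        found_value = True
    except Exception:
        pass

    # The assertion now lives outside the try/except, so a failure is reported.
    assert not found_value, "found value under DCS key {} after {} seconds".format(name, time_limit)
```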
features/static_primary.feature
Outdated
Given I start postgres0 as static primary
Then postgres0 is a leader after 10 seconds
And there is a non empty initialize key in DCS after 15 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"ttl": 20, "loop_wait": 2, "synchronous_mode": true}
I still believe enabling synchronous_mode in a single-node cluster doesn't make sense.
self.shutdown()
return 'patroni cluster is configured with a static primary, \
    and this node is not the primary, shutting down and \
    refusing to start'
self.shutdown() will not cause Patroni to shut down, but I believe it should.
That behavior could be achieved the same way as here: https://github.com/zalando/patroni/blob/d8d634125c547122af22646bd9988686e75adc99/patroni/ha.py#L1448-L1451
See my comment here. The intention is not to shut down Patroni, only to ensure that Postgres itself is not running.
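To spell out the two behaviours being debated here, a rough sketch with hypothetical names; the `postgres` handle stands in for Patroni's Postgres controller and is an assumption, not the PR's actual API:

```python
import sys


def stop_postgres_only(node_name, static_primary, postgres):
    # Intended behaviour in this PR: if this node is not the static primary,
    # make sure Postgres is not running, but keep the Patroni daemon alive so
    # it keeps watching the DCS for a change to static_primary.
    if static_primary and static_primary != node_name:
        if postgres.is_running():
            postgres.stop()
        return 'refusing to start Postgres while static_primary points elsewhere'
    return None


def stop_patroni_entirely(postgres):
    # The alternative the reviewer describes: stop Postgres and terminate the
    # Patroni daemon itself, so the node stays down until an operator acts.
    postgres.stop()
    sys.exit(1)
```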
Scenario: check removing static primary config from dcs allows replica startup
  Given I issue a PATCH request to http://127.0.0.1:8008/config with {"static_primary": null}
  Then "sync" key in DCS has leader=postgres0 after 20 seconds
  And "members/postgres1" key in DCS has state=running after 10 seconds
  And "members/postgres2" key in DCS has state=running after 10 seconds
Since postgres1 and postgres2 are not supposed to be running, this scenario will fail and should be removed.
This scenario does indeed succeed, because the Patroni instances are still running. It is only the Postgres processes which are not starting, due to the static_primary config (which is intended behavior).
Once that static_primary config is set to null in the DCS, then the Postgres replicas are able to start.
I think this is a point of confusion based on one of your recent comments in ha.py. The replica Patroni instances are not intended to sys.exit. The idea is to just keep Postgres from starting, and the Patroni process stays alive and will continue to check the DCS values periodically based on its control loop.
This will allow for a more flexible (and still safe) operator experience. Basically, cluster operators do not need to synchronize the steps of removing the static_primary config and adding new replicas. The operations are safe to overlap (fewer footguns).
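For reference, the operation described above can be driven through Patroni's /config REST endpoint (the endpoint and PATCH semantics are already exercised in the feature file; the static_primary key itself is the option proposed by this PR):

```python
import requests

# Clearing static_primary in the dynamic configuration lets the replicas'
# Postgres processes start; their Patroni daemons were running the whole time.
resp = requests.patch('http://127.0.0.1:8008/config', json={'static_primary': None})
resp.raise_for_status()
```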
Force-pushed from 4a10968 to 66c69a4
Force-pushed from cb83b44 to 5dd4b08
Force-pushed from 5dd4b08 to f687ce4
features/steps/static_primary.py
Outdated
if found_value:
    assert False, "found value under DCS key {} after {} seconds".format(name, time_limit)
- Member keys will not expire because they are updated by running Patroni
- In case of K8s it is even worse, because the info is stored in Pod annotations.
Force-pushed from 73519e0 to 4000673
@step('"{name}" key not in DCS after waiting {time_limit:d} seconds') | ||
def check_member_not_present(context, name, time_limit): | ||
sleep(time_limit) | ||
found_value = False | ||
try: | ||
res = json.loads(context.dcs_ctl.query(name)) | ||
if res is not None: | ||
found_value = True | ||
except Exception: | ||
pass | ||
|
||
if found_value: | ||
print("found value under DCS key {}: {}".format(name, res)) | ||
assert False, "found value under DCS key {} after {} seconds".format(name, time_limit) | ||
return True |
The member key will not expire if Patroni is running.
Force-pushed from 87d40bf to a8519b0
I suggest removing everything related to sync replication.
This is not what we test.
Then I receive a response code 200
When I start postgres1 with a configured static primary will not boot after 20 seconds
And I start postgres2 with a configured static primary will not boot after 20 seconds
And "sync" key not in DCS after waiting 20 seconds
And "sync" key not in DCS after waiting 20 seconds |
|
||
Scenario: check removing static primary config from dcs allows replica startup
  Given I issue a PATCH request to http://127.0.0.1:8008/config with {"static_primary": null}
  Then "sync" key in DCS has leader=postgres0 after 20 seconds
Then "sync" key in DCS has leader=postgres0 after 20 seconds |
|
||
|
||
@step('"{name}" key not in DCS after waiting {time_limit:d} seconds') | ||
def check_member_not_present(context, name, time_limit): | ||
sleep(time_limit) | ||
found_value = False | ||
try: | ||
res = json.loads(context.dcs_ctl.query(name)) | ||
if res is not None: | ||
found_value = True | ||
except Exception: | ||
pass | ||
|
||
if found_value: | ||
print("found value under DCS key {}: {}".format(name, res)) | ||
assert False, "found value under DCS key {} after {} seconds".format(name, time_limit) | ||
return True |
@step('"{name}" key not in DCS after waiting {time_limit:d} seconds') | |
def check_member_not_present(context, name, time_limit): | |
sleep(time_limit) | |
found_value = False | |
try: | |
res = json.loads(context.dcs_ctl.query(name)) | |
if res is not None: | |
found_value = True | |
except Exception: | |
pass | |
if found_value: | |
print("found value under DCS key {}: {}".format(name, res)) | |
assert False, "found value under DCS key {} after {} seconds".format(name, time_limit) | |
return True |
|
||
@step('"{name}" is stopped and uninitialized after waiting {time_limit:d} seconds') | ||
def member_is_stopped_and_uninitialized(context, name, time_limit): | ||
sleep(time_limit) |
It is much better to do it in a loop, like here: https://github.com/zalando/patroni/blob/cea1fa869baafc077d17dd5eb1b60350d09d0592/features/steps/cascading_replication.py#L26-L41
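A rough sketch of that polling pattern (the helper and the example condition are hypothetical; the real steps in cascading_replication.py implement their own checks):

```python
from time import sleep, time


def poll_until(check, time_limit, interval=1):
    """Return True as soon as check() succeeds, False once time_limit expires."""
    deadline = time() + time_limit
    while time() < deadline:
        try:
            if check():
                return True
        except Exception:
            pass  # transient errors: keep polling until the deadline
        sleep(interval)
    return False


# e.g. inside a behave step, instead of a fixed sleep(time_limit):
# assert poll_until(lambda: postgres_is_stopped('postgres1'), time_limit)
# where postgres_is_stopped() is a placeholder for the actual check.
```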
Force-pushed from a8519b0 to 15a583d
closing in favour of DCS failsafe mode feature
In essence, this configuration option will ensure that a Patroni cluster running with a static primary will not demote the master unnecessarily. Transient failures to update the leader lock in the DCS will not cause a demotion when running with a static primary. When running as leader under normal circumstances, DCS exceptions will not cause a demotion when running with static_primary=thisNode.
Even if replicas are added to the Patroni cluster, Patroni will be able to protect itself from entering unsafe states by checking the value of static_primary. If the configured static_primary is not the host node, then the replica will refuse to progress to postmaster boot. If the configuration was dynamic, then the replica will shut down when such a state is detected.
Added behavioral tests for the static primary feature.