-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change corosync and sbd parameters based on the profile environment detected #842
Change corosync and sbd parameters based on the profile environment detected #842
Conversation
ec5d707
to
9a97a4e
Compare
2f678d7
to
4978e21
Compare
4978e21
to
51e6486
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please re-split the commits: one to accomodate Azure parameters, another one to deal with TimeoutStartUSec alone. Their relationship is not strong enough to mix them together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liangxin1300 ,
I have added a pair of comment I think you should check.
Besides them, some concerns:
- I don't see any new UT here, and the change looks enough sensitive to be good to have them
- I don't personally like this option to hardcode some provider specific values. If for some reason they change, you will need to change the code and package a new version... And more, knowing that we already have projects that do the same (like the
habootstrap-formula
). Personally, I don't like this way. In the other hand, if you have decided to go this way, it is fine, I'm just talking as a more outsider point of view
Yeah, I will add UT codes for this, after most of logic archived agreement between us:)
The issue I concerned is, where can we get the data/values, from only one place, that could be shared with all HA stack projects? I remember we had discussions in confluence page before, Dario suggest we make a specific package which contains all profiles data, and then all projects could use that |
Sounds like, some improvement is possible indeed. How about move the profile dictionary and also the matching string (eg. "microsoft-azure", "google-cloud-platform", "amazon-web-services") into crm.conf. Hence, it can be flexible enough for any one of these CSPs to further customize those values by users, if they don't like the default. |
520948f
to
0ccf3a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to tweak the commit header
from
"Fix: sbd: adjust sbd parameters when in Azure(bsc#1175896)"
to
"Fix: sbd: adjust sbd systemd TimeoutStartSec together with SBD_DELAY_START"
It is the genric logic more than just Azure(bsc#1175896).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to tweak the commit header from
"Fix: bootstrap: bootstrap parameters better cope with Azure for corosync(bsc#1175896)"
to something like,
"Fix: bootstrap: corosync and sbd parameters better cope with the environment profile"
Since the code is generic to both CSP and on-premise.
Aha, then, you brought up three folds in my view. First of all, don't use Azure's default sbd parameters for others. Secondly, I think you touch the part I call it "user_customized" in profiles.yaml, in particular, those ones not in CSP. Thirdly, there are several hardcoded defaults values in the code currently. That's fine. They are the last resort if no "profiles.yaml" exisits. However, I feel it worth to bring them up front and write them into "user_customized" explicitly. These will slightly improve the user experience than the black magic only known by reading the code. Well, this is not a high priority improvment, I feel. |
79cd89c
to
6d2ddfa
Compare
Changed as
|
e520b17
to
3bcc011
Compare
c870994
to
bbfefa0
Compare
@zzhou1 @gao-yan
if context.is_s390:
self._sbd_watchdog_timeout = self.SBD_WATCHDOG_TIMEOUT_DEFAULT_S390
self._stonith_watchdog_timeout = self.STONITH_WATCHDOG_TIMEOUT_DEFAULT_S390
else:
self._sbd_watchdog_timeout = self.SBD_WATCHDOG_TIMEOUT_DEFAULT
self._stonith_watchdog_timeout = self.STONITH_WATCHDOG_TIMEOUT_DEFAULT
sbd_msgwait_default = int(self._sbd_watchdog_timeout) * 2
sbd_msgwait = sbd_msgwait_default
if "sbd.msgwait" in self._context.profiles_dict:
sbd_msgwait = self._context.profiles_dict["sbd.msgwait"]
if int(sbd_msgwait) < sbd_msgwait_default:
bootstrap.warn("sbd msgwait is set to {}, it was {}".format(sbd_msgwait_default, sbd_msgwait))
sbd_msgwait = sbd_msgwait_default
def _correct_sbd_watchdog_timeout_for_s390(self):
"""
Corrent watchdog timeout if less than s390 default
"""
if self._context.is_s390 and self._sbd_watchdog_timeout < self.SBD_WATCHDOG_TIMEOUT_DEFAULT_S390:
bootstrap.warn("sbd_watchdog_timeout is set to {} for s390, it was {}".format(self.SBD_WATCHDOG_TIMEOUT_DEFAULT_S390, self._sbd_watchdog_timeout))
self._sbd_watchdog_timeout = self.SBD_WATCHDOG_TIMEOUT_DEFAULT_S390 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress, though still some changes needed.
06b56de
to
81f7ba1
Compare
07c3fc5
to
eddd7a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Now, it covers diskless-sbd and qdevice too !
9318b9c
to
dd438e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much neater. Thanks for the awesome work, @liangxin1300 !
Only a couple of nitpickings here.
self._sbd_watchdog_timeout = self.SBD_WATCHDOG_TIMEOUT_DEFAULT_WITH_QDEVICE | ||
if self._sbd_watchdog_timeout < self.SBD_WATCHDOG_TIMEOUT_DEFAULT_WITH_QDEVICE: | ||
bootstrap.warn("sbd_watchdog_timeout is set to {} for qdevice, it was {}".format(self.SBD_WATCHDOG_TIMEOUT_DEFAULT_WITH_QDEVICE, self._sbd_watchdog_timeout)) | ||
self._sbd_watchdog_timeout = self.SBD_WATCHDOG_TIMEOUT_DEFAULT_WITH_QDEVICE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurred to me, in case that corosync.quorum.device.timeout
could be configured in profiles, of course it should check against that rather than the default. But no urgency. Feel free to do the improvement now or some time in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggest value for this timeout on different platform? If not clear yet, we could add it in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's similar to the logic in above for the case where sbd is being added with existing qdevice. It's just that now qdevice sync_timeout is according to any configured in profile rather than in corosync.conf.
crmsh/sbd.py
Outdated
Check if SBD_DELAY_START is yes | ||
""" | ||
res = SBDManager.get_sbd_value_from_config("SBD_DELAY_START") | ||
return True if res and res == "yes" else False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically sbd determines this based on crm_is_true()
function and whether it's explicitly set with a delay value:
https://github.com/ClusterLabs/sbd/blob/master/src/sbd-inquisitor.c#L989-L999
https://github.com/ClusterLabs/pacemaker/blob/master/lib/common/strings.c#L414-L444
Feel free to decide whether we need to make it cover all the cases of True
here :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as using return utils.is_boolean_true(res)
Thanks!
dd438e1
to
5421f95
Compare
profiles.yaml contains corosync and sbd parameters for different platform
…START * Make sure sbd has suitable msgwait and watchdog timeout when initializing * Enable SBD_DELAY_START for both diskless and disk based cases when virtual environment detected * Adjust start timeout for sbd when set SBD_DELAY_START * Refactors and more useful warnings
…rofile environment detected (bsc#1175896)
5421f95
to
a71081c
Compare
Motivation
In different environment, some corosync and sbd parameters should be adjusted.
It should be nice when bootstrapping if crmsh could detect the environment, then load the different profile data for different platforms.
Changes
/etc/crm/profiles.yml
for different platformsWhen starting pacemaker, show the time should wait for sbd(sbd.watchdog_timeout * 2 * 1.2)