From a104c247b5afb88213fd75ebf3ff3cb5dad8f61e Mon Sep 17 00:00:00 2001 From: Teddy Andrieux Date: Wed, 13 Jan 2021 18:14:57 +0100 Subject: [PATCH 1/6] salt: Set salt-master timeout to 10 seconds By default salt master timeout is set to 5 seconds, and time to time it's not sufficient, as pillar compute may take some time and also it happens that some time a salt-minion take a bit of time to answer a job listing when executing a salt states (cherry picked from commit 2caba9efe69b2350491253c31557c1b3bc73f98a) --- salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 | 2 ++ 1 file changed, 2 insertions(+) diff --git a/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 b/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 index 7198e988aa..c9b62b810e 100644 --- a/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 +++ b/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 @@ -2,6 +2,8 @@ interface: {{ salt_ip }} log_level: {{ 'debug' if debug else 'info' }} +timeout: 10 + peer: .*: - x509.sign_remote_certificate From a5a6b522cef3f4aa950be6bd0072dbec70ff00e8 Mon Sep 17 00:00:00 2001 From: Teddy Andrieux Date: Tue, 9 Feb 2021 18:34:57 +0100 Subject: [PATCH 2/6] salt: Increase salt-master default timeout to 20 Time to time, especially on really slow platform, we got failure because salt state execution timeout. Increase salt-master default timeout to 20 (cherry picked from commit 73835bef88a38db53330cb825c0d69e543386603) (cherry picked from commit d6c226ff1698ff04ade483cc08bc31f7482e1c64) --- salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 b/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 index c9b62b810e..fff8efda7a 100644 --- a/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 +++ b/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 @@ -2,7 +2,7 @@ interface: {{ salt_ip }} log_level: {{ 'debug' if debug else 'info' }} -timeout: 10 +timeout: 20 peer: .*: From 7a8d005cf4382d4ae503f50c69265c4ccc3ca2b9 Mon Sep 17 00:00:00 2001 From: Teddy Andrieux Date: Mon, 15 Feb 2021 09:11:42 +0100 Subject: [PATCH 3/6] salt: Increase Salt master `sock_pool_size` and `worker_threads` Time to time salt-master get overloaded because he receive to much query, for example during upgrade and one environment a bit slow some salt states may timeout and make the upgrade fail. To avoid that kind of issue just bump the `sock_pool_size` on salt master (from 1 to 15) to avoid blocking waiting for zeromq communications and also bump the `worker_threads` on salt master (from 5 to 10) to avoid some failure if you have too many communication with the salt master (e.g.: because of upgrade + storage operator) Sees: https://github.com/saltstack/salt/issues/53147 (cherry picked from commit 21d679afabf2211574ecfdb3ee035f1c1040ddca) --- salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 | 2 ++ 1 file changed, 2 insertions(+) diff --git a/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 b/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 index fff8efda7a..1d08a91c5e 100644 --- a/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 +++ b/salt/metalk8s/salt/master/files/master-99-metalk8s.conf.j2 @@ -3,6 +3,8 @@ interface: {{ salt_ip }} log_level: {{ 'debug' if debug else 'info' }} timeout: 20 +sock_pool_size: 15 +worker_threads: 10 peer: .*: From db42501623026299e16bdb6e0598b1d165251a09 Mon Sep 17 00:00:00 2001 From: Teddy Andrieux Date: Wed, 7 Apr 2021 11:16:55 +0200 Subject: [PATCH 4/6] salt: Remove all logic to ensure salt-minion ready from Salt state As suggested by Salt to restart Salt-minion process we should just `service.restart` in Background and then wait for the Salt-minion to be ready, in our case as part of the orchestrate Sees: https://docs.saltproject.io/en/latest/faq.html#restart-using-states Fixes: #3247 --- CHANGELOG.md | 5 +++++ buildchain/buildchain/salt_tree.py | 2 +- salt/metalk8s/orchestrate/deploy_node.sls | 8 +++++++- salt/metalk8s/salt/minion/configured.sls | 2 +- salt/metalk8s/salt/minion/init.sls | 5 ++--- salt/metalk8s/salt/minion/installed.sls | 10 +++++++-- salt/metalk8s/salt/minion/restart.sls | 4 ++++ salt/metalk8s/salt/minion/running.sls | 25 ----------------------- 8 files changed, 28 insertions(+), 33 deletions(-) create mode 100644 salt/metalk8s/salt/minion/restart.sls delete mode 100644 salt/metalk8s/salt/minion/running.sls diff --git a/CHANGELOG.md b/CHANGELOG.md index b49e749c2f..62d39d29fc 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,11 @@ - Add bootstrap and solutions configuration files (PR [#3222](https://github.com/scality/metalk8s/pull/3222)) +### Bug fixes +- [#3247](https://github.com/scality/metalk8s/issues/3247) - Fix a bug where + Salt minion process may fail to restart during upgrade or downgrade process + (PR [#3281](https://github.com/scality/metalk8s/pull/3281)) + ## Release 2.5.2 ### Enhancements diff --git a/buildchain/buildchain/salt_tree.py b/buildchain/buildchain/salt_tree.py index 2882a27079..e961f8e529 100644 --- a/buildchain/buildchain/salt_tree.py +++ b/buildchain/buildchain/salt_tree.py @@ -561,7 +561,7 @@ def _get_parts(self) -> Iterator[str]: Path('salt/metalk8s/salt/minion/init.sls'), Path('salt/metalk8s/salt/minion/installed.sls'), Path('salt/metalk8s/salt/minion/local.sls'), - Path('salt/metalk8s/salt/minion/running.sls'), + Path('salt/metalk8s/salt/minion/restart.sls'), Path('salt/metalk8s/solutions/available.sls'), Path('salt/metalk8s/solutions/init.sls'), diff --git a/salt/metalk8s/orchestrate/deploy_node.sls b/salt/metalk8s/orchestrate/deploy_node.sls index 9fae46a319..0d7ed7cbae 100644 --- a/salt/metalk8s/orchestrate/deploy_node.sls +++ b/salt/metalk8s/orchestrate/deploy_node.sls @@ -109,11 +109,17 @@ Reconfigure salt-minion: - salt: Check pillar before salt-minion configuration Wait minion available: + test.configurable_test_state: + - changes: False + - result: __slot__:salt:test.sleep(10) + - comment: Wait a bit for 'salt-minion' to restart before checking status + - onchanges: + - salt: Reconfigure salt-minion salt.runner: - name: metalk8s_saltutil.wait_minions - tgt: {{ node_name }} - require: - - salt: Reconfigure salt-minion + - test: Wait minion available - require_in: - http: Wait for API server to be available before highstate diff --git a/salt/metalk8s/salt/minion/configured.sls b/salt/metalk8s/salt/minion/configured.sls index ccb8962e30..5ca981f9a1 100644 --- a/salt/metalk8s/salt/minion/configured.sls +++ b/salt/metalk8s/salt/minion/configured.sls @@ -2,7 +2,7 @@ include: - .installed - - .running + - .restart Configure salt minion: file.managed: diff --git a/salt/metalk8s/salt/minion/init.sls b/salt/metalk8s/salt/minion/init.sls index 6f55027c7f..004bd68995 100644 --- a/salt/metalk8s/salt/minion/init.sls +++ b/salt/metalk8s/salt/minion/init.sls @@ -8,9 +8,8 @@ # Available states # ================ # -# * running -> Ensure salt minion running +# * restart -> Restart salt Minion if required `watch_in` set # include: - - .installed - - .running + - .configured diff --git a/salt/metalk8s/salt/minion/installed.sls b/salt/metalk8s/salt/minion/installed.sls index ea5e84a892..edda78ea86 100644 --- a/salt/metalk8s/salt/minion/installed.sls +++ b/salt/metalk8s/salt/minion/installed.sls @@ -2,7 +2,7 @@ include : - metalk8s.repo - - .running + - .restart Install salt-minion: {{ pkg_installed('salt-minion') }} @@ -11,5 +11,11 @@ Install salt-minion: - order: last - require: - test: Repositories configured - - require_in: + - watch_in: - cmd: Restart salt-minion + +Enable Salt minion: + service.enabled: + - name: salt-minion + - require: + - metalk8s_package_manager: Install salt-minion diff --git a/salt/metalk8s/salt/minion/restart.sls b/salt/metalk8s/salt/minion/restart.sls new file mode 100644 index 0000000000..fd185d3336 --- /dev/null +++ b/salt/metalk8s/salt/minion/restart.sls @@ -0,0 +1,4 @@ +Restart salt-minion: + cmd.wait: + - name: 'salt-call --local service.restart salt-minion > /dev/null' + - bg: True diff --git a/salt/metalk8s/salt/minion/running.sls b/salt/metalk8s/salt/minion/running.sls deleted file mode 100644 index fe9c86ce5c..0000000000 --- a/salt/metalk8s/salt/minion/running.sls +++ /dev/null @@ -1,25 +0,0 @@ -Restart salt-minion: - cmd.wait: - - name: 'salt-call --local service.restart salt-minion > /dev/null' - - bg: true - -Wait until salt-minion restarted: - test.configurable_test_state: - - changes: False - - result: __slot__:salt:test.sleep(10) - - comment: Wait a bit for 'salt-minion' to restart - - onchanges: - - cmd: Restart salt-minion - -Ensure salt-minion running: - service.running: - - name: salt-minion - - enable: True - - require: - - test: Wait until salt-minion restarted - test.configurable_test_state: - - changes: False - - result: __slot__:salt:test.ping() - - comment: Ran 'test.ping' - - require: - - service: Ensure salt-minion running From 408cd529c2715abcfed9725039961953e0459f8c Mon Sep 17 00:00:00 2001 From: Teddy Andrieux Date: Thu, 8 Apr 2021 08:49:35 +0200 Subject: [PATCH 5/6] salt: Rewrite `init.sls` for Salt minion Add some explanation about all state in `init.sls` of Salt minion states --- salt/metalk8s/salt/minion/init.sls | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/salt/metalk8s/salt/minion/init.sls b/salt/metalk8s/salt/minion/init.sls index 004bd68995..2817c25099 100644 --- a/salt/metalk8s/salt/minion/init.sls +++ b/salt/metalk8s/salt/minion/init.sls @@ -8,7 +8,11 @@ # Available states # ================ # -# * restart -> Restart salt Minion if required `watch_in` set +# * local -> Configure salt minion so that it can run in local mode +# (need to have access on the ISO, so only run it on the bootstrap) +# * installed -> Install/Upgrade and enable salt minion +# * configured -> Configure salt minion process +# * restart -> Restart salt Minion if required `watch_in` set # include: From 14321d2e36081e09f5384992ec14eb8a7411fb7c Mon Sep 17 00:00:00 2001 From: Teddy Andrieux Date: Fri, 9 Apr 2021 11:49:19 +0200 Subject: [PATCH 6/6] salt: "Reconfigure salt-minion" even on first node install Before this commit we were only calling the "Reconfigure salt-minion" state if we are not on first node deployment, but state may do some changes that require a Salt minion process to restart. For example the salt package get hold only when you call the salt minion state when salt already installed --- salt/metalk8s/orchestrate/deploy_node.sls | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/salt/metalk8s/orchestrate/deploy_node.sls b/salt/metalk8s/orchestrate/deploy_node.sls index 0d7ed7cbae..5b91248210 100644 --- a/salt/metalk8s/orchestrate/deploy_node.sls +++ b/salt/metalk8s/orchestrate/deploy_node.sls @@ -23,7 +23,7 @@ Accept key: - require: - salt: Deploy salt-minion on a new node -Wait minion available: +Wait minion available ssh: salt.runner: - name: metalk8s_saltutil.wait_minions - tgt: {{ node_name }} @@ -73,8 +73,6 @@ Sync module on the node: - kwarg: saltenv: {{ saltenv }} -{%- if node_name in salt.saltutil.runner('manage.up') %} - Check pillar before salt-minion configuration: salt.function: - name: metalk8s.check_pillar_keys @@ -123,8 +121,6 @@ Wait minion available: - require_in: - http: Wait for API server to be available before highstate -{%- endif %} - {%- if 'etcd' in roles and 'etcd' not in skip_roles %} Check pillar before etcd deployment: