Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

installed 3007 TypeError: object NoneType can't be used in 'await' expression #66177

Open
z900collector opened this issue Mar 7, 2024 · 40 comments
Assignees
Labels
Bug broken, incorrect, or confusing behavior

Comments

@z900collector
Copy link

Upgraded 3006.7 to 3007 on both salt-master and a single salt-minion
Salt minion log is recording the following errors constantly

/var/log/salt/minion
2024-03-07 15:53:49,411 [salt.transport.zeromq:396 ][ERROR ][2127705] Exception while running callback
Traceback (most recent call last):
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
await callback(msg)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression

/var/log/salt/master
2024-03-07 15:54:59,908 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault'
2024-03-07 15:54:59,979 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault'
2024-03-07 15:54:59,980 [salt.loader.lazy :531 ][ERROR ][1084696] Module/package collision: '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/vault.cpython-310.pyc' and '/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/vault'

Initially spotted when doing a manual salt-call
looked in /opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/pycache/ and spotted an old file after upgrade:

-rw-r--r-- 1 root root 9832 Mar 7 14:12 user.cpython-310.pyc
-rw-r--r-- 1 root root 440 Mar 7 14:12 value.cpython-310.pyc
-rw-r--r-- 1 root root 14016 Feb 26 09:08 vault.cpython-310.pyc
-rw-r--r-- 1 root root 15913 Mar 7 14:12 verify.cpython-310.pyc
-rw-r--r-- 1 root root 14925 Mar 7 14:12 versions.cpython-310.pyc

Salt Version:
Salt: 3007.0

Python Version:
Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]

Dependency Versions:
cffi: 1.16.0
cherrypy: 18.8.0
dateutil: 2.8.2
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.3
libgit2: Not Installed
looseversion: 1.3.0
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.7
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 23.1
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.5.2
PyYAML: 6.0.1
PyZMQ: 25.1.2
relenv: 0.15.1
smmap: Not Installed
timelib: 0.3.0
Tornado: 6.3.3
ZMQ: 4.3.4

Salt Package Information:
Package Type: onedir

System Versions:
dist: oracle 8.9
locale: utf-8
machine: x86_64
release: 5.15.0-101.103.2.1.el8uek.x86_64
system: Linux
version: Oracle Linux Server 8.9

@doesitblend
Copy link
Collaborator

Can you try stopping your Salt master and minion, then run find /opt/saltstack/salt -type f -iname "*.pyc" -print -delete?Then start them back up. I believe that will resolve this issue.

@tjyang
Copy link
Contributor

tjyang commented Mar 7, 2024

@doesitblend , thanks for the oneliner script to cleanup the .pyc.
I am still getting this error message from minion startup. Any suggestion ?

salt-minion[1567838]: [ERROR ] Publish server binding pub to /var/run/salt/minion/minion_event_0f4efbad65_pub.ipc ssl=None

@z900collector
Copy link
Author

Tried the one liner and now the minion reports:

2024-03-07 22:32:33,860 [salt.transport.zeromq:396 ][ERROR ][1756164] Exception while running callback
Traceback (most recent call last):
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
await callback(msg)
File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression

The master reports (lots and lots of these):
2024-03-07 22:23:13,633 [salt.transport.tcp:311 ][WARNING ][1699260] TCP Publish Client encountered an exception while connecting to /var/run/salt/master/master_event_pub.ipc: StreamClosedError('Stream is closed'), will reconnect in 1 seconds - File "/bin/salt-run", line 11, in
sys.exit(salt_run())

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/scripts.py", line 456, in salt_run
client.run()

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/run.py", line 34, in run
ret = runner.run(full_return=True)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/runner.py", line 303, in run
ret = self._proc_function(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/mixins.py", line 546, in _proc_function
return instance.low(fun, low, full_return=full_return)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/client/mixins.py", line 383, in low
data["return"] = func(*args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in call
ret = self.loader.run(run_func, *args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1233, in run
return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1248, in _run_as
return _func_or_method(*args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/runners/state.py", line 310, in event
return statemod["state.event"](

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in call
ret = self.loader.run(run_func, *args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1233, in run
return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1248, in _run_as
return _func_or_method(*args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/state.py", line 2576, in event
with salt.utils.event.get_event(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 127, in get_event
return MasterEvent(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 928, in init
super().init(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 265, in init
self.connect_pub()

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 323, in connect_pub
self.subscriber = salt.utils.asynchronous.SyncWrapper(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/asynchronous.py", line 77, in init
self.obj = cls(*args, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 210, in ipc_publish_client
return publish_client(opts, io_loop, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 152, in publish_client
return salt.transport.tcp.PublishClient(

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/tcp.py", line 219, in init
super().init(opts, io_loop, **kwargs)

File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 398, in init
super().init()

@tjyang
Copy link
Contributor

tjyang commented Mar 7, 2024

@z900collector , the one-liner worked for me to remove *.pyc files and I don't have your message above.
I only have following error message from "systemctl status salt-minion" per another ticket #66179

salt-minion[1567838]: [ERROR   ] Publish server binding pub to /var/run/salt/minion/minion_event_0f4efbad65_pub.ipc ssl=None

@doc75
Copy link

doc75 commented Mar 10, 2024

After running the oneliner on both master and minion, it is working again.
Nevertheless I can still see a lot of occurence like this one on minion side:

2024-03-10 19:22:50,833 [salt.transport.zeromq:396 ][ERROR   ][32857] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression

But it does not prevent the minion from returning properly.

@ZsBT
Copy link

ZsBT commented Mar 10, 2024

I'm facing the same issue, after deleting .pyc files the minions get online. However these error lines still flood the minion logs.

@tjyang
Copy link
Contributor

tjyang commented Mar 10, 2024

rocky9t01a test VM upgraded from 3006.6 to 3007.0.
I thought next version 3006.8 will be latest one with this bug fix but latest one is 3007 without this bug fix.

[me@rocky9t01 salt]$ sudo salt -L rocky9t01 cmd.run "/usr/bin/tail -10 /var/log/salt/minion"
rocky9t01:
    2024-04-09 16:40:43,823 [salt.loaded.int.module.cmdmod:1347][ERROR   ][571240] Command 'systemctl' failed with return code: 1
    2024-04-09 16:40:43,823 [salt.loaded.int.module.cmdmod:1352][ERROR   ][571240] output: Job for salt-minion.service canceled.
    2024-04-09 16:40:44,840 [salt.transport.tcp:1405][ERROR   ][571328] Publish server binding pub to /var/run/salt/minion/minion_event_69362de666_pub.ipc ssl=None
    2024-04-09 16:40:49,491 [salt.transport.zeromq:396 ][ERROR   ][571328] Exception while running callback
    Traceback (most recent call last):
      File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
        await callback(msg)
      File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
        await callback(decoded)
    TypeError: object NoneType can't be used in 'await' expression
[me@rocky9t01 salt]$ 
[me@rocky9t01 salt]$ sudo salt-minion -V Salt Version: Salt: 3007.0

Python Version:
Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]

Dependency Versions:
cffi: 1.16.0
cherrypy: 18.8.0
dateutil: 2.8.2
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.3
libgit2: Not Installed
looseversion: 1.3.0
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.7
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 23.1
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.5.2
PyYAML: 6.0.1
PyZMQ: 25.1.2
relenv: 0.15.1
smmap: Not Installed
timelib: 0.3.0
Tornado: 6.3.3
ZMQ: 4.3.4

Salt Package Information:
Package Type: onedir

System Versions:
dist: rocky 9.3 Blue Onyx
locale: utf-8
machine: x86_64
release: 5.14.0-362.18.1.el9_3.0.1.x86_64
system: Linux
version: Rocky Linux 9.3 Blue Onyx

[me@rocky9t01 salt]$

@SwimGeek
Copy link

find /opt/saltstack/salt -type f -iname "*.pyc" -print -delete

This worked, but only for two days, errors returned on three minions so far.

@aloosnetmatch
Copy link

Hi all.

I have the same issues with saltminion 3007.0-0

2024-03-12 10:27:41,901 [salt.transport.tcp:1410][ERROR ][16652] Publish server binding pub to 127.0.0.1:4510 ssl=None

Both on Linux as on windows

@SwimGeek
Copy link

Hi

I see there are comments that this is a 'log flood' issue. I think it's a bit more than that - my minions disconnect from the master.

@tjyang
Copy link
Contributor

tjyang commented Mar 19, 2024

This issue has been opened for two weeks and it is still not gaining enough attention to be classified as bug.

@SwimGeek
Copy link

SwimGeek commented Mar 19, 2024

Hi, my minions seem to become slow to respond and commands seem to have timeouts.

When I delete the .pyc files all is fast and happy. A few days later things go slow and time out again.

Log when minion does not respond, doing basic test.ping:

2024-03-19 14:58:45,810 [salt.transport.zeromq:396 ][ERROR   ][817] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
2024-03-19 14:58:55,939 [salt.minion      :2763][ERROR   ][817] Timeout encountered while sending {'cmd': '_return', 'id': 'cpt-ter-blahblah1.atomic.ac', 'success': True, 'return': True, 'retcode': 0, 'jid': '20240319125840713889', 'fun': 'test.ping', 'fun_args': [], 'user': 'root', '_stamp': '2024-03-19T12:58:40.915616', 'nonce': '39f602421d8c4c9bb844bca46a7a8f83'} request

@z900collector
Copy link
Author

z900collector commented Mar 19, 2024 via email

@SwimGeek
Copy link

Update, if I use --async things work ok... so something is causing minions to take very long time to respond.... but they all complete their jobs in the end.

@tjyang
Copy link
Contributor

tjyang commented Mar 22, 2024

In my case , on both 3006.7 and 3007 masters, I need to restart salt-master, just to get master process works.

[me@salt01 ~]$ sudo salt -N testminions test.ping
[ERROR   ] Request client send timedout
Salt request timed out. The master is not responding. You may need to run your command with `--async` in order to bypass the congested event bus. With `--async`, the CLI tool will print the job id (jid) and exit immediately without listening for responses. You can then use `salt-run jobs.lookup_jid` to look up the results of the job in the job cache later.
[me@salt01 ~]$

@themowski
Copy link

We are also seeing TypeError: object NoneType can't be used in 'await' expression in our minion logs after an upgrade from 3006.7 to 3007 on AlmaLinux 8.

@merlinz01
Copy link
Contributor

Am also seeing this error with salt-minion 3007.0 on Debian 12.

@tjyang
Copy link
Contributor

tjyang commented Mar 22, 2024

Anyone has issue of salt-mater not responding to simple test.ping?
my current workaround is restart salt-master and wait for say 60secs and * test.ping will work again(until it doesn't)

@tjyang
Copy link
Contributor

tjyang commented Mar 23, 2024

@dwoz , Thanks for taking on this issue. Do you need more reports to track down the root cause ?

@merlinz01
Copy link
Contributor

Seems to me this is basically what's happening, see lines 473-487 of client.py:

>>> # This function seems to be a decorator for registering callbacks for receiving data
>>> def on_recv(callback):
...     async def wrap_callback(messages):
...         # decode messages
...         await callback('decoded')
...     return wrap_callback
... 
>>> # The decorator is being used on a non-async function that returns None (instead of something awaitable)
>>> @on_recv
... def non_async_func(decoded):
...     print('Non-async function called, returning None')
...     return None
... 
>>> # Data is received, and the callback is called
>>> import asyncio
>>> asyncio.run(non_async_func('messages'))
Non-async function called, returning None
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    asyncio.run(non_async_func('messages'))
  File "C:\Program Files\Python312\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
  File "C:\Program Files\Python312\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "C:\Program Files\Python312\Lib\asyncio\base_events.py", line 684, in run_until_complete
    return future.result()
  File "<pyshell#18>", line 4, in wrap_callback
    await callback('decoded')
TypeError: object NoneType can't be used in 'await' expression

@merlinz01
Copy link
Contributor

Doing some debugging, I don't see on_recv being used as a decorator, but a salt.minion.Minion().pub_channel, which is a salt.channel.client.AsyncPubChannel, has a on_recv() method that is being called with the Minion's _handle_payload method. The _handle_payload method is not async and always returns None. The wrap_callback function generated in AsyncPubChannel.on_recv() tries to await the result of Minion._handle_payload, which is None.

Would it be as simple as defining salt.minion.Minion._handle_payload as async or wrapping it in an async function?

@KiyoIchikawa
Copy link

I'm not sure about others on this issue, but this issue seems to be a huge blocker. At first, we would get [No Response] and the minion log entries when running salt 'server01' state.apply on the master and just run it again to get it to work. Now, just about every call from the master times out and we see the minion log entries as described above.

Doesn't seem like just a log entry but actually seems to be broken.

@tjyang
Copy link
Contributor

tjyang commented Mar 26, 2024

@KiyoIchikawa , I have to restart salt-master to get the master time out to respond again. Running salt commands again won't fix salt-master issue.
Can you change to log level to debug ? I am doing this and hoping to get the error message in /var/log/salt/master log next time salt-master time out.

@KiyoIchikawa
Copy link

We just restarted our masters (we use the new load balancing feature) and things seem working again. We'll try changing our log level to catch any errors.

@merlinz01
Copy link
Contributor

See my PR #66277 that does away with the errors (so far) for me.

@firdous22
Copy link

Seeing similar issue.
I upgraded both my master and minion to 3007 and even cleared all the .pyc files as mentioned previously.

@tjyang
Copy link
Contributor

tjyang commented Mar 27, 2024

See my PR #66277 that does away with the errors (so far) for me.

Thanks for this PR, I will try it out once my salt masters stable (without time out , not responding to salt commands issues).

@tjyang
Copy link
Contributor

tjyang commented Mar 27, 2024

We just restarted our masters (we use the new load balancing feature) and things seem working again. We'll try changing our log level to catch any errors.

Once I enable log level to debug, in /etc/salt/master, I saw my localfs path has some link files have no destinations. I removed all those bad links.
so far , my salt-master is up for 20h straight without the need to be rebooted due to not responding.

  • sudo systemctl status salt-master |grep Active
 Active: active (running) since Tue 2024-03-26 09:21:42 CDT; 20h ago

@sgbell
Copy link

sgbell commented Apr 2, 2024

HI Guys, just throwing my hat into this, I found that on one of my minions, what was causing it to die anytime I tried to start it up, was a config file in my /etc/salt/minion.d folder. once I removed it, the minion was stable.
The culprit for me was /etc/salt/minion.d/_scheduled.conf with the contents:
schedule: __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2, minutes: 60, return_job: false, run_on_start: true}

@dwoz dwoz added the Bug broken, incorrect, or confusing behavior label Apr 9, 2024
@dwoz
Copy link
Contributor

dwoz commented Apr 9, 2024

I am concerned this issue may not be the root cause of minion's going offline. Can someone please test the changes from #66335 that has been experiencing minion outages? @tjyang @z900collector

@tjyang
Copy link
Contributor

tjyang commented Apr 9, 2024

@dwoz ,

AFAIK from my testing, "... await expression" in salt minion log doesn't impact salt-minion connection issue with salt-master.

I opened another issue to track my salt-master issue that not able to send out commands via 4505 publish port.
when port 4505 malfunction after restart for a while , salt-minion still can pull down formulas.
Pls see #66282 for details.

@colttt
Copy link

colttt commented Apr 15, 2024

any news about that?
a simple state.apply take ages (~5-10minutes), also the load grows to 1.0

this simple run tooks more than 10minutes

salt 'sympa*' state.apply
sympa.my.company.de:
  Name: /etc/bash.bashrc - Function: file.append - Result: Changed - Started: 14:16:39.928980 - Duration: 744540.32 ms
  Name: salt-minion - Function: service.running - Result: Changed - Started: 14:29:17.972168 - Duration: 326.765 ms
  Name: create_host_to_zabbix - Function: zabbix_host.present - Result: Changed - Started: 14:29:18.319065 - Duration: 2267.021 ms
  Name: /etc/apt/keyrings/GPG-KEY-elasticsearch - Function: file.managed - Result: Changed - Started: 14:29:31.059698 - Duration: 44.924 ms
  Name: filebeat - Function: service.running - Result: Changed - Started: 14:29:35.289241 - Duration: 144.684 ms

Summary for sympa.my.company.de
-------------
Succeeded: 76 (changed=5)
Failed:     0
-------------
Total states run:     76
Total run time:  776.464 s

the debug output repeats always:

[DEBUG   ] Using selector: EpollSelector
[DEBUG   ] The functions from module 'jinja' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] The functions from module 'yaml' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded yaml.render
[INFO    ] Starting a new job 20240415122238903939 with PID 126506
[DEBUG   ] The functions from module 'direct_call' are being loaded by dir() on the loaded module
[DEBUG   ] The functions from module 'splay' are being loaded by dir() on the loaded module
[DEBUG   ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG   ] The functions from module 'saltutil' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded saltutil.find_job
[INFO    ] Returning information for job: 20240415122238903939
[DEBUG   ] Using selector: EpollSelector
[DEBUG   ] Using selector: EpollSelector
[DEBUG   ] Publisher connecting to /var/run/salt/minion/minion_event_de194ab817_pull.ipc
[DEBUG   ] Minion of 'salt' is handling event tag '__master_req_channel_payload/salt'
[DEBUG   ] Minion return retry timer set to 6 seconds (randomized)
[DEBUG   ] Closing _TCPPubServerPublisher instance
[DEBUG   ] minion return: {'success': True, 'return': {'pid': 126060, 'fun': 'state.apply', 'arg': [], 'tgt': 'sympa*', 'jid': '20240415121633003410', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_user'}, 'retcode': 0, 'jid': '20240415122238903939', 'fun': 'saltutil.find_job', 'fun_args': ['20240415121633003410'], 'user': 'sudo_user'}
[DEBUG   ] Subprocess ProcessPayload(jid=20240415122238903939) cleaned up
[DEBUG   ] PubChannel received: {'enc': 'aes', 'load': {'fun': 'saltutil.find_job', 'arg': ['20240415121633003410'], 'tgt': ['sympa.my.company.de'], 'jid': '20240415122248933074', 'ret': '', 'tgt_type': 'list', 'user': 'sudo_user'}, 'sig': b'\xab_\xee\xc7;Y\x0c:\xb8\xe4\xa7c.\xf6\xb9ic\x8e\x0ev\xfev\x14y4,\xfe\xd1\x07PBHLV\x069\x82\xacZ\xb3cv\x02t\xf9\xde\xf3\x81G[)m7\xbb\x1f\xd67\x11\xc1\xbe\xc7\xd7\xf4\xf0\x03\xc5\x1ea\xdf\xc0\xba\xd4\x9c(\x18\xcbsM\x9b?\xd0\x7foJ\xe5H\xbapVw\x16\xeb\xd4\xce8\xa2\x1b"\x1e\x04\xa3|Kjn]\xea\xcb5\'\x12\x8a\x88\xc0/\x8c\xd7H\x82\x0c\xe1\x1b_.$\xc3\xbf\xc6i\xad\xd7\xfa\x17\x11\x1f{vV\xf4\x96\x9d\xdd\x87\x84A\x1db\xbdZ\xb6\x84\xc7\xfb\xe0\x0fJ\x18r\x83b\x01\xd4\'\xcf)h\x1cK\xc3\x1c\xfb#>\x1a\xb4[d\xb5#\x83s\x12\xc6\x81\xe9\x95\xe4\xd5@l\x8a\x156} \x8d\xfb\xfe\x85f\xd9\n\x88$b\x82\x8d:\x02\x1f!d\x8f$\xa0\x14\x88\xe6KB\x12\x82\xe6\x1a\xc6\x83\x88\xcc\xf3\x83f\x03\x0b\x0er\x94q\xa9!\x99\x12\xab@+\x9c\xae\xf9N^4\x81\r\xb2\xa8wI'} <bound method Minion._handle_payload of <salt.minion.Minion object at 0x7f0e68b06cb0>>
[INFO    ] User sudo_user Executing command saltutil.find_job with jid 20240415122248933074
[DEBUG   ] Command details {'fun': 'saltutil.find_job', 'arg': ['20240415121633003410'], 'tgt': ['sympa.my.company.de'], 'jid': '20240415122248933074', 'ret': '', 'tgt_type': 'list', 'user': 'sudo_user'}
[DEBUG   ] Subprocess ProcessPayload(jid=20240415122248933074) added
[ERROR   ] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression

@tuxthepenguin84
Copy link

Please tag this as a high priority bug.

@rimskij
Copy link

rimskij commented Apr 22, 2024

same problem

[salt.transport.zeromq:396 ][ERROR   ][13970] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
 [salt.transport.zeromq:396 ][ERROR   ][13970] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression

@StefanTT
Copy link

StefanTT commented Apr 29, 2024

Even a simple test.ping is affected. Doing the ping from the master to an idle server on the same lan usually takes 0.6 seconds but every other ping takes 6 seconds or even hangs.

mika!2006# time salt 'cal*' test.ping
calvados:
    True

real	0m5.786s
user	0m0.526s
sys	0m0.071s
mika!2006# time salt 'cal*' test.ping
calvados:
    True

real	0m0.639s
user	0m0.500s
sys	0m0.049s
mika!2006# time salt 'cal*' test.ping
calvados:
    True

real	0m5.697s
user	0m0.529s
sys	0m0.066s
mika!2006# time salt 'cal*' test.ping
calvados:
    True

real	0m0.650s
user	0m0.500s
sys	0m0.061s

When the minion becomes unresponsive then stopping salt-minion on it takes very long and does not make a clean stop:

calvados!2039# systemctl stop salt-minion
calvados!2041# pgrep -a salt
76678 /usr/bin/python /usr/bin/salt-minion
79575 /usr/bin/python /usr/bin/salt-minion

So there is obviously something hanging.

calvados!2044# journalctl -fu salt-minion
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: State 'stop-sigterm' timed out. Killing.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Killing process 76675 (salt-minion) with signal SIGKILL.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Main process exited, code=killed, status=9/KILL
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Failed with result 'timeout'.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Unit process 76678 (salt-minion) remains running after unit stopped.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Unit process 79575 (salt-minion) remains running after unit stopped.
Apr 29 08:15:32 calvados systemd[1]: Stopped The Salt Minion.
Apr 29 08:15:32 calvados systemd[1]: salt-minion.service: Consumed 6.111s CPU time, 112.6M memory peak, 0B memory swap peak.
Apr 29 08:17:43 calvados salt-minion[76678]: Minion received a SIGTERM. Exiting.
Apr 29 08:17:43 calvados salt-minion[76678]: The Salt Minion is shutdown. Minion received a SIGTERM. Exited.

@tjyang
Copy link
Contributor

tjyang commented Apr 29, 2024

@StefanTT

I just want to provide my following test.ping case.

  • I remember, one time, same 15 minions test.ping was close to 1.0 seconds when salt-master first started or when it was initially healthy. Even test.ping the minion on salt master itself only is 5+ secs.
-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 15
# of minions returned: 15
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------

real    0m5.792s
user    0m1.072s
sys     0m0.154s
[me@salt01 ~]$

  • This is on a fresh test server and minion lab, same subnet.
[me@rocky8t01 ~]$ time sudo salt -L rocky8t01,rocky8t02  test.ping
rocky8t01:
    True
rocky8t02:
    True

real    0m1.035s
user    0m0.013s
sys     0m0.012s
[me@rocky8t01 ~]$

@Dr-Bone
Copy link

Dr-Bone commented May 10, 2024

Sorry to be pushy but is there any eta on this defect?
Having a "high" latency connection + this defect renders salt unusable.
Had to downgrade to 3006 because of this, but on arch, python was raised to 3.12 two weeks ago and 3006 has issues with 3.12 so I'm kinda stuck in a one legged limbo right now.
Would really appreciate it if this is tackled rather soon.
That said, I'm happy with a patch or workaround as well in the meantime.

@merlinz01
Copy link
Contributor

I believe this is fixed with #66335 being merged.

@SwimGeek
Copy link

Things improved in 3007.1, but about 10% of minions still randomly have timeout issues.

@tjyang
Copy link
Contributor

tjyang commented Jun 12, 2024

Once I found out one can bring down 3007.x STS server by following command. I downgraded salt-master to LTS version which is much more stable andn less bug

for i in {1..30}; do openssl s_client -connect $1:4506 -tls1_2 </dev/null; sleep
 .2; done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.