[BUG] 3007 master 4505 publish port failed to respond to salt commands unless restart #66282
Comments
@KiyoIchikawa, do you still have this issue?
/var/log/salt/master log with trace level enabled: after running the sudo salt salt01 test.ping command, it just times out on itself, the master
/var/log/salt/master log with trace level enabled: after running the sudo salt-call test.ping command, the salt master answers the query.
```
2024-03-28 08:43:33,533 [salt.crypt ][DEBUG ] salt.crypt.get_rsa_key: Loading private key
2024-03-28 08:43:33,533 [salt.crypt ][DEBUG ] salt.crypt.sign_message: Signing message.
2024-03-28 08:43:33,533 [salt.transport.tcp][TRACE ] TCP PubServer sending payload: topic_list=None 'salt/auth\n\n��resultãact�accept�id�salt01�pub�\x01�-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9xxxxx-masked--Z7HIgb+x+dyYNDe\nb3RFpYx7QFxzHmShDutLMyjfAR26npS8fsN1+nHcoXTkHC6kE2OjC0yc/rp5Crr6\nIQIDAQAB\n-----END PUBLIC KEY-----�_stamp�2024-03-28T13:43:33.532394'
2024-03-28 08:43:33,533 [salt.transport.tcp][TRACE ] TCP PubServer finished publishing payload
2024-03-28 08:43:33,533 [salt.master ][TRACE ] Ignore tag salt/auth
2024-03-28 08:43:33,581 [salt.master ][TRACE ] AES payload received with command _pillar
2024-03-28 08:43:33,581 [salt.pillar ][DEBUG ] Determining pillar cache
2024-03-28 08:43:33,600 [salt.loader.lazy ][DEBUG ] The functions from module 'roots' are being loaded by dir() on the loaded module
2024-03-28 08:43:33,600 [salt.utils.lazy ][DEBUG ] LazyLoaded roots.envs
2024-03-28 08:43:33,602 [salt.loader.lazy ][DEBUG ] The functions from module 's3fs' are being loaded by dir() on the loaded module
2024-03-28 08:43:33,603 [salt.utils.lazy ][DEBUG ] Could not LazyLoad roots.init: 'roots.init' is not available.
2024-03-28 08:43:33,623 [salt.loader.lazy ][DEBUG ] The functions from module 'jinja' are being loaded by dir() on the loaded module
2024-03-28 08:43:33,623 [salt.utils.lazy ][DEBUG ] LazyLoaded jinja.render
2024-03-28 08:43:33,624 [salt.loader.lazy ][DEBUG ] The functions from module 'yaml' are being loaded by dir() on the loaded module
2024-03-28 08:43:33,624 [salt.utils.lazy ][DEBUG ] LazyLoaded yaml.render
2024-03-28 08:43:33,631 [salt.master ][TRACE ] Master function call _pillar took 0.05068349838256836 seconds
2024-03-28 08:43:33,632 [salt.transport.tcp][TRACE ] TCP PubServer sending payload: topic_list=None 'minion/refresh/salt01\n\n��Minion data cache refresh�salt01�_stamp�2024-03-28T13:43:33.630948'
2024-03-28 08:43:33,632 [salt.transport.tcp][TRACE ] TCP PubServer finished publishing payload
2024-03-28 08:43:33,632 [salt.master ][TRACE ] Ignore tag minion/refresh/salt01
2024-03-28 08:43:33,785 [salt.master ][TRACE ] AES payload received with command _return
2024-03-28 08:43:33,787 [salt.loaded.int.returner.local_cache][DEBUG ] Adding minions for job 20240328134333786284: ['salt01']
2024-03-28 08:43:33,788 [salt.utils.job ][INFO ] Got return from salt01 for job 20240328134333786284
2024-03-28 08:43:33,789 [salt.transport.tcp][TRACE ] TCP PubServer sending payload: topic_list=None 'salt/job/20240328134333786284/ret/salt01\n\n��cmd�_return�id�salt01�jid�20240328134333786284�returnçretcode\x00�fun�test.ping�fun_args��arg��tgt_type�glob�tgt�salt01�_stamp�2024-03-28T13:43:33.788313'
2024-03-28 08:43:33,789 [salt.loaded.int.returner.local_cache][DEBUG ] Reading minion list from /var/cache/salt/master/jobs/1b/c46d64efe0c85f68fb92d7c9137b70db8237f991f39268f4018c423a3aa009/.minions.p
2024-03-28 08:43:33,789 [salt.transport.tcp][TRACE ] TCP PubServer finished publishing payload
2024-03-28 08:43:33,789 [salt.master ][TRACE ] Ignore tag salt/job/20240328134333786284/ret/salt01
2024-03-28 08:43:33,789 [salt.master ][TRACE ] Master function call _return took 0.003786802291870117 seconds
2024-03-28 08:43:34,247 [salt.utils.process][TRACE ] Process manager iteration
```
We tested on Windows minions and Linux minions. The Windows minions seem to be working better, but they both seem to be getting the following error(s) while applying a Linux Minion (OEL 8)
Windows (Windows Server 2016 Datacenter)
The Linux minion keeps getting timeouts. I have not increased the log level on the master yet.
@KiyoIchikawa, thanks for the update. Your case above doesn't fit the issue I am facing here. When salt-master needs a restart, it can't even issue a command to itself ("sudo salt saltmaster01 test.ping"), and yet "saltmaster01$ sudo salt-call test.ping" works.
Here is more info when salt commands can't be sent out.
Small comment, we're facing the exact same issue since upgrading from 3006.7. Restarting it doesn't fix anything as it cannot bind to its port:
EDIT:
We're already running a 3-node multi-master setup. I'm considering downgrading to 3006.7.
I think I tried downgrading to 3006.7 once and it didn't help with this issue, so I decided to use 3007.0 again. If the downgrade approach is adopted, the minions need to be downgraded as well.
I am following PR #66335. I don't really know the details, but this issue looks like the publish port issue that #66335 is trying to fix.
My 2nd salt-master failed to test.ping itself; restarting salt-master doesn't work anymore.
After downgrading my salt-master servers from 3007 to 3006.8, port 4505 is now stable and able to send out salt commands.
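For anyone who wants to watch the publish port before and after a downgrade, here is a minimal sketch of a TCP reachability check. The host `127.0.0.1` and the helper name `port_accepts_connections` are assumptions for illustration; only port 4505 comes from this issue. Note that a successful connect only shows that something is listening on the port. It does not prove the master is actually publishing jobs.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 4505 is the publish port from this issue; the host is an assumption.
    host, port = "127.0.0.1", 4505
    state = "accepts connections" if port_accepts_connections(host, port) else "is not reachable"
    print(f"{host}:{port} {state}")
```

Running it from cron or a monitoring agent on the master would at least show when the listener disappears, as in the "cannot bind to its port" report above.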
I've done the same, as this error was preventing our app from automatically adding VMs to cope with the load. Also, I didn't want to set a daily restart to "bypass" the issue.
@compi-tom, thanks for the confirmation. Hopefully, when 3007 turns from STS into LTS, this issue can be acknowledged by the core team and resolved.
Any update on this issue? I had two servers that were working as multi-master nodes through multiple 3006 versions. I just upgraded to 3007.1 and I get this problem on one of the servers only.
We also had to downgrade our master to 3006.8 to resolve this issue. No logs are written with log level WARNING. The master just refuses to connect to any minion (even itself). salt-call works without issues.
Hello, today I updated from the latest 3005 to 3007.1 and I get a lot of these messages in the master log file:
2024-08-25 10:10:10,192 [salt.transport.tcp:1102][DEBUG ][81648] Subscriber at connected
2024-08-25 10:10:10,193 [salt.transport.tcp:1082][DEBUG ][81648] tcp stream to closed, unable to recv
2024-08-25 10:10:10,193 [salt.transport.tcp:1102][DEBUG ][81648] Subscriber at connected
2024-08-25 10:10:15,199 [salt.transport.tcp:1082][DEBUG ][81648] tcp stream to closed, unable to recv
2024-08-25 10:10:15,199 [salt.transport.tcp:1102][DEBUG ][81648] Subscriber at connected
2024-08-25 10:10:15,200 [salt.transport.tcp:1082][DEBUG ][81648] tcp stream to closed, unable to recv
2024-08-25 10:10:15,201 [salt.transport.tcp:1102][DEBUG ][81648] Subscriber at connected
It seems 3007.1 is not stable, and we cannot use it because communication with minions does not work (we have about 2500 minions connected to the master). Just for information, the problem still exists. Is there any fix?
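To put numbers on a connect/close loop like the one in the log lines above, the master log can be tallied by logger, level, and message. This is only a sketch: the regular expression is inferred from the log lines quoted in this thread, not taken from Salt's logging configuration, and `count_messages` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption):
# "<timestamp> [<logger>[:<lineno>]][<LEVEL> ][<pid>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<logger>[^\]:\s]+)(?::\d+)?\s*\]"
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"(?:\[\d+\])?\s*(?P<msg>.*)$"
)


def count_messages(lines):
    """Count occurrences of each (logger, level, message) triple."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not look like log records are skipped
            counts[(match["logger"], match["level"], match["msg"])] += 1
    return counts
```

Passing an open file handle for /var/log/salt/master to `count_messages` and printing `most_common(10)` on the result would show whether the "Subscriber ... connected" and "tcp stream ... closed" pairs dominate the log.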
Description
The 3007 master does not respond to salt commands unless the master is restarted, but it does answer salt-call run from a minion.
The timeout leaves no entries in the /var/log/salt/master file with the log file level set to debug.
Currently trying the trace level.
Setup
(Please provide relevant configs and/or SLS files (be sure to remove sensitive info. There is no general set-up of Salt.)
Please be as specific as possible and give set-up details.
Steps to Reproduce the behavior
$sudo salt minion-on-saltmaster test.ping
Expected behavior
salt-master should answer the test.ping command quickly. Instead, it times out.
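To make "quickly" versus "times out" measurable, the reproduction command can be wrapped in a timer with a hard deadline. This is a minimal sketch; `run_with_timeout` is a hypothetical helper, and the 30-second deadline in the usage note is an arbitrary assumption.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout):
    """Run cmd and return (finished, elapsed_seconds, output).

    finished is False if the process was still running at the deadline;
    subprocess.run kills the child in that case.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, time.monotonic() - start, ""
    return True, time.monotonic() - start, proc.stdout + proc.stderr
```

For this report the call would be `run_with_timeout(["sudo", "salt", "minion-on-saltmaster", "test.ping"], 30)`. A finished process does not by itself mean the minion answered, since the salt CLI gives up on its own after its own timeout, so the captured output still needs to be checked.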
Screenshots
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Additional context
Add any other context about the problem here.