
Simple targeting results in Salt request timed out since upgrade to 2019.2.1 #55331

Closed
hmalinov opened this issue Nov 15, 2019 · 2 comments

hmalinov commented Nov 15, 2019

Description of Issue

After an upgrade from 2017.7.2 to 2019.2.1 we noticed high CPU load when executing simple modules such as salt '*' test.ping or grains.items, which often results in errors like:

  Salt request timed out. The master is not responding. You may need to run your command
  with `--async` in order to bypass the congested event bus. With `--async`, the CLI
  tool will print the job id (jid) and exit immediately without listening for responses.
  You can then use `salt-run jobs.lookup_jid` to look up the results of the job in
  the job cache later.
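
As the error message suggests, the job can be fired asynchronously and the results fetched from the job cache afterwards. A minimal sketch of that workaround (the jid is whatever the `--async` run prints; the one below is only an example taken from the minion log further down):

```
# Fire the job without waiting on the event bus; the CLI prints the jid and exits
salt --async '*' test.ping

# Later, look up the cached returns for that job (substitute the jid printed above)
salt-run jobs.lookup_jid 20191115151501857731
```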

On the minion side there is a corresponding warning:

[salt.minion      :1990][WARNING ][7596] The minion failed to return the job information for job 20191115151501857731. This is often due to the master being shut down or overloaded. If the master is running, consider increasing the worker_threads value.
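
The warning points at `worker_threads`. Purely as an illustration (the value below is an assumption, not something recommended in this thread), raising it in the master config and restarting salt-master would look like this:

```
# /etc/salt/master -- illustrative value only; tune against real load
# More MWorker processes available to accept returns from the ~450 minions
worker_threads: 12
```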

Setup

We have a single master host with around 450 minions, all RHEL 7 and CentOS 7.
The master has 8 CPUs and 8 GB of memory.

Salt master configuration:

fileserver_backend:
  - git
gitfs_remotes:
  - git@gitlab_server:gcs/saltstack.git:
    - pubkey: /root/.ssh/id_rsa.pub
    - privkey: /root/.ssh/id_rsa
    - root: data/salt
    - saltenv:
      - TEST:
        - ref: TEST
      - UAT:
        - ref: UAT
      - STABLE:
        - ref: STABLE
  - git@gitlab_server:gcs/saltstack.git:
    - name: formulas
    - pubkey: /root/.ssh/id_rsa.pub
    - privkey: /root/.ssh/id_rsa
    - root: data/formula
    - saltenv:
      - TEST:
        - ref: TEST
      - UAT:
        - ref: UAT
      - STABLE:
        - ref: STABLE
gitfs_env_whitelist:
  - STABLE
  - UAT
  - TEST
ext_pillar:
  - git:
    - STABLE git@gitlab_server:gcs/saltstack.git:
      - pubkey: /root/.ssh/id_rsa.pub
      - privkey: /root/.ssh/id_rsa
      - env: STABLE
    - UAT git@gitlab_server:gcs/saltstack.git:
      - pubkey: /root/.ssh/id_rsa.pub
      - privkey: /root/.ssh/id_rsa
      - env: UAT
    - TEST git@gitlab_server:gcs/saltstack.git:
      - pubkey: /root/.ssh/id_rsa.pub
      - privkey: /root/.ssh/id_rsa
      - env: TEST
pillar_source_merging_strategy: recurse
git_pillar_provider: pygit2
git_pillar_root: 'data/pillar'
log_file: /var/log/salt/master
log_level_logfile: debug
log_granular_levels:
  'salt': 'info'
  'salt.utils.job': 'debug'
  'salt.master': 'debug'
return: syslog
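
For a gitfs/git_pillar setup like the one above, the remotes and environments can be sanity-checked from the master with the standard runners (a diagnostic sketch, not part of the original report):

```
# List the saltenvs the git fileserver backend currently exposes
salt-run fileserver.envs backend=git

# Force a fetch of all gitfs remotes
salt-run fileserver.update

# Have the minions recompile pillar after the git remotes change
salt '*' saltutil.refresh_pillar
```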

Steps to Reproduce Issue

When I run the commands below, the salt processes go up to 100% CPU usage:

salt '*' grains.items    (takes around 40-50 s)
salt '*' test.ping       (takes around 40-50 s)
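
A simple way to observe this is to time the call and, in a second shell on the master, check how much CPU the salt processes are using while it runs (a diagnostic sketch, not from the original report):

```
# Time a broadcast ping against all minions
time salt '*' test.ping

# In another shell on the master, while the ping is in flight:
ps aux | grep '[s]alt-master'
```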

Versions Report

salt --versions-report

           Salt: 2019.2.1
 
Dependency Versions:
           cffi: 1.6.0
       cherrypy: unknown
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: 0.26.3
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: 0.26.4
         Python: 2.7.5 (default, Aug  7 2019, 00:51:29)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4
 
System Versions:
           dist: centos 7.7.1908 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.el7.x86_64
         system: Linux
        version: CentOS Linux 7.7.1908 Core
bitskri3g commented:

You're probably running into this: #54941

Also, 2019.2.1 was pulled due to instability. You should probably move to 2019.2.2 (the fix was included before it was released), or backport that specific change on your own if you can't upgrade for some reason.
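
Assuming the packages come from the official SaltStack RHEL/CentOS 7 repository (an assumption, the thread does not say where they were installed from), the move to 2019.2.2 is a plain package update plus a service restart, roughly:

```
# On the master (repository assumed to already be configured)
yum clean expire-cache
yum update salt-master
systemctl restart salt-master

# On each minion
yum update salt-minion
systemctl restart salt-minion
```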

hmalinov commented Dec 5, 2019

Thank you @bitskri3g , the upgrade to 2019.2.2 indeed solved the problem.

hmalinov closed this as completed Dec 5, 2019