Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] 3004rc1: Service states incorrectly report they will restart in test mode #60995

Closed
OrangeDog opened this issue Oct 4, 2021 · 14 comments
Closed
Assignees
Labels
Bug broken, incorrect, or confusing behavior needs-triage Regression The issue is a bug that breaks functionality known to work in previous releases. severity-low 4th level, cosemtic problems, work around exists Silicon v3004.0 Release code name

Comments

@OrangeDog
Copy link
Contributor

Description
Every service.running state, all of which have some sort of watch requisite, reports that it will restart when running in test mode. During an actual apply they correctly do not.

Setup
Both minions have the same highstate (except for hostname/network/etc. parameters).
A selection of the states involved:

salt-minion:
  pkg.installed: []
  service.running:
    - enable: true
    - failhard: true
    - watch:
      - host: salt
      - file: /etc/salt/minion.d/*
      - file: /etc/systemd/system/salt-minion.service.d/*
    - require:
      - pkg: salt-dependencies
systemd-timesyncd:
  service.running:
    - watch:
      - file: /etc/systemd/timesyncd.conf
nginx:
  pkg.installed:
    - pkgs:
      - nginx-full
  service.running:
    - enable: true
    - reload: true
    - require:
      - pkg: nginx
      - sls: csrf.pki.bootstrap

{% for path in salt['cp.list_master'](prefix=tpldir ~ '/conf.d', saltenv=saltenv) %}
{% set filename = salt['file.basename'](path).split('.nginx')[0] ~ '.conf' %}
/etc/nginx/conf.d/{{ filename }}:
  file.managed:
    - source: salt://{{ path }}
    - template: jinja
    - context:
        ciphers: {{ ciphers }}
    - watch_in:
      - service: nginx
{% endfor %}

Steps to Reproduce the behavior

# salt -L 'minion1,minion2' state.test --state-output=changes
minion1:
----------
          ID: salt-minion
    Function: service.running
      Result: None
     Comment: The service salt-minion is set to restart
     Started: 13:20:00.738543
    Duration: 51.177 ms
     Changes:
----------
          ID: ufw-service-running-service-running
    Function: service.running
        Name: ufw
      Result: None
     Comment: The service ufw is set to restart
     Started: 13:20:05.749197
    Duration: 44.425 ms
     Changes:
----------
          ID: openssh
    Function: service.running
        Name: ssh
      Result: None
     Comment: The service ssh is set to restart
     Started: 13:20:06.549077
    Duration: 45.56 ms
     Changes:
----------
          ID: systemd-timesyncd
    Function: service.running
      Result: None
     Comment: The service systemd-timesyncd is set to restart
     Started: 13:20:08.801265
    Duration: 60.522 ms
     Changes:
----------
          ID: zabbix-agent
    Function: service.running
      Result: None
     Comment: The service zabbix-agent is set to restart
     Started: 13:20:09.215579
    Duration: 49.565 ms
     Changes:
----------
          ID: nginx
    Function: service.running
      Result: None
     Comment: The service nginx is set to restart
     Started: 13:20:13.416394
    Duration: 52.345 ms
     Changes:


Summary for minion1
--------------
Succeeded: 188 (unchanged=6)
Failed:      0
--------------
Total states run:     188
Total run time:    13.964 s
minion2:

Summary for minion2
--------------
Succeeded: 188
Failed:      0
--------------
Total states run:     188
Total run time:    14.694 s

Expected behavior
Both minions should report no changes.

Versions Report

minion1 (3004rc1)
Salt Version:
          Salt: 3004rc1

Dependency Versions:
          cffi: 1.14.5
      cherrypy: Not Installed
      dateutil: 2.7.3
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.10.1
       libgit2: 1.1.0
      M2Crypto: 0.31.0
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: 3.6.1
        pygit2: 1.6.1
        Python: 3.8.10 (default, Jun  2 2021, 10:49:15)
  python-gnupg: 0.4.5
        PyYAML: 5.3.1
         PyZMQ: 18.1.1
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.2

System Versions:
          dist: ubuntu 20.04 focal
        locale: utf-8
       machine: x86_64
       release: 5.4.0-88-generic
        system: Linux
       version: Ubuntu 20.04 focal
minion2 (3003.3)
Salt Version:
          Salt: 3003.3

Dependency Versions:
          cffi: 1.14.5
      cherrypy: Not Installed
      dateutil: 2.7.3
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.10.1
       libgit2: 1.1.0
      M2Crypto: 0.31.0
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: 3.6.1
        pygit2: 1.6.1
        Python: 3.8.10 (default, Jun  2 2021, 10:49:15)
  python-gnupg: 0.4.5
        PyYAML: 5.3.1
         PyZMQ: 18.1.1
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.2

System Versions:
          dist: ubuntu 20.04 focal
        locale: utf-8
       machine: x86_64
       release: 5.4.0-88-generic
        system: Linux
       version: Ubuntu 20.04 focal
@OrangeDog OrangeDog added Bug broken, incorrect, or confusing behavior severity-low 4th level, cosemtic problems, work around exists Regression The issue is a bug that breaks functionality known to work in previous releases. labels Oct 4, 2021
@OrangeDog OrangeDog added the Silicon v3004.0 Release code name label Oct 4, 2021
@garethgreenaway
Copy link
Contributor

@OrangeDog Taking a quick look at this and it doesn't appear that it's related to having the watch requisite. The message is still displayed when it's not included.

@OrangeDog
Copy link
Contributor Author

#60150 looks like a likely candidate @dmurphy18

@dkacar-oradian
Copy link

I'm seeing the same thing with all services on Alma Linux 8 (didn't check with others yet). I have this, for example:

mktemplate_restart_nm:
  service.running:
    - name: NetworkManager.service
    - require:
      - mktemplate_resolver_options
    - watch:
      - mktemplate_resolver_options
    - reload_grains: True

The prerequisite didn't have changes.

@OrangeDog
Copy link
Contributor Author

This problem was reported previously: #60284 (comment)

IMO it's a more wrong now than it was before.

@dmurphy18
Copy link
Contributor

So this problem is being reported in multiple issues but at the time the fix for issue #60120 and PR #60150 were correct for that particular issue.
But it has subsequently high-lighted that there are places where test=True does not report accurately report what will actually happen if executed.

To correctly address some of these situations, really need a method to query systemd for if some specific service B will restart if I restart this service A. As far as I know, there is nothing that lets you quickly walk down dependency chains in systemd, but I am all ears to be corrected. Goggled away and didn't find anything, perhaps someone on this issue might know of a way. Given this is a test operation, it is lower down the pecking order and operational bugs take precedence.

Hence, looking for any ideas from the community as to a best way to address this issue, since blindly assuming dependencies will be restarted too is not appropriate, and I would prefer not to add weasel words to messages.

@garethgreenaway
Copy link
Contributor

garethgreenaway commented Oct 6, 2021

To add some additional color to this issue, running function in the service state module does not handle restarting services, it's simply concerned with enabling and ensuring that the service is running. The mod_watch in the service state handles the situation when a service is watching a file, package, etc. state for changes and those changes occur, that is the function that is handling restarts for services. In the various issues that David mentioned above, the root of the issues in most of them in addition test=True being set, the state in question wasn't returning any changes when test=True. Going forward it would make sense to rollback the changes in PR #60150 and ensuring any states report some indication of what changes will happen when test is set to True.

@dkacar-oradian
Copy link

I'm not 100% sure about what you want to achieve with "need a method to query systemd for if some specific service B will restart if I restart this service A". I assume you want to print all services that would be restarted, but I'm not sure.

For reload the situation is simpler. systemctl show will show PropagatesReloadTo property if such a thing is configured and then you'll have all service names to be reloaded if you follow this recursively. As far as I know there is no other way for reload in one service to trigger reloading another, but I could be wrong. Such a thing can be configured in the unit file by something like:

ReloadPropagatedFrom=oradian-postgres.service

This is from a real example where I also have:

PartOf=oradian-postgres.service

The above means that starts, stops and restarts would also be propagated (Except for bugs. For example, systemd on CentOS 7 won't propagate restart correctly to the grandchild services, but it would propagate start and stop. All of them are propagated correctly for the child services. I don't know if you want to write code which would be able to handle that :-)
In any case you'd have ConsistsOf property with the list of dependencies. And again, follow recursively and you'll get the entire set.

For a simplistic approach I'd implement the proper handling (writing out what would happen) for the named service only plus the things listed above because that's fairly easy. I suppose you'd have to use DBus interface instead of systemctl, but that should be doable. Just note that there are properties CanStart, CanStop and CanReload and if they are false the actual attempt to perform an action will result in an error. It would be nice to get a notification about that from state.test.

I don't know of a simple way to handle dependencies introduced by Wants, Requires and friends. I'd write a feature request for systemd developers.

@OrangeDog
Copy link
Contributor Author

OrangeDog commented Oct 7, 2021

I don't see why any of this is necessary. Salt shouldn't be reporting what systemd is going to to, it should just report what Salt is going to do. The logic is completely contained within the state: whether it's going to do nothing, start a not-running service, or restart a running service.

To correctly address the situation, all you have to do is have the state code, but don't actually do the final action if test=true. Optionally, people seem to like changing the comment to be in the subjuctive.

The initial bug was due to other states not returning changes correctly. It has nothing to do with dependencies inside systemd. Salt shouldn't care about that. The change should probably just be reverted.

@dkacar-oradian
Copy link

dkacar-oradian commented Oct 7, 2021

One other thing. You might be tempted to add a warning about a possibility that other services might be restarted to the return dict from state.test. That might introduce problems for the existing usage. I have a testing program that I'm using to test RCs and new GAs before upgrading and by default it treats all warnings as errors. It has a feature to ignore specific warnings, but that works by specifying warning text verbatim for a particular state in its configuration file.

So if you introduce a warning text that becomes a sort of an API because there's nothing else to differentiate warnings which should be ignored. The test program is using state.apply only, but I might add state.test runs in the future, just to catch problems like this.

It would be better if warning was a dict with some code string that would never change and a message text that could change. This is not a feature request, I'm just letting you know about the existing usage.

@dkacar-oradian
Copy link

I don't see why any of this is necessary.

I don't either, but if somebody wants to deliver more features than strictly necessary I won't object.

@OrangeDog
Copy link
Contributor Author

It's not a feature, it's just wrong. Test mode should only report what the state is going to do, not what some side-effects of some other command might be.

@dkacar-oradian
Copy link

You have a point there.

@Ch3LL Ch3LL added this to the Silicon v3004.0 milestone Oct 7, 2021
@Oloremo
Copy link
Contributor

Oloremo commented Oct 8, 2021

I don't see why any of this is necessary. Salt shouldn't be reporting what systemd is going to to, it should just report what Salt is going to do.

👍

@garethgreenaway
Copy link
Contributor

Can be closed by #61019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior needs-triage Regression The issue is a bug that breaks functionality known to work in previous releases. severity-low 4th level, cosemtic problems, work around exists Silicon v3004.0 Release code name
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants