Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Applying Salt states does not apply them to targets other than dom0 #7742

Closed
gonzalo-bulnes opened this issue Sep 7, 2022 · 12 comments · Fixed by QubesOS/qubes-mgmt-salt#34
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: mgmt diagnosed Technical diagnosis has been performed (see issue comments). P: critical Priority: critical. Between "major" and "blocker" in severity. pr submitted A pull request has been submitted for this issue.

Comments

@gonzalo-bulnes
Copy link

gonzalo-bulnes commented Sep 7, 2022

How to file a helpful issue

Qubes OS release

R4.1.1 Fully up-to-date.
I also know of a system that was last updated early July that did not present this behavior. Once updated, it does. (Props: @nathandyer!)

Brief summary

When applying Salt states, they are not applied to targets other than dom0. Instead a single generic state is applied:

ID: topd-always-passes
Function: test.succeed_without_changes
Name: foo
Result: True
Comment: Success!

Relevant states are applied to dom0 as expected.

Steps to reproduce

  1. (Optional but convenient.) Identify a qube for which you've previously applied Salt state. (e.g. ssh-vault)
  2. Apply the Salt states:
    # dom0
    
    sudo qubesctl --skip-dom0 --targets=ssh-vault state.apply
    # ssh-vault: OK
  3. Read the end of the management VM: (e.g. /var/log/qubes/mgmt-ssh-vault.log)
  4. (Optional) Confirm that state are correctly applied to dom0:
    # dom0
    
    sudo qubesctl state.apply
    # local:
    # ----------
    #
    # [... SKIP ...]
    #
    # Summary for local:
    # ----------
    # Suceeded: [N]
    # Failed: 0
    # ----------
    # Total states run:    [N]
    # Total run time:     [X.XXX] s

Expected behavior

In my example, the expected log output is the following:
(The timestamps are real, I ran this same command a few weeks ago.)

2022-07-06 13:57:51,487 calling 'state.apply'...
2022-07-06 13:58:27,579 output: ssh-vault:
2022-07-06 13:58:27,579 output: ----------
2022-07-06 13:58:27,579 output:           ID: /rw/config/split-ssh/ssh-add.desktop
2022-07-06 13:58:27,579 output:     Function: file.managed
2022-07-06 13:58:27,579 output:       Result: True
2022-07-06 13:58:27,579 output:      Comment: File /rw/config/split-ssh/ssh-add.desktop is in the correct state
2022-07-06 13:58:27,579 output:      Started: 13:58:26.885988
2022-07-06 13:58:27,579 output:     Duration: 72.891 ms
2022-07-06 13:58:27,579 output:      Changes:   
2022-07-06 13:58:27,579 output: ----------
2022-07-06 13:58:27,580 output:           ID: ssh-agent-autostart-present
2022-07-06 13:58:27,580 output:     Function: file.append
2022-07-06 13:58:27,580 output:         Name: /rw/config/rc.local
2022-07-06 13:58:27,580 output:       Result: True
2022-07-06 13:58:27,580 output:      Comment: File /rw/config/rc.local is in correct state
2022-07-06 13:58:27,580 output:      Started: 13:58:26.958992
2022-07-06 13:58:27,580 output:     Duration: 25.604 ms
2022-07-06 13:58:27,580 output:      Changes:   
2022-07-06 13:58:27,580 output: ----------
2022-07-06 13:58:27,580 output:           ID: /rw/config/split-ssh/qubes.SSHAgent
2022-07-06 13:58:27,580 output:     Function: file.managed
2022-07-06 13:58:27,580 output:       Result: True
2022-07-06 13:58:27,580 output:      Comment: File /rw/config/split-ssh/qubes.SSHAgent is in the correct state
2022-07-06 13:58:27,580 output:      Started: 13:58:26.984682
2022-07-06 13:58:27,580 output:     Duration: 2.153 ms
2022-07-06 13:58:27,580 output:      Changes:   
2022-07-06 13:58:27,580 output: ----------
2022-07-06 13:58:27,580 output:           ID: rpc-present
2022-07-06 13:58:27,580 output:     Function: file.append
2022-07-06 13:58:27,580 output:         Name: /rw/config/rc.local
2022-07-06 13:58:27,580 output:       Result: True
2022-07-06 13:58:27,580 output:      Comment: File /rw/config/rc.local is in correct state
2022-07-06 13:58:27,580 output:      Started: 13:58:26.986906
2022-07-06 13:58:27,580 output:     Duration: 3.774 ms
2022-07-06 13:58:27,580 output:      Changes:   
2022-07-06 13:58:27,580 output: 
2022-07-06 13:58:27,580 output: Summary for ssh-vault
2022-07-06 13:58:27,581 output: ------------
2022-07-06 13:58:27,581 output: Succeeded: 4
2022-07-06 13:58:27,581 output: Failed:    0
2022-07-06 13:58:27,581 output: ------------
2022-07-06 13:58:27,581 output: Total states run:     4
2022-07-06 13:58:27,581 output: Total run time: 104.422 ms
2022-07-06 13:58:27,581 output: [WARNING ] top_file_merging_strategy is set to 'merge' and multiple top files were found. Merging order is not deterministic, it may be desirable to either set top_file_merging_strategy to 'same' or use the 'env_order' configuration parameter to specify the merging order.
2022-07-06 13:58:27,581 exit code: 0

Actual behavior

Today, the same command results in the following log:

2022-09-07 09:29:24,199 calling 'state.apply'...
2022-09-07 09:29:48,434 output: ssh-vault:
2022-09-07 09:29:48,435 output: ----------
2022-09-07 09:29:48,435 output:           ID: topd-always-passes
2022-09-07 09:29:48,435 output:     Function: test.succeed_without_changes
2022-09-07 09:29:48,435 output:         Name: foo
2022-09-07 09:29:48,435 output:       Result: True
2022-09-07 09:29:48,435 output:      Comment: Success!
2022-09-07 09:29:48,435 output:      Started: 09:29:47.241195
2022-09-07 09:29:48,435 output:     Duration: 0.613 ms
2022-09-07 09:29:48,435 output:      Changes:   
2022-09-07 09:29:48,435 output: 
2022-09-07 09:29:48,435 output: Summary for ssh-vault
2022-09-07 09:29:48,435 output: ------------
2022-09-07 09:29:48,435 output: Succeeded: 1
2022-09-07 09:29:48,435 output: Failed:    0
2022-09-07 09:29:48,435 output: ------------
2022-09-07 09:29:48,435 output: Total states run:     1
2022-09-07 09:29:48,435 output: Total run time:   0.613 ms
2022-09-07 09:29:48,435 output: [WARNING ] top_file_merging_strategy is set to 'merge' and multiple top files were found. Merging order is not deterministic, it may be desirable to either set top_file_merging_strategy to 'same' or use the 'env_order' configuration parameter to specify the merging order.
2022-09-07 09:29:48,435 exit code: 0
@gonzalo-bulnes gonzalo-bulnes added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug labels Sep 7, 2022
@marmarek
Copy link
Member

marmarek commented Sep 7, 2022

Update triggering this issue: https://bodhi.fedoraproject.org/updates/FEDORA-2022-2060a6228d

@eloquence
Copy link

Given the severity, do you have recommendations on how we should advise our users until this issue is resolved upstream? For example, we could instruct folks to downgrade salt in fedora-36 and exclude it from upgrades for now. Of course, if you have a hotfix strategy that can be applied quickly, that may not be necessary.

@gonzalo-bulnes
Copy link
Author

gonzalo-bulnes commented Sep 7, 2022

Thank you @marmarek for the blazing fast answer! I can confirm that downgrading the Salt package in the default-mgmt-vm's template allows to work around the issue (that's fedora-36 in my case). (Credits: @eloquence for the procedure 🙌)

# fedora-36

sudo dnf downgrade salt
# Then shut down the template and wait for a couple of minutes before applying the Salt state again.
Command output. Summary: the package salt and salt-ssh were downgraded to version 3004.
Last metadata expiration check: 1:56:01 ago on Wed Sep  7 08:31:01 2022.
Dependencies resolved.
================================================================================
 Package           Architecture    Version                Repository       Size
================================================================================
Downgrading:
 salt              noarch          3004.1-1.fc36          fedora          9.3 M
 salt-ssh          noarch          3004.1-1.fc36          fedora           14 k

Transaction Summary
================================================================================
Downgrade  2 Packages

Total download size: 9.3 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): salt-ssh-3004.1-1.fc36.noarch.rpm        3.1 kB/s |  14 kB     00:04    
(2/2): salt-3004.1-1.fc36.noarch.rpm            244 kB/s | 9.3 MB     00:38    
--------------------------------------------------------------------------------
Total                                           229 kB/s | 9.3 MB     00:41     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1 
  Downgrading      : salt-3004.1-1.fc36.noarch                              1/4 
  Downgrading      : salt-ssh-3004.1-1.fc36.noarch                          2/4 
  Cleanup          : salt-ssh-3005-1.fc36.noarch                            3/4 
  Cleanup          : salt-3005-1.fc36.noarch                                4/4 
  Running scriptlet: salt-3005-1.fc36.noarch                                4/4 
  Verifying        : salt-3004.1-1.fc36.noarch                              1/4 
  Verifying        : salt-3005-1.fc36.noarch                                2/4 
  Verifying        : salt-ssh-3004.1-1.fc36.noarch                          3/4 
  Verifying        : salt-ssh-3005-1.fc36.noarch                            4/4 
Notifying dom0 about installed applications

Downgraded:
  salt-3004.1-1.fc36.noarch            salt-ssh-3004.1-1.fc36.noarch           

Complete!

@gonzalo-bulnes
Copy link
Author

The Salt update doesn't seem to be a security update at least:

Changelog (source)

  • Thu Aug 25 2022 Salt Project Packaging [email protected] - 3005-1
    • Update to feature release 3005-1 for Python 3

@marmarek
Copy link
Member

marmarek commented Sep 7, 2022

This seems to be (still) related to saltstack/salt#60003. We have workaround for it, but 3005 includes a (apparently only partial) fix, so the workaround is not applied anymore. I'll adjust the workaround to apply for 3005 too.

Since this bug may prevent installing updates in fedora-36 template, I'll make dom0 apply it, so installing dom0 updates will be enough.

marmarek added a commit to marmarek/qubes-mgmt-salt that referenced this issue Sep 7, 2022
This makes it work even if the bug prevents installing template updates.

QubesOS/qubes-issues#7742
@marmarek
Copy link
Member

marmarek commented Sep 7, 2022

@eloquence @gonzalo-bulnes see linked PR, can you confirm it fixes the issue on your side too? Just applying dom0 change (/usr/lib/python3.8/site-packages/qubessalt/__init__.py) should be enough (that's the point of moving the workaround to dom0).

If that works, I can push it to current-testing repo today, and possibly have accelerated path for moving it to the "current" repo.

@eloquence
Copy link

Thanks Marek for the quick turnaround. As noted on the PR, a couple of us have verified the fix - if it's possible to accelerate the transition to stable, that would be super helpful given the magnitude of the impact.

@andrewdavidwong andrewdavidwong added P: critical Priority: critical. Between "major" and "blocker" in severity. C: mgmt diagnosed Technical diagnosis has been performed (see issue comments). pr submitted A pull request has been submitted for this issue. and removed P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. labels Sep 7, 2022
@andrewdavidwong andrewdavidwong added this to the Release 4.1 updates milestone Sep 7, 2022
@marmarek
Copy link
Member

marmarek commented Sep 7, 2022

I put wrong issue number in the commit message, so notifications are not posted here, but the fix is in the current-testing repository already.

@marmarek
Copy link
Member

marmarek commented Sep 9, 2022

Uploaded to the "current" repo.

@eloquence
Copy link

I don't see it in https://yum.qubes-os.org/r4.1/current/dom0/fc32/rpm/ yet, is that expected or did things get stuck somewhere?

@lubellier
Copy link

I don't see it in

@eloquence : not yet in the current repo but in the current-testing ( https://yum.qubes-os.org/r4.1/current-testing/dom0/fc32/rpm/ ). Also, you can follow the progress of the package (from current-testing to stable) with QubesOS/updates-status#3042 .

@marmarek
Copy link
Member

marmarek commented Sep 9, 2022

Now should be really there.

@andrewdavidwong andrewdavidwong added the affects-4.1 This issue affects Qubes OS 4.1. label Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: mgmt diagnosed Technical diagnosis has been performed (see issue comments). P: critical Priority: critical. Between "major" and "blocker" in severity. pr submitted A pull request has been submitted for this issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants