Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] salt-ssh support for extra_filerefs in Saltfile is broken #60003

Closed
edgan opened this issue Apr 11, 2021 · 25 comments · Fixed by #61014 or #63184
Closed

[BUG] salt-ssh support for extra_filerefs in Saltfile is broken #60003

edgan opened this issue Apr 11, 2021 · 25 comments · Fixed by #61014 or #63184
Assignees
Labels
Bug broken, incorrect, or confusing behavior P2 Priority 2 Regression The issue is a bug that breaks functionality known to work in previous releases. Salt-SSH severity-critical top severity, seen by most users, serious issues Sulfur v3006.0 release code name and version

Comments

@edgan
Copy link
Contributor

edgan commented Apr 11, 2021

Description
I use lots of map.jinjas with salt-ssh, as part of many formulas in my states git repo. Since the files are only imported I need to use extra_filerefs to include them all. I do this in my Saltfile. As of 3003 this doesn't work anymore. If I downgrade to 3002.6, it works.

I thought it might relate to this issue below, but reverting the patch didn't seem to fix it for me. Though I do question the conversion of the type of extra_filerefs from a list to a str. The only reasonable use of extra_filerefs is as a list. I have over 70 map.jinjas, and I need them all.

https://github.com/saltstack/salt/pull/59664/files

Setup
./Saltfile:

salt-ssh:
  config_dir: .
  max_procs: 1
  ssh_wipe: True
  extra_filerefs:
    - salt://_grains/map.jinja

salt-formula/top.sls:

{% from '_grains/map.jinja' import grain with context %}

base:
  '*':
{% if grain.fqdn == 'foo.bar.com' %}
    - base
{% endif %}

Steps to Reproduce the behavior

salt-ssh -i foo.bar.com state.highstate
[ERROR   ] Rendering exception occurred
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3.9/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3.9/site-packages/salt/utils/jinja.py", line 198, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: _grains/map.jinja

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/salt/utils/templates.py", line 261, in render_tmpl
    output = render_str(tmplstr, context, tmplpath)
  File "/usr/lib/python3.9/site-packages/salt/utils/templates.py", line 542, in render_jinja_tmpl
    raise SaltRenderError(
salt.exceptions.SaltRenderError: Jinja error: _grains/map.jinja
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3.9/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3.9/site-packages/salt/utils/jinja.py", line 198, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: _grains/map.jinja

; line 1

---
{% from '_grains/map.jinja' import grain with context %}    <======================

base:
  '*':
{% if grain.fqdn == 'foo.bar.com' %}
    - base    
[...]
---
[ERROR   ] Unable to render top file: Jinja error: _grains/map.jinja
foo.bar.com:

Summary for foo.bar.com
-----------
Succeeded: 0
Failed:   0
-----------
Total states run:    0
Total run time:  0.000 ms

Expected behavior
It to include the map.jinja in the tgz, and not throw an error about not finding it.

Versions Report

salt-ssh --versions-report
Salt Version:
          Salt: 3003

Dependency Versions:
          cffi: 1.14.1
      cherrypy: Not Installed
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: 1.1.3
       msgpack: 1.0.0
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: 2.6.1
  pycryptodome: 3.10.1
        pygit2: Not Installed
        Python: 3.9.2 (default, Feb 20 2021, 00:00:00)
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 22.0.3
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: fedora 33
        locale: utf-8
       machine: x86_64
       release: 5.11.10-200.fc33.x86_64
        system: Linux
       version: Fedora 33
@edgan edgan added Bug broken, incorrect, or confusing behavior needs-triage labels Apr 11, 2021
@OrangeDog OrangeDog added Salt-SSH Regression The issue is a bug that breaks functionality known to work in previous releases. labels Apr 11, 2021
@sagetherage sagetherage assigned dmurphy18 and unassigned dmurphy18 Apr 12, 2021
@sagetherage sagetherage added this to the Approved milestone Apr 12, 2021
@sagetherage sagetherage added the severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around label Apr 12, 2021
@dmurphy18
Copy link
Contributor

@edgan Can you provide more information to reproduce this is since I have been able to replicate the issue using Salt 3002.2 with Centos 7 for master and minion. I may have a misconfiguration causing similar response.

; line 1

---
{% from '_grains/map.jinja' import grain with context %}    <======================

base:
  '*':
{% if grain.fqdn == 'foo' %}
    - base   
[...]
---
[CRITICAL] Pillar render error: Rendering SLS 'base' failed. Please see master log for details.
foo:
    ----------
root@Unknown:/srv/salt# salt-run --versions-report
Salt Version:
          Salt: 3002.2
 
Dependency Versions:
          cffi: 1.14.4
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.8.1
       libgit2: Not Installed
      M2Crypto: 0.33.0
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: Not Installed
        pygit2: Not Installed
        Python: 3.6.8 (default, Apr  2 2020, 13:34:55)
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 17.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.1.4
 
System Versions:
          dist: centos 7 Core
        locale: UTF-8
       machine: x86_64
       release: 3.10.0-1127.18.2.el7.x86_64
        system: Linux
       version: CentOS Linux 7 Core

With top.sls

root@Unknown:/srv/salt# cat /srv/pillar/top.sls 

base:
    '*':
     - base

and /srv/pillar/base.sls

root@Unknown:/srv/salt# cat /srv/pillar/base.sls 
{% from '_grains/map.jinja' import grain with context %}

base:
  '*':
{% if grain.fqdn == 'foo' %}
    - base   
{% endif %}

with roster

root@Unknown:/srv/salt# cat /srv/salt/roster
foo:
  host: 192.168.0.124
  user: david
  passwd: dogkat2015
#  sudo: True

with _grains/map.jinja

root@Unknown:/srv/salt# cat /srv/salt/_grains/map.jinja 

# set version to build
{% set build_version = '3000_2' %}


{% set grain.fqdn = 'foo' %}

root@Unknown:/srv/salt#

and Saltfile

root@Unknown:/srv/salt# cat ~/.salt/Saltfile 
salt-ssh:
  config_dir: .
  ssh_log_file: /var/log/salt/salt-ssh.log
  max_procs: 1
  ssh_wipe: True
  extra_filerefs:
    - salt://_grains/map.jinja

@edgan
Copy link
Contributor Author

edgan commented Apr 16, 2021

I have attached the bare minimum version of my setup. This fails in the same way with 3003 and works with 3002.6.

Note I do have to run rm -rf /var/tmp/.user_ab4926_salt/ /var/tmp/salt-ssh/* when I switch versions, because it will cache the old version.

No pillars
One map.jinja
Super basic base formula that installs curl
Minimal files

salt-formulas.tar.gz

@sagetherage sagetherage modified the milestones: Approved, Silicon Apr 20, 2021
@sagetherage sagetherage added Silicon v3004.0 Release code name severity-critical top severity, seen by most users, serious issues and removed severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around labels Apr 20, 2021
@EliAndrewC
Copy link

FWIW, we've run into the same issue and have had to stay on salt 3002 as a result. I've tried to find some other workarounds but for now we'll just avoid upgrading to 3003 until there's a fix or someone figures out a good workaround.

@marmarek
Copy link
Contributor

marmarek commented May 7, 2021

I believe I have very similar issue. My minimal test case:

# tree -s /srv
/srv
└── [       4096]  salt
    ├── [          0]  empty.sls
    ├── [          0]  map.jinja
    └── [         54]  top.sls
# cat top.sls
{% import "map.jinja" as f %}

base:
 '*':
   - empty

# cat /etc/salt/master.d/extra.conf 
extra_filerefs:
  - salt://map.jinja
# salt-ssh localhost state.show_top 
[ERROR   ] Rendering exception occurred
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3.9/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3.9/site-packages/salt/utils/jinja.py", line 198, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: map.jinja

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/salt/utils/templates.py", line 261, in render_tmpl
    output = render_str(tmplstr, context, tmplpath)
  File "/usr/lib/python3.9/site-packages/salt/utils/templates.py", line 542, in render_jinja_tmpl
    raise SaltRenderError(
salt.exceptions.SaltRenderError: Jinja error: map.jinja
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/salt/utils/templates.py", line 497, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3.9/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3.9/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1, in top-level template code
  File "/usr/lib/python3.9/site-packages/salt/utils/jinja.py", line 198, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: map.jinja

; line 1

---
{% import "map.jinja" as f %}    <======================

base:
 '*':
   - empty

[...]
---
[ERROR   ] Unable to render top file: Jinja error: map.jinja
localhost:
    ----------

After debugging it a bit, I've found it is about cached FileClient instance. I've added the following debug/workaround patch:

--- /usr/lib/python3.9/site-packages/salt/utils/jinja.py.orig	2021-05-07 12:08:45.897000000 +0200
+++ /usr/lib/python3.9/site-packages/salt/utils/jinja.py	2021-05-07 12:08:53.543000000 +0200
@@ -99,6 +99,11 @@ class SaltCacheLoader(BaseLoader):
         if self._file_client is None:
             attr = "_cached_pillar_client" if self.pillar_rend else "_cached_client"
             cached_client = getattr(self, attr, None)
+            if cached_client is not None and cached_client.opts['cachedir'] != self.opts['cachedir']:
+                log.error('Cached FileClient instance has mismatching cachedir: %s != %s',
+                        cached_client.opts['cachedir'], self.opts['cachedir'])
+                cached_client = None
+
             if cached_client is None:
                 cached_client = salt.fileclient.get_file_client(
                     self.opts, self.pillar_rend

And it works, and shows what's the issue at the same time:

# salt-ssh localhost state.show_top 
[ERROR   ] Cached FileClient instance has mismatching cachedir: /var/cache/salt/master != /var/tmp/.root_dd8a91_salt/running_data/var/cache/salt/minion
localhost:
    ----------
    base:
        - empty

That said, I have no idea why this happens (especially - why cachedir changes at some point, still within master context) and what is the proper fix. Any help would be appreciated.

salt-ssh --versions-report
bash-5.0# salt-ssh --versions-report
Salt Version:
          Salt: 3003
 
Dependency Versions:
          cffi: 1.14.1
      cherrypy: Not Installed
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.0
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: 3.10.1
        pygit2: Not Installed
        Python: 3.9.4 (default, Apr  6 2021, 00:00:00)
  python-gnupg: 0.4.6
        PyYAML: 5.4.1
         PyZMQ: 19.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: fedora 33 
        locale: utf-8
       machine: x86_64
       release: 5.4.98-1.fc25.qubes.x86_64
        system: Linux
       version: Fedora 33 

@sagetherage sagetherage added the P2 Priority 2 label May 7, 2021
marmarek added a commit to marmarek/qubes-mgmt-salt that referenced this issue May 9, 2021
Unbreak salt using dirty hack, until upstream applies the proper fix.
This is executed in a DispVM, so the change is not persistent - when
upstream fix will be there, it's enough to simply drop this line.

Workaround for saltstack/salt#60003

Fixes QubesOS/qubes-issues#6580
marmarek added a commit to QubesOS/qubes-mgmt-salt that referenced this issue May 11, 2021
Unbreak salt using dirty hack, until upstream applies the proper fix.
This is executed in a DispVM, so the change is not persistent - when
upstream fix will be there, it's enough to simply drop this line.

Workaround for saltstack/salt#60003

Fixes QubesOS/qubes-issues#6580

(cherry picked from commit 5d3b029)
@waynegemmell
Copy link
Contributor

waynegemmell commented Jul 9, 2021

I think I've bumped into this today. Applying the fix above broke imports. Is it going to get fixed soon?

@max-arnold
Copy link
Contributor

max-arnold commented Jul 10, 2021

I added the following snippet at the beginning of the salt/client/ssh/wrapper/state.py:_merge_extra_filerefs() function:

    print('ARGS', args)
    import traceback
    traceback.print_stack()

There are two observations:

  1. Both kwargs.get("extra_filerefs", "") and __opts__.get("extra_filerefs", "") are empty (ARGS ('', '') is printed)
  2. The stracktraces between 3002.6 and 3003.1 differ in the following way (some line numbers are masked to avoid displaying unrelated changes):
--- /tmp/trace3002-6.txt	2021-07-10 11:48:11.000000000 +0700
+++ /tmp/trace3003-1.txt	2021-07-10 11:48:14.000000000 +0700
@@ -1,4 +1,4 @@
-ARGS ('', 'salt://filerefs/map.jinja')
+ARGS ('', '')
   File "/Users/user/.virtualenvs/salt3003/bin/salt-ssh", line 7, in <module>
     exec(compile(f.read(), __file__, 'exec'))
   File "/Users/user/repos/saltstack-repo/scripts/salt-ssh", line 9, in <module>
@@ -33,6 +33,14 @@
     stdout, retcode = self.run_wfunc()
   File "/Users/user/repos/saltstack-repo/salt/client/ssh/__init__.py", line XXXX, in run_wfunc
     result = self.wfuncs[self.fun](*self.args, **self.kwargs)
+  File "/Users/user/repos/saltstack-repo/salt/loader.py", line 1241, in __call__
+    return self.loader.run(run_func, *args, **kwargs)
+  File "/Users/user/repos/saltstack-repo/salt/loader.py", line 2274, in run
+    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
+  File "/Users/user/.virtualenvs/salt3003/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
+    return callable(*args, **kwargs)
+  File "/Users/user/repos/saltstack-repo/salt/loader.py", line 2289, in _run_as
+    return _func_or_method(*args, **kwargs)
   File "/Users/user/repos/saltstack-repo/salt/client/ssh/wrapper/state.py", line XXX, in apply_
     return sls(mods, **kwargs)
   File "/Users/user/repos/saltstack-repo/salt/client/ssh/wrapper/state.py", line XXX, in sls

I also bisected the Git history down to the following two pull requests:

The following quick and dirty patch for 3003.1 fixes just the --extra-filerefs option (or the extra_filerefs Saltfile setting):

diff --git a/salt/client/ssh/__init__.py b/salt/client/ssh/__init__.py
index b6bf987117..952a943a92 100644
--- a/salt/client/ssh/__init__.py
+++ b/salt/client/ssh/__init__.py
@@ -1196,6 +1196,9 @@ def run_wfunc(self):
                 opts["grains"][grain] = self.target["grains"][grain]

         opts["pillar"] = data.get("pillar")
+        extra_filerefs = opts.get("__master_opts__", {}).get("extra_filerefs")
+        if extra_filerefs:
+            opts["extra_filerefs"] = extra_filerefs
         wrapper = salt.client.ssh.wrapper.FunctionWrapper(
             opts,
             self.id,

This is definitely not the right fix, but I hope it can unblock your workflows. A proper one requires thorough understanging of Loader Contexts and how options are passed between those. The only thing I can do is ping @dwoz to help with that.

I also have another patch to avoid specifying an --extra-filerefs option for each map.jinja file you have in your state tree: #31531 (comment)

@max-arnold
Copy link
Contributor

max-arnold commented Jul 10, 2021

And I strongly suggest making a proper fix not just in Silicon, but also backporting it to Aluminium (3003.2). This is a significant regression and it doesn't make sense to leave 3003.x series broken while it is still supported.

@alxwr
Copy link
Contributor

alxwr commented Jul 28, 2021

FWI: This is the patch I use since v3002.1:

diff --git a/salt/client/ssh/wrapper/state.py b/salt/client/ssh/wrapper/state.py
index 1a13af5eda..db8d2ab0c5 100644
--- a/salt/client/ssh/wrapper/state.py
+++ b/salt/client/ssh/wrapper/state.py
@@ -5,6 +5,7 @@ Create ssh executor system
 import logging
 import os
 import time
+from copy import deepcopy
 
 import salt.client.ssh.shell
 import salt.client.ssh.state
@@ -662,15 +663,20 @@ def highstate(test=None, **kwargs):
     st_kwargs = __salt__.kwargs
     __opts__["grains"] = __grains__.value()
     opts = salt.utils.state.get_sls_opts(__opts__, **kwargs)
+    highstate_opts = deepcopy(opts)
+    highstate_opts["extra_filerefs"] = opts["__master_opts__"].get("extra_filerefs", "")
+    highstate_opts["cachedir"] = opts["__master_opts__"].get("cachedir", "")
     st_ = salt.client.ssh.state.SSHHighState(
-        opts, __pillar__.value(), __salt__.value(), __context__["fileclient"]
+        highstate_opts, __pillar__.value(), __salt__.value(), __context__["fileclient"]
     )
     st_.push_active()
     chunks = st_.compile_low_chunks()
     file_refs = salt.client.ssh.state.lowstate_file_refs(
         chunks,
         _merge_extra_filerefs(
-            kwargs.get("extra_filerefs", ""), opts.get("extra_filerefs", "")
+            kwargs.get("extra_filerefs", ""),
+            opts.get("extra_filerefs", ""),
+            opts.get("__master_opts__", {}).get("extra_filerefs", "")
         ),
     )
     # Check for errors
-- 
2.25.1

Had to modify it a bit for v3003.1. Tested it. Works.
I don't know whether this is the right way to do it.

[edit]I don't use Saltfile, but experienced the same problems.

If someone verifies that this patch fixes the issues, I'd be happy to open a PR.[/edit]

@marmarek
Copy link
Contributor

marmarek commented Sep 7, 2022

Salt 3005 adds check if file_roots is correct in the cached file client, but that isn't enough. At least cachedir needs to be checked too (for some reason, salt-ssh will construct fileclient with cachedir set to /var/tmp/...._salt already). I believe creating fileclient with such cachedir on the master side is a bug, but I don't know salt internals enough to debug it.
FWIW #61895 does not fix top.sls rendering for me (includes there fail with TemplateNotFound error).

@alxwr
Copy link
Contributor

alxwr commented Oct 31, 2022

FYI: I updated my patch to work with 3004

[…]

#61895 makes my patch obsolete. Thanks a lot! @kaszuba

[Edit:] Seems i ran into a different error. 3005 makes my patch obsolete.

@Ch3LL Ch3LL assigned Ch3LL and unassigned dwoz Nov 8, 2022
marmarek added a commit to marmarek/salt that referenced this issue Nov 17, 2022
salt-ssh constructs file client with cachedir set to /var/tmp/...
already (as it would be on the minion side). Make sure it isn't used
when looking for files on the master (with cachedir /var/cache/salt or
similar), as it wouldn't find them.

related to saltstack#60003
marmarek added a commit to marmarek/salt that referenced this issue Nov 17, 2022
This makes it detect cases where remote/local paths gets confused (like
in saltstack#60003 or saltstack#62043). Without using a container, remote paths works
locally too, so tests do not detect an issue.

Rename test_ssh_setup.py to test_ssh_setup_and_state.py to match its new
content.
pytest.mark.skip_on_windows is dropped as test_ssh_setup.py already has
necessary mark (dockerd present).
@Ch3LL
Copy link
Contributor

Ch3LL commented Dec 2, 2022

Here is the fix if anyone wants to test: #63184

@jpmckinney
Copy link
Contributor

I think maybe this already works now? I just tried on 3005.1.

@Ch3LL
Copy link
Contributor

Ch3LL commented Dec 5, 2022

Its definitely broken in the example the original submitter provided. It does work in certain scenarios though, so I'm guessing your use case fits that.

@jpmckinney
Copy link
Contributor

Ah, yeah, I'm not doing import/include in top, but in other files, so I guess it works there now, at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior P2 Priority 2 Regression The issue is a bug that breaks functionality known to work in previous releases. Salt-SSH severity-critical top severity, seen by most users, serious issues Sulfur v3006.0 release code name and version
Projects
None yet