
All of the returners (excluding mysql and local_cache) are not handling the "req" job id correctly #51809

Open
arizvisa opened this issue Feb 25, 2019 · 20 comments
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@arizvisa
Contributor

Description of Issue/Question

Hiya. I'm the author of PR #51363 (etcd returner), and I noticed that the mysql returner specially handles the case when the job id is set to "req". This job id seems to be submitted by a minion when it is activated independent of a command from the master. (Relevant code: https://github.com/saltstack/salt/blob/develop/salt/returners/mysql.py#L294)

The way this magic job id is handled by the returner is that, when it is seen, it gets replaced with a new job id generated from returner.prep_jid. However, it appears that only the mysql and local_cache returners handle this job id explicitly; everything else does not tamper with it at all. Which returner is handling it correctly?

Is this something that a returner is actually not supposed to receive, i.e. the result of a bug or misconfiguration? Most of the logic that touches this magic job id immediately replaces it with a call to prep_jid; however, there are a few cases where this logic is missed, which results in a returner receiving this 'req' job...
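
For reference, the pattern those two returners follow looks roughly like this (an illustrative sketch only, not the actual mysql code; the in-memory store and the jid format are stand-ins for whatever a real backend does):

# Illustrative sketch only, not the actual mysql returner code.
# The jid format and the in-memory "storage" are stand-ins for the real backend.
import datetime

_STORE = {}  # stand-in for a real backend


def prep_jid(nocache=False, passed_jid=None):
    '''Generate a fresh jid (a real returner would use salt's jid utilities).'''
    return passed_jid or datetime.datetime.now().strftime('%Y%m%d%H%M%S%f')


def returner(ret):
    '''
    Store a return, remapping the magic 'req' jid to a freshly generated
    one so standalone minion jobs do not collide with each other.
    '''
    if ret.get('jid') == 'req':
        ret['jid'] = prep_jid(nocache=ret.get('nocache', False))
    _STORE[ret['jid']] = ret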

Setup

To view the job id in use, simply set startup_states in a minion config to execute some state that takes a while (so that you can race against the state being applied). Configure any of the non-local returners so that you can manually check the job being created.

Steps to Reproduce Issue

While the state is being applied, view the active jobs with salt-run jobs.active and you'll see that the state started by the minion has a job id of "req".

Versions Report

Salt Version:
           Salt: 2019.2.0

Dependency Versions:
           cffi: 1.11.5
       cherrypy: Not Installed
       dateutil: 2.8.0
      docker-py: Not Installed
          gitdb: 2.0.3
      gitpython: 2.1.11
          ioflo: 1.7.5
         Jinja2: 2.10
        libgit2: 0.27.8
        libnacl: 1.6.1
       M2Crypto: Not Installed
           Mako: 1.0.7
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: 3.7.0
         pygit2: 0.27.4
         Python: 2.7.15 (default, Oct 15 2018, 15:26:09)
   python-gnupg: Not Installed
         PyYAML: 4.2
          PyZMQ: 17.0.0
           RAET: 0.6.8
          smmap: 2.0.3
        timelib: Not Installed
        Tornado: 5.0.2
            ZMQ: 4.1.6

System Versions:
           dist: fedora 29 Twenty Nine
         locale: ANSI_X3.4-1968
        machine: x86_64
        release: 4.14.96-coreos
         system: Linux
        version: Fedora 29 Twenty Nine
@arizvisa
Contributor Author

arizvisa commented Feb 26, 2019

It looks like the reason returners might get req as the job id instead of a real job id is the following code. But again, I'm really unfamiliar with this beastly code-base, so I'm not 100% sure:

https://github.com/saltstack/salt/blob/develop/salt/daemons/masterapi.py#L780

    def _return(self, load):
        '''
        Handle the return data sent from the minions
        '''
...
        if load['jid'] == 'req':
            # The minion is returning a standalone job, request a jobid
            prep_fstr = '{0}.prep_jid'.format(self.opts['master_job_cache'])
            load['jid'] = self.mminion.returners[prep_fstr](nocache=load.get('nocache', False))

            # save the load, since we don't have it
            saveload_fstr = '{0}.save_load'.format(self.opts['master_job_cache'])
            self.mminion.returners[saveload_fstr](load['jid'], load)
...
        if not self.opts['job_cache'] or self.opts.get('ext_job_cache'):    # XXX: is this right?
            return

        fstr = '{0}.update_endtime'.format(self.opts['master_job_cache'])
        if (self.opts.get('job_cache_store_endtime')
                and fstr in self.mminion.returners):
            self.mminion.returners[fstr](load['jid'], endtime)

        fstr = '{0}.returner'.format(self.opts['master_job_cache'])
        self.mminion.returners[fstr](load)

You can see the conditional at XXX where it's checking whether the caches don't exist. However, it looks like the test for both job_cache and ext_job_cache was intended to be wrapped in parentheses, because this line is essentially saying that if there's _not_ a job cache or there _is_ an external job cache, then skip the following code, which calls the returner() function and uses the master_job_cache.

Is this intentional, or was it really supposed to be the following, which results in the returner being called when an external job cache is configured?

if not (self.opts['job_cache'] or self.opts.get('ext_job_cache')):
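
To make the difference concrete, here's a tiny, purely illustrative comparison of the two conditionals (plain booleans stand in for the opts above; this is not Salt code):

# Purely illustrative: plain booleans stand in for self.opts['job_cache']
# and self.opts.get('ext_job_cache').
import itertools

for job_cache, ext_job_cache in itertools.product([False, True], repeat=2):
    current = not job_cache or ext_job_cache       # conditional as currently written
    proposed = not (job_cache or ext_job_cache)    # parenthesized alternative
    print('job_cache={0} ext_job_cache={1} -> skip returner? current={2} proposed={3}'.format(
        job_cache, ext_job_cache, current, proposed))

# With job_cache=True and ext_job_cache=True, the current form skips the
# master_job_cache returner call, while the parenthesized form does not.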

@Ch3LL
Contributor

Ch3LL commented Feb 26, 2019

it looks like adding req to the jid was on purpose. See this PR: #16392

And if you see this comment: #16392 (comment), it looks like we need to add the code handling the req return to all the returners.

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior Question The issue is more of a question rather than a bug or a feature request and removed Bug broken, incorrect, or confusing behavior labels Feb 26, 2019
@Ch3LL Ch3LL added this to the Approved milestone Feb 26, 2019
@arizvisa
Contributor Author

Ok. Because yeah, if that explicit check doesn't exist, it results in the load data being overwritten, since all jids spawned from a minion are 'req'. I'll update my PR then.

Thanks!

@arizvisa
Contributor Author

arizvisa commented Feb 27, 2019

Ftr, that unfortunately means that all the returners (other than mysql and local_cache) are busted. Maybe not totally busted, actually, since there's no error (just silence), but essentially any jobs spawned by minions will cease to exist...

Should I create a new issue pointing that out, or perhaps re-open this and change the title?

@Ch3LL
Contributor

Ch3LL commented Mar 4, 2019

let's re-title this and track here :)

@Ch3LL Ch3LL reopened this Mar 4, 2019
@arizvisa arizvisa changed the title [Question] the "req" job id and how it relates to returners All of the returners (excluding mysql and local_cache) are not handling the "req" job id correctly Mar 6, 2019
@arizvisa
Contributor Author

arizvisa commented Mar 6, 2019

Okay, cool. Re-titled it.

As another question, is there some way to see the 'req' jobs that are currently being run in a salt environment? Although returners should re-map 'req' to a new job id, I've been unable to figure out how to view that new job id without actually watching the event queue. If we go back to the prior logic where 'req' is not re-mapped, then you can view the job... but if more than one minion makes a 'req', then the previous results are overwritten.

@arizvisa
Contributor Author

arizvisa commented Mar 8, 2019

@Ch3LL, do you mind re-tagging this since it isn't a question anymore?

@arizvisa
Contributor Author

arizvisa commented Mar 8, 2019

Ftr, PR #51363 fixes this issue just for the etcd returner. So if/once that gets merged, there's one less returner to fix.

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around P3 Priority 3 triage and removed Question The issue is more of a question rather than a bug or a feature request labels Mar 12, 2019
@Ch3LL
Contributor

Ch3LL commented Mar 12, 2019

if we go to the prior logic where 'req' is not re-mapped, then you can view the job

How were you previously viewing it when it was working?

@arizvisa
Contributor Author

By using 'req' as the job id. However, any time another client started another job, it would overwrite the previous job data for 'req'.

@Ch3LL
Contributor

Ch3LL commented Mar 20, 2019

When you say overwrite, do you mean it's overwriting the req job in etcd where the job returns are stored?

@arizvisa
Contributor Author

Yes. It's overwriting because the job id is 'req', so in any returner that lets you use a non-integral job id, the data will be overwritten if the job id is being used as a unique key. Offhand I think cassandra does this, but there are probably others.

@arizvisa
Contributor Author

What might work as a workaround would be to introduce minion-specific job ids. This way job ids can be unique, which matches the assumptions that the sql-based returners make when creating an index on the job id.

Something like req-${MINION_ID}-${TIMESTAMP} or similar, since it seems that once a job id hits the returner, salt doesn't care about it anymore (not sure about this). But this way the job id is still searchable via returner.list_jobs (and friends).
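
As a rough illustration of that idea (the exact format and where it would be generated are precisely what needs discussing, so treat this as a sketch, not a concrete proposal):

# Rough sketch of the suggested req-${MINION_ID}-${TIMESTAMP} scheme.
# The format is hypothetical; the point is just that the jid stays
# recognizable as a 'req' job while remaining unique per minion and run.
import datetime


def prep_req_jid(minion_id):
    '''Build a unique, searchable jid for a minion-initiated ('req') job.'''
    stamp = datetime.datetime.utcnow().strftime('%Y%m%d%H%M%S%f')
    return 'req-{0}-{1}'.format(minion_id, stamp)

# e.g. prep_req_jid('web01') -> 'req-web01-20190321154210123456'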

This'll definitely need some discussion though, as I'm really only familiar with the returner interfaces from my work on etcd. Pretty much the most complete returners are the sql-based ones, the local_cache one, the write-only ones (email-based and such), and the etcd one (afaict, and whenever it gets merged).

@Ch3LL
Contributor

Ch3LL commented Mar 21, 2019

Thanks for clarifying. Wanted to make sure I was following correctly.

ping @saltstack/team-core any ideas on how we could approach this?

@waynew
Contributor

waynew commented Apr 30, 2019

Apparently I need to dig into returner code, because it's a place I'm not familiar with at all right now 😛

@arizvisa
Contributor Author

arizvisa commented May 7, 2019

@waynew, if you want, check out my PR for etcd support in the returners. I'd be willing to bet that it's the most complete and documented of the returner implementations.

#51363

Here's a link to the specific file in my submitted branch: https://github.com/arizvisa/saltstack-salt/blob/etcd-returner-is-busted/salt/returners/etcd_return.py

@arizvisa
Contributor Author

arizvisa commented May 7, 2019

If you use the etcd returner PR, it's very easy to monitor what a client is doing, as etcd simplifies monitoring its schema via etcdclient. You can watch for "/$jid/$id/$num" (or similar, I haven't messed with this in a long time), where your jid is "req" or a result from "prep_jid", and your "num" is an incrementing number for the state that the client is processing.
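
For what it's worth, a rough sketch of that watching approach with the python-etcd (etcd2) client; the '/salt/return' prefix here is an assumption, so substitute whatever path your returner is actually configured to write under:

# Rough sketch using python-etcd (the etcd2 client). The '/salt/return'
# prefix is an assumption; use whatever prefix your returner writes under,
# and look for keys whose jid component is 'req' vs. a generated jid.
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)

for event in client.eternal_watch('/salt/return', recursive=True):
    # each change under the prefix shows up here as the minion writes
    # its per-state results
    print('{0} {1} {2}'.format(event.action, event.key, event.value))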

Otherwise you'll need to use state.event to watch what's happening as jobs get returned and keep an eye out for "$jid", which can be "req" or a real jid depending on the issue I mentioned.

Or you can also just plant logs in the returner that you're using (non-local ones).

@stale

stale bot commented Jan 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 8, 2020
@arizvisa
Contributor Author

arizvisa commented Jan 8, 2020

This is referenced by #52775 and seems to be the only mention of the special handling of the "req" jid by the returners. Each returner has to explicitly handle this job, so this should probably be left open.

Relevant documentation for how the etcd2 returner deals with this special case can be found in the salt.returners.etcd_return module merged by PR #51363.

@stale

stale bot commented Jan 8, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Jan 8, 2020
@waynew waynew added the Confirmed Salt engineer has confirmed bug/feature - often including a MCVE label Feb 27, 2020
@sagetherage sagetherage removed the P3 Priority 3 label Jun 3, 2020