However, we have reports that this bug has been seen in real life in OCP deployments. Ping @CFSNM, the reporter.
Expected results
A user cancel should always yield a simple "canceled" status.
Actual results
Additional information
To describe what happens...
In the request, a message is sent to the dispatcher to SIGTERM the controller process. The process receives that, stops processing data, does a receptorctl cancel of the work unit, and then checks the cancel_flag to determine whether it should use the above message or just a "canceled" status.
Meanwhile, the cancel_flag only changes to True when the request commits, since the request runs in a transaction (the job does not). This leaves open the possibility of getting this message when request processing is slow, when the dispatcher and worker respond very fast, or both.
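The ordering problem can be illustrated with a toy model. This is a minimal sketch, not AWX code: `ToyDB`, `worker_final_status`, and `cancel_request` are hypothetical names, and `"unexpected_signal"` is only a placeholder for the message referred to above. It assumes that writes made inside a transaction stay invisible to other readers until commit.

```python
class ToyDB:
    """Hypothetical store: staged writes are invisible to other readers until commit."""

    def __init__(self):
        self.committed = {"cancel_flag": False}
        self.pending = {}

    def write_in_transaction(self, key, value):
        # Staged only; readers outside the transaction do not see this yet.
        self.pending[key] = value

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()


def worker_final_status(db):
    """Status the job process picks after SIGTERM; it reads committed data only."""
    if db.committed["cancel_flag"]:
        return "canceled"
    return "unexpected_signal"  # placeholder for the unwanted message


def cancel_request(db, worker_is_fast):
    """Mimic the cancel request: flag staged in a transaction, signal sent separately."""
    db.write_in_transaction("cancel_flag", True)
    status = None
    if worker_is_fast:
        # The signal arrived and was handled before the request committed.
        status = worker_final_status(db)
    db.commit()
    if status is None:
        # The worker handled the signal only after the commit.
        status = worker_final_status(db)
    return status
```

With `worker_is_fast=True` the worker reads the flag before the commit and ends up with the placeholder status; with `worker_is_fast=False` it sees the committed flag and reports "canceled".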
This race is pretty well established in theory, so we should really just close the loophole. Doing so is not impossible. Some ideas I have:
Change the control-and-reply pattern for canceling to just a control signal with no reply. Then do not use the new_connection parameter when submitting the task. Thus, the message is submitted in the same transaction in which the cancel_flag is flipped. This may require moving some corner-case logic out into an on_commit method or a dispatcher task to handle the cases where the status should flip directly to "canceled".
Make this view non-atomic. Then we can do a very minor rearrangement to set the cancel_flag before we submit the task. If the view is no longer atomic, there will be no problem with this. We also won't need the new_connection flag in this case.
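Both ideas can be sketched against a toy model. This is a rough illustration under stated assumptions, not the AWX implementation: `ToyDB`, `make_worker`, `cancel_in_same_transaction`, and `cancel_non_atomic` are hypothetical names, and `"unexpected_signal"` stands in for the unwanted message. The first function assumes the message transport only delivers at commit, which is how PostgreSQL handles a NOTIFY issued inside a transaction.

```python
class ToyDB:
    """Hypothetical in-memory store with staged writes and staged messages."""

    def __init__(self):
        self.committed = {"cancel_flag": False}
        self.pending = {}  # writes staged by an open transaction
        self.outbox = []   # messages staged by an open transaction

    def commit(self, deliver):
        # Staged writes become visible first, then staged messages go out.
        self.committed.update(self.pending)
        self.pending.clear()
        while self.outbox:
            deliver(self.outbox.pop(0))


def make_worker(db, results):
    """Return a signal handler that records the status the job would end with."""
    def on_cancel_signal(_message):
        flag = db.committed["cancel_flag"]
        results.append("canceled" if flag else "unexpected_signal")
    return on_cancel_signal


def cancel_in_same_transaction(db, deliver):
    """First idea: the signal is staged with the flag and only leaves at commit."""
    db.pending["cancel_flag"] = True
    db.outbox.append("cancel")
    db.commit(deliver)


def cancel_non_atomic(db, deliver):
    """Second idea: no transaction, so the flag is visible before the signal is sent."""
    db.committed["cancel_flag"] = True
    deliver("cancel")
```

In either variant the worker cannot observe the signal before the flag is visible, so the handler built by `make_worker` always records "canceled".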
This issue came directly from an integration test, so if the original failure no longer occurs, that may be enough evidence to close it.
Technically, I don't see a strong reason to prioritize this for backports, so I'm not filing any issues to do that work, but I would not argue against it.
Please confirm the following
Bug Summary
In some conditions, jobs will incorrectly have the following message applied to them:
This happens after the user does a POST to /api/v2/jobs/N/cancel/. It appears to be fallout from recent development, where we landed fixes for some other things.
AWX version
devel
Select the relevant components
Installation method
openshift
Modifications
no
Ansible version
devel
Operating system
N/A
Web browser
Chrome
Steps to reproduce
Right now I don't have concrete steps to reproduce, other than to use this hack: