Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consume job_explanation from runner, fix error reporting error #13482

Merged
merged 1 commit into from
Aug 30, 2023

Conversation

AlanCoding
Copy link
Member

@AlanCoding AlanCoding commented Jan 27, 2023

SUMMARY

An update of #12089, as this failure in error handling has been reflected in recent cases, and in user reports. This seems to account for the "Job terminated due to error" reports.

This modifies some of what was introduced in #12961, which is the source of the bug.

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

@AlanCoding
Copy link
Member Author

This should go in along-side of ansible/ansible-runner#1186

@AlanCoding
Copy link
Member Author

What I originally tested this against was the runner change:

@@ -68,6 +73,7 @@ class Worker(object):
             _output = sys.stdout.buffer
         self._input = _input
         self._output = _output
+        self._output.write(b'surprise from alan!\n')
 
         self.kwargs = kwargs
         self.job_kwargs = None

But I've realized there is a shortcoming for a different class of errors. Consider:

diff --git a/ansible_runner/streaming.py b/ansible_runner/streaming.py
index 608b972..df14bfc 100644
--- a/ansible_runner/streaming.py
+++ b/ansible_runner/streaming.py
@@ -68,6 +68,7 @@ class Worker(object):
             _output = sys.stdout.buffer
         self._input = _input
         self._output = _output
+        raise Exception('alan!')
 
         self.kwargs = kwargs
         self.job_kwargs = None

This results in a bad user experience:

Screenshot from 2023-01-31 13-48-16

But! As per some speculative data related to #13469, we know that receptor may not be fully functional. So we might need the readline data in some cases, and the full results (or truncated) results in other cases.

As I've stewed on this problem, I don't see a way out other than a special-case for the JSON parse error from the processing code. So I think I have an idea of how to move forward - which involves combining all of the data we can possibly manage, and a special string comparison for that specific message.

@AlanCoding
Copy link
Member Author

Error handling is still very broken without #13494 fixed, so I'm not sure what I'm going to do now.

@AlanCoding
Copy link
Member Author

But I did adjust my ansible-runner PR to gracefully handle the Exception case I mentioned before. I think both of these are good changes, but it's still hard to get quality output from error cases.

@@ -437,9 +437,9 @@ def _run_internal(self, receptor_ctl):
lines = resultfile.readlines()
receptor_output = b"".join(lines).decode()
if receptor_output:
self.task.runner_callback.delay_update(result_traceback=receptor_output)
self.task.runner_callback.delay_update(result_traceback=f'Worker output:\n{receptor_output}')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan here, now, is that ansible-runner will write the line to job_explanation. By doing that, we might echo the output twice in two places. Because of that, we need some better labeling of what data we're showing, so I added this stuff.

@@ -437,9 +437,9 @@ def _run_internal(self, receptor_ctl):
lines = resultfile.readlines()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recent discussion is that I might delete resultsock.setblocking(False)

@AlanCoding
Copy link
Member Author

I added tests to my related ansible-runner PR. This is one of the highest priority items I have to make sure we get in, but I don't want to merge it until the runner change lands (because don't know what the UX of that would be).

@AlanCoding AlanCoding force-pushed the non_json_error branch 2 times, most recently from a001a96 to 45c3244 Compare April 3, 2023 15:22
@AlanCoding
Copy link
Member Author

I have test cases up here:

https://github.com/AlanCoding/bad-execution-environments

Let me review those...

The "traceback" test is not tested right now due to linked receptor issue ansible/receptor#736

ending_line

Screenshot from 2023-04-03 12-29-19

starting_line

Screenshot from 2023-04-03 12-29-09

artifacts

Screenshot from 2023-04-03 12-26-59

@AlanCoding AlanCoding self-assigned this Apr 13, 2023
@AlanCoding AlanCoding changed the title Consume job_explanation when given by ansible-runner Consume job_explanation from runner, fix error reporting error Jun 6, 2023
@AlanCoding AlanCoding changed the title Consume job_explanation from runner, fix error reporting error Consume job_explanation from runner, fix error reporting error Jun 6, 2023
Add extra formatting to error messages for clarity

Fix nonblocking related tracebacks, add logging
@AlanCoding AlanCoding merged commit cdb4f0b into ansible:devel Aug 30, 2023
djyasin pushed a commit to djyasin/awx that referenced this pull request Sep 16, 2024
djyasin pushed a commit to djyasin/awx that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants