From 64c94657829dd3b936902a2521aee3e4675d1d0c Mon Sep 17 00:00:00 2001
From: Shane McDonald
Date: Tue, 8 Nov 2022 18:35:18 -0500
Subject: [PATCH] Flush buffer in streaming interface before writing zip data

We ran into a really obscure issue when working on
https://github.com/ansible/receptor/pull/683.

I'll try to make this at least somewhat digestible.

Due to a bug in Kubernetes, AWX can't currently run jobs longer than 4 hours
when deployed into Kubernetes. More context on that in
https://github.com/ansible/awx/issues/11805

To address this issue, we needed a way to restart from a certain point in the
logs. The only mechanism Kubernetes provides to do this is by passing
"sinceTime" to the API endpoint for retrieving logs from a pod.

Our patch in https://github.com/ansible/receptor/pull/683 worked when we ran it
locally, but in OpenShift, jobs errored when unpacking the zip stream at the
end of the results of "ansible-runner worker". Upon further investigation, this
was because the timestamps of the last 2 lines were exactly the same:

```
2022-11-09T00:07:46.851687621Z {"status": "successful", "runner_ident": "1"}
2022-11-08T23:07:58.648753832Z {"zipfile": 1330}
2022-11-08T23:07:58.648753832Z UEsDBBQAAAAIAPy4aFVGnUFkqQMAAIwK....
```

After squinting at this code for a bit, I noticed that we weren't flushing the
buffer here like we do in the event_handler and other callbacks that are fired
in streaming.py.

The end. Ugh.
---
 ansible_runner/utils/streaming.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ansible_runner/utils/streaming.py b/ansible_runner/utils/streaming.py
index 92a708fbc..72e3c5618 100644
--- a/ansible_runner/utils/streaming.py
+++ b/ansible_runner/utils/streaming.py
@@ -51,6 +51,7 @@ def stream_dir(source_directory, stream):
             else:
                 target = stream
             target.write(json.dumps({"zipfile": zip_size}).encode("utf-8") + b"\n")
+            target.flush()
             with Base64IO(target) as encoded_target:
                 for line in source:
                     encoded_target.write(line)
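
To illustrate why the missing flush matters, here is a minimal sketch. It is not the ansible-runner code. It assumes that whatever collects the stream (for example, the container log driver) timestamps each chunk as it arrives, so a header and payload that leave the buffer in a single write can end up with identical timestamps. `RecordingRaw` and `emit` are hypothetical names introduced only for this illustration.

```python
import io
import json


class RecordingRaw(io.RawIOBase):
    """Hypothetical raw sink that records every chunk passed to write()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, b):
        # Copy the data: the buffered layer may reuse its internal buffer.
        self.chunks.append(bytes(b))
        return len(b)


def emit(flush_header):
    """Write a header line and a payload line through a buffered writer.

    Returns the list of chunks that actually reached the raw sink.
    """
    raw = RecordingRaw()
    target = io.BufferedWriter(raw)
    target.write(json.dumps({"zipfile": 4}).encode("utf-8") + b"\n")
    if flush_header:
        # Push the header out on its own before the payload is written.
        target.flush()
    target.write(b"UEsD\n")  # stand-in for the base64-encoded zip data
    target.flush()
    return raw.chunks


print(emit(flush_header=False))  # header and payload arrive as one chunk
print(emit(flush_header=True))   # header arrives first, payload second
```

Without the intermediate flush, both lines sit in the buffer and reach the sink together; with it, the header is delivered as a separate write, which gives a downstream timestamper a chance to tell the two lines apart.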