Path to non-existing key on GCS fails to be handled as optional input #6276
Comments
@tymokvo Is this issue happening consistently?
@sarabala1979 thanks for the quick reply! I updated the issue with a failing workflow using GCS and, for comparison, one that succeeds using HTTP.
Init container logs:
time="2021-07-07T01:40:57.165Z" level=info msg="Starting Workflow Executor" executorType=emissary version=v3.1.0-rc14
time="2021-07-07T01:40:57.169Z" level=info msg="Executor initialized" includeScriptOutput=false namespace=argo podName=gcs-test-4gtk4 template="{\"name\":\"gcs-test\",\"inputs\":{\"artifacts\":[{\"name\":\"my-art\",\"path\":\"/my-artifact\",\"gcs\":{\"bucket\":\"pollination-public\",\"serviceAccountKeySecret\":{\"name\":\"gcs-creds\",\"key\":\"serviceAccountKey\"},\"key\":\"blobs/argo-test/not-exists.txt\"},\"optional\":true}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"ubuntu:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo 'no artifact :('\"],\"resources\":{}},\"archiveLocation\":{\"archiveLogs\":true,\"gcs\":{\"bucket\":\"pollination-server-staging\",\"serviceAccountKeySecret\":{\"name\":\"gcs-creds\",\"key\":\"serviceAccountKey\"},\"key\":\"gcs-test-4gtk4/gcs-test-4gtk4\"}}}" version="&Version{Version:v3.1.0-rc14,BuildDate:2021-06-10T18:04:46Z,GitCommit:d385e6107ab8d4ea4826bd6972608f8fbc86fbe5,GitTag:v3.1.0-rc14,GitTreeState:clean,GoVersion:go1.15.7,Compiler:gc,Platform:linux/amd64,}"
time="2021-07-07T01:40:57.235Z" level=info msg="Start loading input artifacts..."
time="2021-07-07T01:40:57.235Z" level=info msg="Downloading artifact: my-art"
time="2021-07-07T01:40:57.235Z" level=info msg="GCS Load path: /argo/inputs/artifacts/my-art.tmp, key: blobs/argo-test/not-exists.txt"
time="2021-07-07T01:40:57.375Z" level=info msg="Detecting if /argo/inputs/artifacts/my-art.tmp is a tarball"
time="2021-07-07T01:40:57.375Z" level=error msg="executor error: open /argo/inputs/artifacts/my-art.tmp: no such file or directory"
time="2021-07-07T01:40:57.375Z" level=info msg="Alloc=6978 TotalAlloc=15938 Sys=73553 NumGC=5 Goroutines=6"
time="2021-07-07T01:40:57.375Z" level=fatal msg="open /argo/inputs/artifacts/my-art.tmp: no such file or directory"
Wait container logs:
Error from server (BadRequest): container "wait" in pod "gcs-test-4gtk4" is waiting to start: PodInitializing
Controller logs:
time="2021-07-07T01:40:55.860Z" level=info msg="Processing workflow" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:40:55.864Z" level=info msg="Get configmaps 200"
time="2021-07-07T01:40:55.864Z" level=info msg="resolved artifact repository" artifactRepositoryRef="argo/#"
time="2021-07-07T01:40:55.866Z" level=info msg="Get configmaps 200"
time="2021-07-07T01:40:55.867Z" level=info msg="Updated phase -> Running" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:40:55.867Z" level=info msg="Pod node gcs-test-4gtk4 initialized Pending" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:40:55.872Z" level=info msg="Create events 201"
time="2021-07-07T01:40:55.882Z" level=info msg="Create pods 201"
time="2021-07-07T01:40:55.883Z" level=info msg="Created pod: gcs-test-4gtk4 (gcs-test-4gtk4)" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:40:55.889Z" level=info msg="Update workflows 200"
time="2021-07-07T01:40:55.890Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=61169066 workflow=gcs-test-4gtk4
time="2021-07-07T01:40:56.188Z" level=info msg="Watch workflows 200"
time="2021-07-07T01:40:58.742Z" level=info msg="Get leases 200"
time="2021-07-07T01:40:58.745Z" level=info msg="Update leases 200"
time="2021-07-07T01:41:03.749Z" level=info msg="Get leases 200"
time="2021-07-07T01:41:03.752Z" level=info msg="Update leases 200"
time="2021-07-07T01:41:05.884Z" level=info msg="Processing workflow" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.887Z" level=info msg="Get configmaps 200"
time="2021-07-07T01:41:05.888Z" level=info msg="Pod failed: Error (exit code 1): open /argo/inputs/artifacts/my-art.tmp: no such file or directory" displayName=gcs-test-4gtk4 namespace=argo pod=gcs-test-4gtk4 templateName=gcs-test workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.888Z" level=info msg="Updating node gcs-test-4gtk4 status Pending -> Error" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.888Z" level=info msg="Updating node gcs-test-4gtk4 message: Error (exit code 1): open /argo/inputs/artifacts/my-art.tmp: no such file or directory" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.890Z" level=info msg="Updated phase Running -> Error" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.890Z" level=info msg="Updated message -> Error (exit code 1): open /argo/inputs/artifacts/my-art.tmp: no such file or directory" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.890Z" level=info msg="Marking workflow completed" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.890Z" level=info msg="Marking workflow as pending archiving" namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.890Z" level=info msg="Checking daemoned children of " namespace=argo workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.897Z" level=info msg="Create events 201"
time="2021-07-07T01:41:05.899Z" level=info msg="Update workflows 200"
time="2021-07-07T01:41:05.900Z" level=info msg="Workflow update successful" namespace=argo phase=Error resourceVersion=61169135 workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.902Z" level=info msg="archiving workflow" namespace=argo uid=ee1bbe7a-5b72-46b2-b1b2-6803b2049533 workflow=gcs-test-4gtk4
time="2021-07-07T01:41:05.904Z" level=info msg="Create events 201"
time="2021-07-07T01:41:05.907Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/gcs-test-4gtk4/labelPodCompleted
time="2021-07-07T01:41:05.908Z" level=info msg="Create events 201"
time="2021-07-07T01:41:05.916Z" level=info msg="Patch pods 200"
time="2021-07-07T01:41:05.929Z" level=info msg="Patch workflows 200"
time="2021-07-07T01:41:05.930Z" level=info msg="archiving workflow" namespace=argo uid=ee1bbe7a-5b72-46b2-b1b2-6803b2049533 workflow=gcs-test-4gtk4
@sarabala1979 I have patched our fork of argo with the change that I proposed here and can confirm that it solves the issue. Should I open a PR or is there more info I can provide?
Maybe fixed by #6393. Re-open if not.
Summary
Preamble
What happened/what you expected to happen?
Before we start, around 2/3 of issues can be fixed by one of the following:
Yep, other downloaded artifacts are fine.
Nope, but the code in question hasn't been touched for 16 months.
Nope, but this doesn't seem relevant as I believe the cause is the GCS storage client failing to make a file or raise an error.
Description
I expected an optional input to not cause an error when it does not exist on the local filesystem.
What happened was, the optional input caused:
executor error: rename /argo/inputs/artifacts/ambient-cache.tmp /argo/inputs/artifacts/ambient-cache: no such file or directory
When using Google Cloud Storage as an artifact repository, passing optional outputs/inputs between steps may cause an unexpected failure. If an optional output was not created in one step, its key does not exist on GCS, yet the key is still passed as an optional input to a subsequent step. When the key doesn't exist, the GCS listing comes back as an empty array, the download loop is skipped entirely, and no error is raised or caught for the missing optional artifact.
In this line, errors.CodeNotFound is not raised because the listByPrefix method called by artDriver.Load returns an empty list and skips any other operations on the file.
I pulled out the methods from the argo-workflows/workflow/artifacts/gcs package into a gist here to demonstrate the problem. The listByPrefix method may return an empty list from the GCS API, which means that the loop in which downloadObject is called never executes. Thus, none of the file I/O errors are ever raised from inside downloadObject. The line in question is here.
It seems like the simplest fix would be to raise an error in the case of an empty array from listByPrefix that will be sent up the stack to be handled by the executor.
Diagnostics
👀 Yes! We need all of your diagnostics, please make sure you add it all, otherwise we'll go around in circles asking you for it:
What Kubernetes provider are you using?
GKE
What version of Argo Workflows are you running?
v3.1.0-rc14
What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary
Emissary
Did this work in a previous version? I.e. is it a regression?
Unknown.
Are you pasting thousands of log lines? That's too much information.
Nope
Workflow
Our workflows that experience this have hundreds to thousands of lines, and seem like they would be more distracting than helpful here. I believe this issue is specific to the GCS storage adapter rather than the workflow itself. I'm happy to try and craft a minimal workflow if that does seem useful though. The gist that is linked above reproduces the failure case.
Failing Workflow
Working Workflow
Misc.
Thank you for working on argo! Overall, it has been really great to work with. 😸
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.