containers in containerSets not appropriately reporting status if terminated #8545
Comments
I think this might be fixed by #8478. Could you please try?
Hi @alexec, thanks for the swift reply. :D I had a colleague run the workflow on a fresh Argo setup using this manifest and the problem seems to persist.
You need to upgrade the workflow-controller. Can you double-check? It's pretty easy to apply the wrong manifests.
I can double-check but it won't be until next week, sorry! I have attached the workflow that reproduces the error if you'd like to run it beforehand though.
Thanks @cwood-uk
This issue is missing workflow YAML to run locally.
@alexec The YAML is the reproducible-workflow.txt attached in the issue; GitHub doesn't allow the upload of YAML files. Have attached it here again. :)
Thank you. Can repro locally.
Restarting the controller does not fix this.
Pods do complete correctly. The workflow status is not being updated. This points to a problem in the controller, e.g. in operator.go.
Workflow stops reconciling, but is not labelled completed.
Hypothesis: pods are being labelled as complete before the node status has been updated.
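A minimal Go sketch of that hypothesised ordering problem follows. It is not the actual controller code in operator.go; the pod/node types, the `loseStatusUpdate` flag, and the `reconcile` function are invented purely to illustrate the race between labelling a pod completed and persisting the node status:

```go
package main

import "fmt"

type phase string

const (
	running   phase = "Running"
	succeeded phase = "Succeeded"
)

// pod stands in for a Kubernetes pod as the controller's informer sees it.
type pod struct {
	name           string
	terminated     bool
	completedLabel bool // stands in for the workflows.argoproj.io/completed=true label
}

// node stands in for the workflow's node status entry for that pod.
type node struct {
	name  string
	phase phase
}

// reconcile mimics one controller pass. If the pod is labelled completed
// before the node status update is persisted, later passes skip the pod
// and the node is never marked Succeeded.
func reconcile(pods []*pod, nodes map[string]*node, loseStatusUpdate bool) {
	for _, p := range pods {
		if p.completedLabel {
			continue // labelled pods are filtered out and never re-assessed
		}
		if p.terminated {
			p.completedLabel = true // step 1: label the pod as completed
			if loseStatusUpdate {
				continue // step 2 is lost: the node status update never lands
			}
			nodes[p.name].phase = succeeded // step 2: record the node result
		}
	}
}

func main() {
	pods := []*pod{{name: "shard-1", terminated: true}}
	nodes := map[string]*node{"shard-1": {name: "shard-1", phase: running}}

	reconcile(pods, nodes, true)  // node status update lost after labelling
	reconcile(pods, nodes, false) // pod already labelled, so nothing happens

	fmt.Println(nodes["shard-1"].phase) // still "Running": workflow never completes
}
```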
```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: masscan-dummy-scan
spec:
  entrypoint: run-plugin
  activeDeadlineSeconds: 20
  arguments:
    parameters:
      - name: plugin
        value: "masscan"
  templates:
    - name: run-plugin
      dag:
        tasks:
          - name: shard-1
            template: run-masscan
          - name: shard-2
            template: run-masscan
          - name: shard-3
            template: run-masscan
          - name: shard-4
            template: run-masscan
          - name: shard-5
            template: run-masscan
          - name: shard-6
            template: run-masscan
          - name: shard-7
            template: run-masscan
          - name: shard-8
            template: run-masscan
          - name: shard-9
            template: run-masscan
          - name: shard-10
            template: run-masscan
    - name: run-masscan
      containerSet:
        containers:
          - name: a
            image: "debian:9.5-slim"
            command:
              - sleep
            args:
              - "10"
          - name: b
            image: "debian:9.5-slim"
            command:
              - sleep
            args:
              - "80"
            dependencies:
              - a
          - name: c
            image: "debian:9.5-slim"
            command:
              - sleep
            args:
              - "30"
            dependencies:
              - b
```

Simpler. Faster.
This is caused by
Thanks for all your work on this; let me know if you need any help from us.
Hey @alexec, I noticed this was cherry-picked for 3.3.6 so I upgraded, but I still seem to be getting the same issue where the workflow is still running after 'termination'. Am I right in thinking this was fixed in 3.3.6, or have I jumped the gun?
Hey @sarabala1979, can you confirm whether this has been merged into the :latest tag? Right now I am still getting the same errors.
@the1schwartz @alexec I can also still reproduce this in 3.4.0
FYI @alexec
Summary
What happened/what you expected to happen?
When a workflow is terminated because it hits its deadline, I expect the workflow to be marked terminated, yet it keeps waiting for the containerSet containers to terminate, even though they have already finished with an error in Kubernetes.
A screenshot of the workflow after timeout termination is attached to the issue.
A screenshot of the containers for shard-13 in Kubernetes is attached to the issue.
What version are you running?
3.3.2
Reproducible Workflow
reproducible-workflow.txt
Logs from the workflow controller:
controller-logs.txt
The workflow's pods that are problematic:
Logs from the workflow's wait container:
wait-logs.txt
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.