DAG workflow does not finish #6209

Closed
dcd000 opened this issue Jun 24, 2021 · 5 comments
Labels
type/bug type/regression Regression from previous behavior (a specific type of bug)
Milestone

Comments

dcd000 commented Jun 24, 2021

Summary

The workflow does not finish after all of its nodes have completed.

Workflow sample:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-nested-
spec:
  entrypoint: diamond
  templates:
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
  - name: diamond
    dag:
      tasks:
      - name: rootstep
        template: echo
        arguments:
          parameters: [{name: message, value: rootstep}]
      - name: dag2
        dependencies: [rootstep]
        template: nested-diamond
        arguments:
          parameters: [{name: message, value: dag2}]
      - name: simplestep
        dependencies: [rootstep]
        template: echo
        arguments:
          parameters: [{name: message, value: simplestep}]

  - name: nested-diamond
    inputs:
      parameters:
      - name: message
    dag:
      tasks:
      - name: task1
        template: echo
        arguments:
          parameters: [{name: message, value: "{{inputs.parameters.message}}-task1"}]
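Expected behaviour, sketched: in the `diamond` template, `rootstep` fans out to `dag2` and `simplestep`, so once all three tasks (and `task1` inside `dag2`) have succeeded, the outer DAG should leave `Running`. The Python below is a simplified, hypothetical model of that rule, written only to illustrate the expectation. It is not Argo's controller code, and the set of phases treated as "completed" is an assumption.

```python
from typing import Dict, List

# Task name -> dependencies, mirroring the `diamond` template above.
DIAMOND: Dict[str, List[str]] = {
    "rootstep": [],
    "dag2": ["rootstep"],  # runs the nested-diamond template
    "simplestep": ["rootstep"],
}

# Phases this sketch treats as "completed" (an assumption, not Argo's exact list).
COMPLETED = {"Succeeded", "Failed", "Error", "Skipped", "Omitted"}


def leaf_tasks(tasks: Dict[str, List[str]]) -> List[str]:
    """Tasks that no other task depends on, i.e. where the DAG ends."""
    depended_on = {dep for deps in tasks.values() for dep in deps}
    return sorted(name for name in tasks if name not in depended_on)


def dag_phase(tasks: Dict[str, List[str]], phases: Dict[str, str]) -> str:
    """Simplified rule: Running until every task has completed, then Failed if
    any task failed or errored, otherwise Succeeded."""
    if any(phases.get(name) not in COMPLETED for name in tasks):
        return "Running"
    if any(phases[name] in {"Failed", "Error"} for name in tasks):
        return "Failed"
    return "Succeeded"


if __name__ == "__main__":
    # Task phases as reported in the workflow status later in this issue.
    observed = {"rootstep": "Succeeded", "dag2": "Succeeded", "simplestep": "Succeeded"}
    print(leaf_tasks(DIAMOND))           # ['dag2', 'simplestep']
    print(dag_phase(DIAMOND, observed))  # Succeeded
```

Under this model, the reported phases should yield `Succeeded` for the outer DAG, whereas 3.1.0 leaves it `Running`.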

What happened/what you expected to happen?

What happened: the workflow stays in the Running phase after all of its nodes have completed.

What you expected to happen: the workflow finishes once all of its nodes have completed.

Diagnostics

👀 Yes! We need all of your diagnostics, please make sure you add it all, otherwise we'll go around in circles asking you for it:

What Kubernetes provider are you using?
Azure AKS 1.19.11

What version of Argo Workflows are you running?
3.1.0

What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary
PNS

Did this work in a previous version? I.e. is it a regression?
Yes, it works in 3.0.8 and 2.12.11.

Are you pasting thousands of log lines? That's too much information.
no

# Either a workflow that reproduces the bug, or paste your whole workflow YAML, including status, something like:
kubectl get wf -o yaml ${workflow}

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  creationTimestamp: "2021-06-24T11:52:55Z"
  generateName: simple-dag-
  generation: 4
  labels:
    workflows.argoproj.io/phase: Running
  managedFields:
  - apiVersion: argoproj.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
      f:spec:
        .: {}
        f:arguments: {}
        f:entrypoint: {}
      f:status:
        .: {}
        f:finishedAt: {}
    manager: argo
    operation: Update
    time: "2021-06-24T11:52:55Z"
  - apiVersion: argoproj.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:workflows.argoproj.io/phase: {}
      f:spec:
        f:activeDeadlineSeconds: {}
        f:nodeSelector:
          .: {}
          f:agentpool: {}
        f:securityContext:
          .: {}
          f:runAsUser: {}
        f:serviceAccountName: {}
        f:templates: {}
        f:tolerations: {}
        f:ttlStrategy:
          .: {}
          f:secondsAfterCompletion: {}
        f:volumes: {}
      f:status:
        f:artifactRepositoryRef:
          .: {}
          f:default: {}
        f:conditions: {}
        f:nodes:
          .: {}
          f:simple-dag-jj6hr:
            .: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:name: {}
            f:phase: {}
            f:progress: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:simple-dag-jj6hr-861708413:
            .: {}
            f:boundaryID: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:inputs:
              .: {}
              f:parameters: {}
            f:name: {}
            f:outputs:
              .: {}
              f:artifacts: {}
              f:exitCode: {}
            f:phase: {}
            f:progress: {}
            f:resourcesDuration:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:simple-dag-jj6hr-1621329225:
            .: {}
            f:boundaryID: {}
            f:children: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:inputs:
              .: {}
              f:parameters: {}
            f:name: {}
            f:outboundNodes: {}
            f:phase: {}
            f:progress: {}
            f:resourcesDuration:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:simple-dag-jj6hr-1983413579:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:inputs:
              .: {}
              f:parameters: {}
            f:name: {}
            f:outputs:
              .: {}
              f:artifacts: {}
              f:exitCode: {}
            f:phase: {}
            f:progress: {}
            f:resourcesDuration:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
          f:simple-dag-jj6hr-2043091527:
            .: {}
            f:boundaryID: {}
            f:displayName: {}
            f:finishedAt: {}
            f:id: {}
            f:inputs:
              .: {}
              f:parameters: {}
            f:name: {}
            f:outputs:
              .: {}
              f:artifacts: {}
              f:exitCode: {}
            f:phase: {}
            f:progress: {}
            f:resourcesDuration:
              .: {}
              f:cpu: {}
              f:memory: {}
            f:startedAt: {}
            f:templateName: {}
            f:templateScope: {}
            f:type: {}
        f:phase: {}
        f:progress: {}
        f:resourcesDuration:
          .: {}
          f:cpu: {}
          f:memory: {}
        f:startedAt: {}
    manager: workflow-controller
    operation: Update
    time: "2021-06-24T11:53:15Z"
  name: simple-dag-jj6hr
  namespace: argo-test
  resourceVersion: "166692983"
  selfLink: /apis/argoproj.io/v1alpha1/namespaces/argo-test/workflows/simple-dag-jj6hr
  uid: b563f745-0be0-4dbc-8d92-8bbc0f90b746
spec:
  activeDeadlineSeconds: 7200
  arguments: {}
  entrypoint: diamond
  nodeSelector:
    agentpool: argopool2
  securityContext:
    runAsUser: 0
  serviceAccountName: argo
  templates:
  - container:
      command:
      - echo
      - '{{inputs.parameters.message}}'
      image: alpine:3.7
      name: ""
      resources: {}
    inputs:
      parameters:
      - name: message
    metadata: {}
    name: echo
    outputs: {}
  - dag:
      tasks:
      - arguments:
          parameters:
          - name: message
            value: rootstep
        name: rootstep
        template: echo
      - arguments:
          parameters:
          - name: message
            value: dag2
        dependencies:
        - rootstep
        name: dag2
        template: nested-diamond
      - arguments:
          parameters:
          - name: message
            value: simplestep
        dependencies:
        - rootstep
        name: simplestep
        template: echo
    inputs: {}
    metadata: {}
    name: diamond
    outputs: {}
  - dag:
      tasks:
      - arguments:
          parameters:
          - name: message
            value: '{{inputs.parameters.message}}-task1'
        name: task1
        template: echo
    inputs:
      parameters:
      - name: message
    metadata: {}
    name: nested-diamond
    outputs: {}
  tolerations:
  - effect: NoSchedule
    key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
  - effect: NoSchedule
    key: argo
    operator: Equal
    value: dedicated
  ttlStrategy:
    secondsAfterCompletion: 86400
  volumes:
  - name: git-known-hosts
    secret:
      secretName: git-known-hosts
status:
  artifactRepositoryRef:
    default: true
  conditions:
  - status: "False"
    type: PodRunning
  finishedAt: null
  nodes:
    simple-dag-jj6hr:
      children:
      - simple-dag-jj6hr-861708413
      displayName: simple-dag-jj6hr
      finishedAt: null
      id: simple-dag-jj6hr
      name: simple-dag-jj6hr
      phase: Running
      progress: 2/2
      startedAt: "2021-06-24T11:52:55Z"
      templateName: diamond
      templateScope: local/simple-dag-jj6hr
      type: DAG
    simple-dag-jj6hr-861708413:
      boundaryID: simple-dag-jj6hr
      children:
      - simple-dag-jj6hr-1621329225
      displayName: rootstep
      finishedAt: "2021-06-24T11:53:01Z"
      id: simple-dag-jj6hr-861708413
      inputs:
        parameters:
        - name: message
          value: rootstep
      name: simple-dag-jj6hr.rootstep
      outputs:
        artifacts:
        - name: main-logs
          s3:
            key: simple-dag-jj6hr/simple-dag-jj6hr-861708413/main.log
        exitCode: "0"
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 6
        memory: 6
      startedAt: "2021-06-24T11:52:55Z"
      templateName: echo
      templateScope: local/simple-dag-jj6hr
      type: Pod
    simple-dag-jj6hr-1621329225:
      boundaryID: simple-dag-jj6hr
      children:
      - simple-dag-jj6hr-2043091527
      displayName: dag2
      finishedAt: "2021-06-24T11:53:15Z"
      id: simple-dag-jj6hr-1621329225
      inputs:
        parameters:
        - name: message
          value: dag2
      name: simple-dag-jj6hr.dag2
      outboundNodes:
      - simple-dag-jj6hr-2043091527
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 6
        memory: 6
      startedAt: "2021-06-24T11:53:05Z"
      templateName: nested-diamond
      templateScope: local/simple-dag-jj6hr
      type: DAG
    simple-dag-jj6hr-1983413579:
      boundaryID: simple-dag-jj6hr
      displayName: simplestep
      finishedAt: "2021-06-24T11:53:11Z"
      id: simple-dag-jj6hr-1983413579
      inputs:
        parameters:
        - name: message
          value: simplestep
      name: simple-dag-jj6hr.simplestep
      outputs:
        artifacts:
        - name: main-logs
          s3:
            key: simple-dag-jj6hr/simple-dag-jj6hr-1983413579/main.log
        exitCode: "0"
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 6
        memory: 6
      startedAt: "2021-06-24T11:53:05Z"
      templateName: echo
      templateScope: local/simple-dag-jj6hr
      type: Pod
    simple-dag-jj6hr-2043091527:
      boundaryID: simple-dag-jj6hr-1621329225
      displayName: task1
      finishedAt: "2021-06-24T11:53:11Z"
      id: simple-dag-jj6hr-2043091527
      inputs:
        parameters:
        - name: message
          value: dag2-task1
      name: simple-dag-jj6hr.dag2.task1
      outputs:
        artifacts:
        - name: main-logs
          s3:
            key: simple-dag-jj6hr/simple-dag-jj6hr-2043091527/main.log
        exitCode: "0"
      phase: Succeeded
      progress: 1/1
      resourcesDuration:
        cpu: 6
        memory: 6
      startedAt: "2021-06-24T11:53:05Z"
      templateName: echo
      templateScope: local/simple-dag-jj6hr
      type: Pod
  phase: Running
  progress: 3/3
  resourcesDuration:
    cpu: 18
    memory: 18
  startedAt: "2021-06-24T11:52:55Z"
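The status above shows the symptom: every Pod node and the nested `dag2` DAG node are `Succeeded`, yet the root DAG node `simple-dag-jj6hr` and `status.phase` are still `Running`. A hypothetical Python helper like the one below could flag this state in a workflow object. The helper name, the phase list and the boundaryID-based check are assumptions made for illustration; they are not part of the Argo CLI or API.

```python
from typing import Any, Dict, List

# Phases this sketch treats as "finished" (an assumption, not Argo's exact list).
FINISHED = {"Succeeded", "Failed", "Error", "Skipped", "Omitted"}

# Trimmed copy of status.nodes from this report: only type, phase and boundaryID.
REPORTED: Dict[str, Any] = {
    "status": {
        "phase": "Running",
        "nodes": {
            "simple-dag-jj6hr": {"type": "DAG", "phase": "Running"},
            "simple-dag-jj6hr-861708413": {
                "type": "Pod", "phase": "Succeeded", "boundaryID": "simple-dag-jj6hr"},
            "simple-dag-jj6hr-1621329225": {
                "type": "DAG", "phase": "Succeeded", "boundaryID": "simple-dag-jj6hr"},
            "simple-dag-jj6hr-1983413579": {
                "type": "Pod", "phase": "Succeeded", "boundaryID": "simple-dag-jj6hr"},
            "simple-dag-jj6hr-2043091527": {
                "type": "Pod", "phase": "Succeeded",
                "boundaryID": "simple-dag-jj6hr-1621329225"},
        },
    }
}


def stuck_dag_nodes(wf: Dict[str, Any]) -> List[str]:
    """Hypothetical check: return IDs of DAG nodes still Running although every
    node inside their boundary (boundaryID == the DAG's ID) has finished."""
    nodes: Dict[str, Dict[str, Any]] = wf.get("status", {}).get("nodes", {})
    stuck = []
    for node_id, node in nodes.items():
        if node.get("type") != "DAG" or node.get("phase") != "Running":
            continue
        members = [n for n in nodes.values() if n.get("boundaryID") == node_id]
        if members and all(n.get("phase") in FINISHED for n in members):
            stuck.append(node_id)
    return sorted(stuck)


if __name__ == "__main__":
    print(stuck_dag_nodes(REPORTED))  # ['simple-dag-jj6hr']
```

To check a live workflow, parse the output of `kubectl get wf -o json ${workflow}` with `json.load` and pass the resulting dict to `stuck_dag_nodes`.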
# Logs from the workflow controller:
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

time="2021-06-24T11:52:55.049Z" level=info msg="Processing workflow" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:52:55.060Z" level=info msg="Updated phase  -> Running" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:52:55.060Z" level=info msg="DAG node simple-dag-jj6hr initialized Running" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:52:55.060Z" level=info msg="All of node simple-dag-jj6hr.rootstep dependencies [] completed" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:52:55.061Z" level=info msg="Pod node simple-dag-jj6hr-861708413 initialized Pending" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:52:55.081Z" level=info msg="Created pod: simple-dag-jj6hr.rootstep (simple-dag-jj6hr-861708413)" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:52:55.091Z" level=info msg="Workflow update successful" namespace=argo-test phase=Running resourceVersion=166692430 workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.050Z" level=info msg="Processing workflow" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.054Z" level=info msg="Updating node simple-dag-jj6hr-861708413 exit code 0" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.054Z" level=info msg="Setting node simple-dag-jj6hr-861708413 outputs: {\"artifacts\":[{\"name\":\"main-logs\",\"s3\":{\"key\":\"simple-dag-jj6hr/simple-dag-jj6hr-861708413/main.log\"}}]}" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.054Z" level=info msg="Updating node simple-dag-jj6hr-861708413 status Pending -> Succeeded" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.056Z" level=info msg="All of node simple-dag-jj6hr.dag2 dependencies [rootstep] completed" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.056Z" level=info msg="DAG node simple-dag-jj6hr-1621329225 initialized Running" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.057Z" level=info msg="All of node simple-dag-jj6hr.dag2.task1 dependencies [] completed" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.057Z" level=info msg="Pod node simple-dag-jj6hr-2043091527 initialized Pending" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.070Z" level=info msg="Created pod: simple-dag-jj6hr.dag2.task1 (simple-dag-jj6hr-2043091527)" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.070Z" level=info msg="All of node simple-dag-jj6hr.simplestep dependencies [rootstep] completed" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.071Z" level=info msg="Pod node simple-dag-jj6hr-1983413579 initialized Pending" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.085Z" level=info msg="Created pod: simple-dag-jj6hr.simplestep (simple-dag-jj6hr-1983413579)" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.101Z" level=info msg="Workflow update successful" namespace=argo-test phase=Running resourceVersion=166692701 workflow=simple-dag-jj6hr
time="2021-06-24T11:53:05.108Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo-test/simple-dag-jj6hr-861708413/labelPodCompleted
time="2021-06-24T11:53:15.111Z" level=info msg="Processing workflow" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.114Z" level=info msg="Updating node simple-dag-jj6hr-1983413579 exit code 0" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.114Z" level=info msg="Setting node simple-dag-jj6hr-1983413579 outputs: {\"artifacts\":[{\"name\":\"main-logs\",\"s3\":{\"key\":\"simple-dag-jj6hr/simple-dag-jj6hr-1983413579/main.log\"}}]}" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.114Z" level=info msg="Updating node simple-dag-jj6hr-1983413579 status Pending -> Succeeded" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.117Z" level=info msg="Updating node simple-dag-jj6hr-2043091527 exit code 0" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.117Z" level=info msg="Setting node simple-dag-jj6hr-2043091527 outputs: {\"artifacts\":[{\"name\":\"main-logs\",\"s3\":{\"key\":\"simple-dag-jj6hr/simple-dag-jj6hr-2043091527/main.log\"}}]}" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.117Z" level=info msg="Updating node simple-dag-jj6hr-2043091527 status Pending -> Succeeded" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.118Z" level=info msg="Outbound nodes of simple-dag-jj6hr-1621329225 set to [simple-dag-jj6hr-2043091527]" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.118Z" level=info msg="node simple-dag-jj6hr-1621329225 phase Running -> Succeeded" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.118Z" level=info msg="node simple-dag-jj6hr-1621329225 finished: 2021-06-24 11:53:15.118952183 +0000 UTC" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.119Z" level=info msg="Checking daemoned children of simple-dag-jj6hr-1621329225" namespace=argo-test workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.142Z" level=info msg="Workflow update successful" namespace=argo-test phase=Running resourceVersion=166692983 workflow=simple-dag-jj6hr
time="2021-06-24T11:53:15.182Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo-test/simple-dag-jj6hr-2043091527/labelPodCompleted
time="2021-06-24T11:53:15.186Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo-test/simple-dag-jj6hr-1983413579/labelPodCompleted
time="2021-06-24T11:53:25.199Z" level=info msg="Processing workflow" namespace=argo-test workflow=simple-dag-jj6hr

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@alexec alexec added this to the v3.1 milestone Jun 24, 2021
@alexec alexec added the type/regression Regression from previous behavior (a specific type of bug) label Jun 24, 2021
alexec (Contributor) commented Jun 26, 2021

Maybe fixed by #6193. Can you please try :latest?

dcd000 (Author) commented Jun 28, 2021

Hi.
Yes, it works with the :latest images.

alexec (Contributor) commented Jun 28, 2021

Maybe fixed by #6193

simster7 (Member) commented, replying to "Yes, it works with :latest images":

Fixed?

dcd000 (Author) commented Jun 29, 2021

Replying to "Fixed?":

Hi.
Yes, fixed in the latest image.

Development

No branches or pull requests

3 participants