Failure to chdir caused PipelineRun to get stuck #1267

Closed
bobcatfish opened this issue Sep 3, 2019 · 3 comments
@bobcatfish
Collaborator

Actual Behavior

For #860 I tried to run a release Pipeline that included linting and it got into this state:

status:
  conditions:
  - lastTransitionTime: "2019-09-03T20:25:49Z"
    message: Not all Tasks in the Pipeline have finished executing
    reason: Running
    status: Unknown
    type: Succeeded
  startTime: "2019-09-03T20:25:49Z"
  taskRuns:
    04cdc091-ce89-11e9-b641-0a580a340481-lint-np792:
      pipelineTaskName: lint
      status:
        conditions:
        - lastTransitionTime: "2019-09-03T20:26:09Z"
          reason: Building
          status: Unknown
          type: Succeeded
        podName: 04cdc091-ce89-11e9-b641-0a580a340481-lint-np792-pod-0965af
        startTime: "2019-09-03T20:25:49Z"
        steps:
        - name: git-source-04cdc091-ce89-11e9-b641-0a580a340481-extr
          terminated:
            containerID: docker://957a78112d2a19375bdcd540475a8b6c323378a6a8fb3a0ce431b621e3b5716a
            exitCode: 0
            finishedAt: "2019-09-03T20:26:08Z"
            reason: Completed
            startedAt: "2019-09-03T20:26:01Z"
        - name: lint
          terminated:
            containerID: docker://a649db990c37e92cd71f5be91f1beb2e146e3c15d713156d9ee997e231a00023
            exitCode: 128
            finishedAt: "2019-09-03T20:26:02Z"
            message: |
              oci runtime error: container_linux.go:247: starting container process caused "chdir to cwd (\"/workspace/src/github.com/tektoncd/pipeline\") set in config.json failed: no such file or directory"
            reason: ContainerCannotRun
            startedAt: "2019-09-03T20:26:02Z"
        - name: nop
          running:
            startedAt: "2019-09-03T20:26:03Z"

The error was ContainerCannotRun, but for some reason the PipelineRun was still considered to be running.
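
For reference, here is a minimal sketch (not Tekton's actual reconciler code; the helper name and the flow are my own, assuming the standard k8s.io/api core/v1 types) of the kind of check that would treat a step terminated with a non-zero exit code (like the exitCode 128 / ContainerCannotRun above) as a failure, instead of leaving the TaskRun in Running while the nop container keeps going:

```go
// Minimal sketch, not the actual Tekton reconciler: scan a pod's container
// statuses and report any step that has already terminated with a non-zero
// exit code, so the TaskRun could be marked Failed instead of staying Running.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// stepFailed is a hypothetical helper: it returns a message and true if any
// container terminated unsuccessfully (e.g. exitCode 128, ContainerCannotRun).
func stepFailed(statuses []corev1.ContainerStatus) (string, bool) {
	for _, s := range statuses {
		if t := s.State.Terminated; t != nil && t.ExitCode != 0 {
			return fmt.Sprintf("step %q failed: reason=%s exitCode=%d", s.Name, t.Reason, t.ExitCode), true
		}
	}
	return "", false
}

func main() {
	// Roughly the state from the status dump above: lint terminated with
	// exit code 128, nop still running.
	statuses := []corev1.ContainerStatus{
		{
			Name: "lint",
			State: corev1.ContainerState{
				Terminated: &corev1.ContainerStateTerminated{
					ExitCode: 128,
					Reason:   "ContainerCannotRun",
				},
			},
		},
		{
			Name:  "nop",
			State: corev1.ContainerState{Running: &corev1.ContainerStateRunning{}},
		},
	}

	if msg, failed := stepFailed(statuses); failed {
		fmt.Println(msg) // the condition I'd expect to bubble up to the TaskRun/PipelineRun
	}
}
```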

Steps to Reproduce the Problem

The run was triggered by a periodic job executing the nightly release Pipeline (see #860).

Additional Info

Sorry there is so little detail here - will try to add more if this happens again!

@bobcatfish
Collaborator Author

There are two really weird things about this:

  • Execution of the exact same Pipeline 5 min before did not have the chdir error
  • The PipelineRun thinks it is still running (the nop container seems to be still running)

@bobcatfish
Collaborator Author

Hm, this has now happened for a totally different step in a totally different Task:

        - name: generate-release-version
          terminated:
            containerID: docker://b54604fff0bb4d083ed44eb5f47d987f55b671227f396951724d0c799c4ff677
            exitCode: 128
            finishedAt: "2019-09-03T20:37:34Z"
            message: |
              oci runtime error: container_linux.go:247: starting container process caused "chdir to cwd (\"/workspace/go/src/github.com/tektoncd/pipeline\") set in config.json failed: no such file or directory"
            reason: ContainerCannotRun
            startedAt: "2019-09-03T20:37:34Z"
        - name: git-source-6a6de0ab-ce8a-11e9-b641-0a580a340481-extr
          terminated:
            containerID: docker://303a124af1d82a1a6cfbd3e75a330c774debc77e1498a55ab8c9789d2e821904
            exitCode: 0
            finishedAt: "2019-09-03T20:37:32Z"
            reason: Completed
            startedAt: "2019-09-03T20:37:29Z"

@bobcatfish bobcatfish self-assigned this Sep 3, 2019
@bobcatfish
Collaborator Author

After a good night's sleep I've realized that this is probably a duplicate of #725: due to kubernetes/test-infra#13948 we're using v0.3.1 of Pipelines with Prow, and the fix isn't in there.

(Though it kind of boggles my mind that this only sometimes happens...)
