Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set task as failed #3571

Merged
merged 1 commit into from
Feb 2, 2021
Merged

set task as failed #3571

merged 1 commit into from
Feb 2, 2021

Conversation

pritidesai
Copy link
Member

@pritidesai pritidesai commented Dec 1, 2020

Changes

Set task as failed when the pod encounters CreateContainerConfigError. TaskRun status is now set to false instead of unknown with the same reason CreateContainerConfigError.

Before this change, taskRun status had the same reason CreateContainerConfigError but status was set to unknown which means task is still running in turn pipeline is still running until it hits the timeout.

PR #1907 tried to fix this issue which was also reported in issue #1902.

Fixes #3412

/cc @vdemeester @vincent-pli @afrittoli @chmouel

/kind bug

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

Declare task failure when it hits `CreateContainerConfigError` instead of setting it to unknown i.e. running.

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 1, 2020
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/status.go 92.4% 92.2% -0.2

@ghost
Copy link

ghost commented Dec 1, 2020

When a pod exceeds node resources I think Kubernetes will try to re-schedule it again when resources may have freed up? So the TaskRun might eventually succeed. If I'm understanding right, I don't know if we want to mark that as an immediate failure?

Set task as failed when it encounters "CreateContainerConfigError".
TaskRun status is set to this reason and condition status is set to "false"
instead of "unknown".

Before this change, taskRun status had the same reason but
status was set to "unknown" which means task is still running.
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/status.go 92.4% 92.2% -0.2

@pritidesai
Copy link
Member Author

When a pod exceeds node resources I think Kubernetes will try to re-schedule it again when resources may have freed up? So the TaskRun might eventually succeed. If I'm understanding right, I don't know if we want to mark that as an immediate failure?

I am not sure either, was going by the comment here. I have reverted the changes for ExceededNodeResources and it continue being marked as unknown.

// ReasonPending indicates that the pod is in corev1.Pending, and the reason is not
// ReasonExceededNodeResources or IsPodHitConfigError
ReasonPending = "Pending"

@vincent-pli
Copy link
Member

@pritidesai Pod does not fit any host will be preempt or re-queue after a short time sleep. so @sbwsg is correct.

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbwsg, vincent-pli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2020
@skaegi
Copy link
Contributor

skaegi commented Jan 29, 2021

@pritidesai would this handle the case when someone has a bad imagePullSecret -- we get that all the time and just see the Task Cancelled with no explanation. If so this PR would be a huge help for users trying to determine what went wrong.

@skaegi
Copy link
Contributor

skaegi commented Feb 1, 2021

ah... ok according to our team this is needed because currently if we hit a config error that results in an ImagePullErr Tekton will just hang and cycle to ImagePullBackOff before ultimately waiting for the Pipeline to timeout. We're managing this currently by detecting and cancelling the TaskRun but this unfortunately hides the ConfigError from the user.

@pritidesai
Copy link
Member Author

thanks @sbwsg and @vincent-pli for the review.
looking for one lgtm 🙏
Adding it to next milestone 0.21

@pritidesai pritidesai added this to the Pipelines 0.21 milestone Feb 1, 2021
@chmouel
Copy link
Member

chmouel commented Feb 2, 2021

awesome, thanks you so much for fixing this @pritidesai 🙇🏽

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 2, 2021
@ghost
Copy link

ghost commented Feb 2, 2021

/test check-pr-has-kind-label

@tekton-robot tekton-robot merged commit a7ad683 into tektoncd:master Feb 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Taskrun status incorrectly assigned
5 participants