TaskRun retries not working #1944
/kind bug

0.9.2 behaviour, for comparison: the watch shows pods being spun up for each retry, and the resulting taskRun status.
@vdemeester FYI. Looks like it's the same issue I described in Slack: the assumption of a 1:1 mapping between a taskRun and a pod. The retry pod will never be created. I'm not sure how to fix this, given we have the same issue around not being able to uniquely identify a pod which failed, is retrying, or has retried. We could count the pods, but then once completed we won't know which status to attach to the taskRun.
Something is weird though; I thought the retries in a pipeline would generate new pods. Reusing the same pod doesn't work well as of now: with the pod already being there, it fails instantly (where it would re-create it before, for some odd reason; well, not so odd).

This puts this fix in a weird state (and it makes it a huge fix to do, as far as I can see).
Clearing the status and deleting the pod does not sound like something we want to do, at least not until we have a way of storing status and logs somewhere else. If I understood correctly, the change that broke things is that the reconciler no longer relies on the podName from the taskRun. My suggestion would be to revert that change for now to fix this. We can then take time in 0.11 to design a new way for retries to work together with #1709.
Expected Behavior
Pipelines with tasks that have retries set should rerun a failed taskRun, up to the configured number of retries.
Actual Behavior
The failed taskRun is not rerun.
Steps to Reproduce the Problem
Additional Info
Below are the sample resources, a watch of the taskRun, and the taskRun after it's completed.
Note that the tasks complete too quickly to have actually executed, and the retried status entries are all duplicates. On previous versions I'm seeing this work as expected, with a different pod per task retry. I'm not seeing any issue or PR that explicitly changes this pod-per-retry behaviour, which implies it might be accidental.
Sample resources to reproduce
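The reporter's manifests were attached as collapsed snippets and are not recoverable here; a minimal manifest of the kind that reproduces this (resource names are hypothetical, using the v1alpha1 API current at the time) would look like:

```yaml
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: flaky-task            # hypothetical name
spec:
  steps:
    - name: fail
      image: busybox
      command: ["sh", "-c", "exit 1"]   # always fails, to exercise retries
---
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: retry-pipeline        # hypothetical name
spec:
  tasks:
    - name: run-flaky
      taskRef:
        name: flaky-task
      retries: 2              # expected: one new pod per retry attempt
```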
Watching
Retried taskRun