Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-51454] Pipeline retry operation doesn't retry when there is a timeout inside of it #144

Merged
merged 18 commits into from
Aug 16, 2021

Conversation

basil
Copy link
Member

@basil basil commented May 29, 2021

See JENKINS-51454. Downstream of jenkinsci/workflow-step-api-plugin#61.

Problem

There are two desired behaviors depending on the placement of timeout and retry relative to each other.

First case (JENKINS-51454):

retry(3) {
  timeout(time: 5, unit: 'MINUTES') {
    // Something that can fail
  }
}

In this case, if the timeout triggers, the desired behavior is for retry to run its body again. This is not the current behavior.

Second case (JENKINS-44379):

timeout(time: 5, unit: 'MINUTES') {
  retry(3) {
    // Something that can fail
  }
}

In this case, if the timeout triggers, the desired behavior is for retry to be aborted without retrying anything. This is the current behavior, which is working as-designed after JENKINS-44379.

Solution

JENKINS-60354 overlaps a bit with this issue. @dwnusbaum suggested that the approach from #81 by @rudolfwalter could be tweaked a bit to use jenkinsci/workflow-step-api-plugin#51 (if a setter was added) to change the value of FlowInterruptedException#actualInterruption from true to false when the exception propagates past the timeout step that threw the exception, in which case the fix for JENKINS-60354 in #102 would fix JENKINS-51454 as well. That is precisely what we are doing in jenkinsci/workflow-step-api-plugin#61 and this PR.

@Override
public void onSuccess(StepContext context, Object result) {
if (killer != null) {
killer.cancel(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume is unlikely to throw exceptions? See special behaviors lost when moving from TailCall. Maybe better to enhance TailCall to make onFailure nonfinal so you can call super but still do some special processing with the exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I wasn't sure whether to put the call to super.onFailure() before or after the new logic, but I put it before the new logic and the tests still passed.

ExceededTimeout exceededTimeout = new ExceededTimeout();
exceededTimeout.setNodeId(nodeId);
final Throwable death =
new FlowInterruptedException(Result.ABORTED, exceededTimeout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not understanding why all the complexity. Does a simple

Suggested change
new FlowInterruptedException(Result.ABORTED, exceededTimeout);
new FlowInterruptedException(Result.ABORTED, false, exceededTimeout);

not solve the issue here, without the upstream PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do that, then I think both case 1 and case 2 from the PR description would have the same behavior. (Maybe fine, but IMO that just trades the current frustrations with case 1 to new frustrations with case 2.)

@basil basil closed this Aug 2, 2021
@basil basil reopened this Aug 2, 2021
@timja
Copy link
Member

timja commented Aug 9, 2021

@dwnusbaum @car-roll

@basil
Copy link
Member Author

basil commented Aug 9, 2021

The test that just failed is flaky. I just re-ran it locally and it passed.

@basil basil closed this Aug 9, 2021
@basil basil reopened this Aug 9, 2021
@basil
Copy link
Member Author

basil commented Aug 9, 2021

12:14:18  Cannot contact aci-maven-11-2thrx: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@68f899af:JNLP4-connect connection from 40.88.240.16/40.88.240.16:6273": Remote call on JNLP4-connect connection from 40.88.240.16/40.88.240.16:6273 failed. The channel is closing down or has closed down

I am stepping away from this for now. I believe the change is ready for merge and release, but the CI build is failing due to problems that I believe are out of my control and not caused by me.

@timja
Copy link
Member

timja commented Aug 10, 2021

Build is green now, cc @dwnusbaum @car-roll @bitwiseman

@car-roll car-roll requested a review from dwnusbaum August 13, 2021 14:51
@car-roll
Copy link
Contributor

@dwnusbaum would you mind taking a look at this since @jglick is not available atm?

Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, but I added some comments. The main thing I would change would be to call FlowInterruptedException.handle instead of directly printing the causes.

@basil
Copy link
Member Author

basil commented Aug 14, 2021

All feedback has been addressed, and the build is green. I think it's time to put this bug to rest.

@car-roll car-roll merged commit 65d323d into jenkinsci:master Aug 16, 2021
@basil basil deleted the JENKINS-51454 branch August 16, 2021 15:45
@rajarajeshwarie-kg-18298

does the timeout be enabled in the retry of the pipeline? I tried the code..when the timeout exceeded, the retry option was working but without timeout.

stage (‘Build’) {

    options {
    retry(2)
    timeout(time: 10, unit: 'MINUTES')
    
    }
    steps {
       //code
    }
    post {  
        success { 
         //code
        }
    }
}

@jglick
Copy link
Member

jglick commented Sep 7, 2023

Check in Scripted syntax. If that works, and you only see a problem in Declarative syntax, file a bug report in pipeline-model-definition-plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants