Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decommission retry/pr #66

Closed
wants to merge 4 commits into from
Closed

Conversation

imRishN
Copy link
Owner

@imRishN imRishN commented Sep 26, 2022

Description

[Describe what this change achieves]

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL:
  • CommitID: 87a0f32

Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL:
  • CommitID: ecbe9a6

}

public DecommissionRequest(DecommissionAttribute decommissionAttribute, TimeValue retryTimeout) {
this(decommissionAttribute, false, retryTimeout);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we making sure that user doesn't specify this parameter ? did we evaluate putting this in request context as it is an internal detail of a request and not a user facing one ?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about the retry flag or the timeout?
For flag, The REST action doesn't support accepting retry flag with the user and hence any request created at REST layer will always have it set to false. Today, it is only set to true when we trigger retry action from service
For timeout, the default is set and user can set one according to his case

Let me know if this answers your question

Comment on lines +83 to +98
final long remainingTimeoutMS = decommissionRequest.getRetryTimeout().millis() - (threadPool.relativeTimeInMillis() - startTime);
if (remainingTimeoutMS <= 0) {
logger.debug(
"timed out before retrying [{}] for attribute [{}] after cluster manager change",
DecommissionAction.NAME,
decommissionRequest.getDecommissionAttribute()
);
listener.onFailure(
new OpenSearchTimeoutException(
"timed out before retrying [{}] for attribute [{}] after cluster manager change",
DecommissionAction.NAME,
decommissionRequest.getDecommissionAttribute()
)
);
return;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need remainingTimeoutMS check here only ? Why do we need it ?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout is a check for retry eligibility only. Other actions as part of decommission has their own timeouts. This is the place where we are actually triggering the retry and hence the check. If timed out, request will not be eligible for a retry

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do without this ? In worst case, the retried action will timeout and we will throw a timeout exception . This part of the code is looking out of place from readability POV.

Comment on lines -213 to 236
// explicitly calling listener.onFailure with NotClusterManagerException as the local node is not the cluster manager
// this will ensures that request is retried until cluster manager times out
logger.info(
"local node is not eligible to process the request, "
+ "throwing NotClusterManagerException to attempt a retry on an eligible node"
);
listener.onFailure(
new NotClusterManagerException(
"node ["
+ transportService.getLocalNode().toString()
+ "] not eligible to execute decommission request. Will retry until timeout."
)
// since the local node is no longer cluster manager which could've happened due to leader abdication,
// hence retrying the decommission action until it times out
logger.info("local node is not eligible to process the request, " + "retrying the transport action until it times out");
decommissionController.retryDecommissionAction(
decommissionRequest,
startTime,
ActionListener.delegateResponse(listener, (delegatedListener, t) -> {
logger.debug(
() -> new ParameterizedMessage(
"failed to retry decommission action for attribute [{}]",
decommissionRequest.getDecommissionAttribute()
),
t
);
clearVotingConfigExclusionAndUpdateStatus(false, false); // TODO - need to test this
delegatedListener.onFailure(t);
})
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are responding to user API call now ? Earlier , the retried request on new elected cluster manager would respond it . Would the same hold true now as well ?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, response to the retried request is attached to the listener of original request. The chain will continue if multiple retries gets executed.

@imRishN
Copy link
Owner Author

imRishN commented Oct 5, 2022

Closing this and opened a new PR in upstream repo - opensearch-project#4684

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants