Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout for ccr recovery action #37840

Merged
merged 14 commits into from
Jan 29, 2019

Conversation

Tim-Brooks
Copy link
Contributor

This is related to #35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.

This is related to elastic#35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.
@Tim-Brooks Tim-Brooks added >non-issue v7.0.0 :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features v6.7.0 labels Jan 25, 2019
@Tim-Brooks Tim-Brooks requested a review from ywelsch January 25, 2019 00:24
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Copy link
Contributor Author

@ywelsch

  1. What are you thoughts on the default value for this?
  2. I'm not sure how we want to pursue testing? It looks like we do not test the individual action timeouts for normal recovery. I could insert a wait or something using the CcrRestoreSourceService callbacks. But I had planned on removing those eventually as it seems weird to have callbacks dedicated for testing purposes.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code and default value looks good (we can always increase it later if we see the need for it). For testing, you could set a low timeout, and then inject a failure at an arbitrary point (i.e. randomly on one of the CCR recovery actions) using MockTransportService, see RecoveryActionBlocker as an example in IndexRecoveryIT, and then check that the restore fails in a timely fashion.

@Tim-Brooks Tim-Brooks requested a review from ywelsch January 26, 2019 06:42
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

settingsRequest = new ClusterUpdateSettingsRequest();
TimeValue defaultValue = CcrSettings.INDICES_RECOVERY_ACTION_TIMEOUT_SETTING.getDefault(Settings.EMPTY);
settingsRequest.persistentSettings(Settings.builder().put(CcrSettings.INDICES_RECOVERY_ACTION_TIMEOUT_SETTING.getKey(),
defaultValue));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting it to null also resets to the default

@@ -308,6 +313,78 @@ public void testRateLimitingIsEmployed() throws Exception {
}
}

public void testIndividualActionsTimeout() throws Exception {
ClusterUpdateSettingsRequest settingsRequest = new ClusterUpdateSettingsRequest();
TimeValue timeValue = new TimeValue(100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer TimeValue.timeValueMillis(100). It's clearer what the 100 are.

@Tim-Brooks
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2

@Tim-Brooks Tim-Brooks merged commit f3f9cab into elastic:master Jan 29, 2019
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Jan 30, 2019
This is related to elastic#35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.
Tim-Brooks added a commit that referenced this pull request Jan 31, 2019
This is related to #35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.
@Tim-Brooks Tim-Brooks deleted the individual_action_timeouts branch December 18, 2019 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >non-issue v6.7.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants