Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integ Tests for Awareness Attribute Decommissioning #4715

Merged
merged 24 commits into from
Nov 3, 2022

Conversation

imRishN
Copy link
Member

@imRishN imRishN commented Oct 8, 2022

Description

This PR adds integration tests for awareness attribute decommission

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@imRishN imRishN marked this pull request as ready for review October 18, 2022 18:39
@imRishN imRishN requested review from a team and reta as code owners October 18, 2022 18:39
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@@ -56,6 +72,425 @@ public void cleanup() throws Exception {
assertNoTimeout(client().admin().cluster().prepareHealth().get());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we verify that nodes are drained before initiating the decommissioning and once prepared for decommission no new request is being sent to these nodes?

Copy link
Member Author

@imRishN imRishN Oct 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can never certainly say if the node was drained completely or not. Although before initiating the decommission, we ensure weights for decommissioned zone is 0 but the metrics where we track HTTP requests doesn't get to zero. By the time we come up with an effective proposal for draining, we have implemented a 2 min wait after setting weights to 0 and executing decommission. Also, wait time can be set by the user as per need
Implemented as part of this PR - #4586

assertTrue(ex.getMessage().contains("invalid awareness attribute requested for decommissioning"));
});
}

Copy link
Contributor

@Gaganjuneja Gaganjuneja Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add a test for a case of single zone domain to test how decommissioning fails there and may be quorum loss scenario?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for single zone decommission. Quorum loss checks is internally handled during reconfiguration of Voting Configuration

assertNodesRemovedAfterZoneDecommission(true);
}

public void testInvariantsAndLogsOnDecommissionedNodes() throws Exception {
Copy link
Contributor

@Gaganjuneja Gaganjuneja Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methods are getting very big, around 150 lines in some cases. It's hard to read in a single go. Can we do something about it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to abstract out but use cases for each of these are unique and hence kept it like this

@imRishN
Copy link
Member Author

imRishN commented Oct 30, 2022

@Bukhtawar Can you please also review this PR?
@Gaganjuneja Can you please review it again?

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

codecov-commenter commented Oct 30, 2022

Codecov Report

Merging #4715 (74f746b) into main (d2c2ade) will decrease coverage by 0.08%.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##               main    #4715      +/-   ##
============================================
- Coverage     70.95%   70.87%   -0.09%     
+ Complexity    57971    57926      -45     
============================================
  Files          4688     4688              
  Lines        276901   276904       +3     
  Branches      40299    40300       +1     
============================================
- Hits         196472   196246     -226     
- Misses        64181    64406     +225     
- Partials      16248    16252       +4     
Impacted Files Coverage Δ
...ecommission/awareness/put/DecommissionRequest.java 85.71% <0.00%> (-6.60%) ⬇️
...arch/cluster/decommission/DecommissionService.java 35.65% <0.00%> (ø)
...adonly/AddIndexBlockClusterStateUpdateRequest.java 0.00% <0.00%> (-75.00%) ⬇️
...readonly/TransportVerifyShardIndexBlockAction.java 9.75% <0.00%> (-58.54%) ⬇️
...n/admin/indices/readonly/AddIndexBlockRequest.java 17.85% <0.00%> (-53.58%) ⬇️
...indices/readonly/TransportAddIndexBlockAction.java 20.68% <0.00%> (-41.38%) ⬇️
.../admin/indices/readonly/AddIndexBlockResponse.java 0.00% <0.00%> (-37.50%) ⬇️
...ain/java/org/opensearch/search/sort/MinAndMax.java 63.15% <0.00%> (-36.85%) ⬇️
.../indices/readonly/AddIndexBlockRequestBuilder.java 0.00% <0.00%> (-33.34%) ⬇️
...search/aggregations/pipeline/HoltWintersModel.java 21.47% <0.00%> (-30.88%) ⬇️
... and 442 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Rishab Nahata <[email protected]>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@Bukhtawar Bukhtawar merged commit 850bb2e into opensearch-project:main Nov 3, 2022
imRishN added a commit to imRishN/OpenSearch that referenced this pull request Nov 3, 2022
…ct#4715)

* Add integ test for awareness attribute decommissioning

Signed-off-by: Rishab Nahata <[email protected]>
Bukhtawar pushed a commit that referenced this pull request Nov 3, 2022
* Fail weight update when decommission ongoing and fail decommission when attribute not weighed away (#4839)

* Add changes for graceful node decommission (#4586)

* Add delay timeout for decommission request (#4931)

* Integ Tests for Awareness Attribute Decommissioning (#4715)

Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: pranikum <[email protected]>
imRishN added a commit to imRishN/OpenSearch that referenced this pull request Nov 3, 2022
* Fail weight update when decommission ongoing and fail decommission when attribute not weighed away (opensearch-project#4839)

* Add changes for graceful node decommission (opensearch-project#4586)

* Add delay timeout for decommission request (opensearch-project#4931)

* Integ Tests for Awareness Attribute Decommissioning (opensearch-project#4715)

Signed-off-by: Rishab Nahata <[email protected]>
Signed-off-by: pranikum <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants