-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes #8901
Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes #8901
Conversation
Gradle Check (Jenkins) Run Completed with:
|
@andrross I think we should instead handle timeout of 0 to fail the task immediately instead of scheduling the asynchronous task to remove it from the queue on timeout. |
@sohami I don't love the inherent non-determinism here, but is a zero timeout useful in practice here? I'm assuming not (why schedule a task that is already timed out?), so I also don't love putting in additional logic for this case. The test would also just be testing this additional logic of the zero timeout case and not testing the normal timeout path, so I don't think it helps with coverage. Does that make sense or is there a different way to handle this case? |
@sohami I think it is probably better to deterministically test the timeout case in a separate test case. I'm thinking something like:
What do you think? |
I don't foresee any use case with 0 timeout but seems like we are allowing that to be set anyways. If we want to allow it than handling it separately would avoid adding extra timeout tasks for which most of the time behavior will be indeterministic. The other option is if there is no use case for timeout of 0 should we fail the task setting it to that value instead ? This will also help to catch issues probably where it is set to 0 incorrectly. |
Gradle Check (Jenkins) Run Completed with:
|
Sounds good to me. thanks! |
c6a8457
to
951ff07
Compare
The test configured a [timeout duration of zero][1] for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the [asynchronous timeout operation runs][2], which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case. [1]: https://github.com/opensearch-project/OpenSearch/blob/60985bc300d9eafd36c1ab25d46235e1c925c565/server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java#L941 [2]: https://github.com/opensearch-project/OpenSearch/blob/9fc3f4096958159ec9b53012fc7ced19fd793e1b/server/src/main/java/org/opensearch/common/util/concurrent/PrioritizedOpenSearchThreadPoolExecutor.java#L266 Signed-off-by: Andrew Ross <[email protected]>
Signed-off-by: Andrew Ross <[email protected]>
951ff07
to
8c5f7e0
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
@@ Coverage Diff @@
## main #8901 +/- ##
============================================
- Coverage 71.05% 70.99% -0.06%
Complexity 57198 57198
============================================
Files 4758 4758
Lines 269863 269863
Branches 39484 39484
============================================
- Hits 191755 191596 -159
- Misses 61982 62128 +146
- Partials 16126 16139 +13 |
#8901) * Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes The test configured a [timeout duration of zero][1] for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the [asynchronous timeout operation runs][2], which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case. [1]: https://github.com/opensearch-project/OpenSearch/blob/60985bc300d9eafd36c1ab25d46235e1c925c565/server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java#L941 [2]: https://github.com/opensearch-project/OpenSearch/blob/9fc3f4096958159ec9b53012fc7ced19fd793e1b/server/src/main/java/org/opensearch/common/util/concurrent/PrioritizedOpenSearchThreadPoolExecutor.java#L266 Signed-off-by: Andrew Ross <[email protected]> * Add a deterministic test case for timeout Signed-off-by: Andrew Ross <[email protected]> --------- Signed-off-by: Andrew Ross <[email protected]> (cherry picked from commit e2a664c) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-project#8901) * Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes The test configured a [timeout duration of zero][1] for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the [asynchronous timeout operation runs][2], which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case. [1]: https://github.com/opensearch-project/OpenSearch/blob/60985bc300d9eafd36c1ab25d46235e1c925c565/server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java#L941 [2]: https://github.com/opensearch-project/OpenSearch/blob/9fc3f4096958159ec9b53012fc7ced19fd793e1b/server/src/main/java/org/opensearch/common/util/concurrent/PrioritizedOpenSearchThreadPoolExecutor.java#L266 Signed-off-by: Andrew Ross <[email protected]> * Add a deterministic test case for timeout Signed-off-by: Andrew Ross <[email protected]> --------- Signed-off-by: Andrew Ross <[email protected]>
#8901) (#8934) * Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes The test configured a [timeout duration of zero][1] for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the [asynchronous timeout operation runs][2], which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case. [1]: https://github.com/opensearch-project/OpenSearch/blob/60985bc300d9eafd36c1ab25d46235e1c925c565/server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java#L941 [2]: https://github.com/opensearch-project/OpenSearch/blob/9fc3f4096958159ec9b53012fc7ced19fd793e1b/server/src/main/java/org/opensearch/common/util/concurrent/PrioritizedOpenSearchThreadPoolExecutor.java#L266 * Add a deterministic test case for timeout --------- (cherry picked from commit e2a664c) Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
After the fix in opensearch-project#8901, I ran the following in a loop all day (about 1400 times) and saw no failures: ``` ./gradlew ':server:test' --tests "org.opensearch.cluster.service.MasterServiceTests" ``` Signed-off-by: Andrew Ross <[email protected]>
opensearch-project#8901) * Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes The test configured a [timeout duration of zero][1] for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the [asynchronous timeout operation runs][2], which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case. [1]: https://github.com/opensearch-project/OpenSearch/blob/60985bc300d9eafd36c1ab25d46235e1c925c565/server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java#L941 [2]: https://github.com/opensearch-project/OpenSearch/blob/9fc3f4096958159ec9b53012fc7ced19fd793e1b/server/src/main/java/org/opensearch/common/util/concurrent/PrioritizedOpenSearchThreadPoolExecutor.java#L266 Signed-off-by: Andrew Ross <[email protected]> * Add a deterministic test case for timeout Signed-off-by: Andrew Ross <[email protected]> --------- Signed-off-by: Andrew Ross <[email protected]>
After the fix in #8901, I ran the following in a loop all day (about 1400 times) and saw no failures: ``` ./gradlew ':server:test' --tests "org.opensearch.cluster.service.MasterServiceTests" ``` Signed-off-by: Andrew Ross <[email protected]>
After the fix in #8901, I ran the following in a loop all day (about 1400 times) and saw no failures: ``` ./gradlew ':server:test' --tests "org.opensearch.cluster.service.MasterServiceTests" ``` Signed-off-by: Andrew Ross <[email protected]> (cherry picked from commit 8b751f8) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
After the fix in #8901, I ran the following in a loop all day (about 1400 times) and saw no failures: ``` ./gradlew ':server:test' --tests "org.opensearch.cluster.service.MasterServiceTests" ``` (cherry picked from commit 8b751f8) Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-project#8901) * Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes The test configured a [timeout duration of zero][1] for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the [asynchronous timeout operation runs][2], which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case. [1]: https://github.com/opensearch-project/OpenSearch/blob/60985bc300d9eafd36c1ab25d46235e1c925c565/server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java#L941 [2]: https://github.com/opensearch-project/OpenSearch/blob/9fc3f4096958159ec9b53012fc7ced19fd793e1b/server/src/main/java/org/opensearch/common/util/concurrent/PrioritizedOpenSearchThreadPoolExecutor.java#L266 Signed-off-by: Andrew Ross <[email protected]> * Add a deterministic test case for timeout Signed-off-by: Andrew Ross <[email protected]> --------- Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: Kaushal Kumar <[email protected]>
After the fix in opensearch-project#8901, I ran the following in a loop all day (about 1400 times) and saw no failures: ``` ./gradlew ':server:test' --tests "org.opensearch.cluster.service.MasterServiceTests" ``` Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: Kaushal Kumar <[email protected]>
opensearch-project#8901) * Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes The test configured a [timeout duration of zero][1] for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the [asynchronous timeout operation runs][2], which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case. [1]: https://github.com/opensearch-project/OpenSearch/blob/60985bc300d9eafd36c1ab25d46235e1c925c565/server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java#L941 [2]: https://github.com/opensearch-project/OpenSearch/blob/9fc3f4096958159ec9b53012fc7ced19fd793e1b/server/src/main/java/org/opensearch/common/util/concurrent/PrioritizedOpenSearchThreadPoolExecutor.java#L266 Signed-off-by: Andrew Ross <[email protected]> * Add a deterministic test case for timeout Signed-off-by: Andrew Ross <[email protected]> --------- Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: Ivan Brusic <[email protected]>
After the fix in opensearch-project#8901, I ran the following in a loop all day (about 1400 times) and saw no failures: ``` ./gradlew ':server:test' --tests "org.opensearch.cluster.service.MasterServiceTests" ``` Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: Ivan Brusic <[email protected]>
opensearch-project#8901) * Fix flakiness in MasterServiceTests.testThrottlingForMultipleTaskTypes The test configured a [timeout duration of zero][1] for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the [asynchronous timeout operation runs][2], which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case. [1]: https://github.com/opensearch-project/OpenSearch/blob/60985bc300d9eafd36c1ab25d46235e1c925c565/server/src/test/java/org/opensearch/cluster/service/MasterServiceTests.java#L941 [2]: https://github.com/opensearch-project/OpenSearch/blob/9fc3f4096958159ec9b53012fc7ced19fd793e1b/server/src/main/java/org/opensearch/common/util/concurrent/PrioritizedOpenSearchThreadPoolExecutor.java#L266 Signed-off-by: Andrew Ross <[email protected]> * Add a deterministic test case for timeout Signed-off-by: Andrew Ross <[email protected]> --------- Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
After the fix in opensearch-project#8901, I ran the following in a loop all day (about 1400 times) and saw no failures: ``` ./gradlew ':server:test' --tests "org.opensearch.cluster.service.MasterServiceTests" ``` Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
The test configured a timeout duration of zero for certain tasks and asserted that all tasks were throttled or timed out. This is not a valid assertion because it is possible for a task to complete before the asynchronous timeout operation runs, which means the task would complete successfully. The fix is to adjust the assertion to allow for successful tasks in this case.
Related Issues
Resolves #5958
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.