[BUG] org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterCount {p0={"search.concurrent_segment_search.enabled":"true"}} #9946

Closed
neetikasinghal opened this issue Sep 8, 2023 · 7 comments · Fixed by #10200 or #10436
Assignees
Labels
flaky-test Random test failure that succeeds on second run Search Search query, autocomplete ...etc

Comments

@neetikasinghal
Contributor

References
https://build.ci.opensearch.org/job/gradle-check/25191/testReport/junit/org.opensearch.search.simple/SimpleSearchIT/testSimpleTerminateAfterCount__p0___search_concurrent_segment_search_enabled___true___/

To Reproduce

./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT" -Dtests.method="testSimpleTerminateAfterCount {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=BF79A24A152C3CB1 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=th -Dtests.timezone=America/Danmarkshavn -Druntime.java=20

Stacktrace

java.lang.AssertionError: Count is 1+ hits but 1 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:
	at __randomizedtesting.SeedInfo.seed([BF79A24A152C3CB1:45F350BC58E7D135]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.opensearch.test.hamcrest.OpenSearchAssertions.assertHitCount(OpenSearchAssertions.java:303)
	at org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterCount(SimpleSearchIT.java:305)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1623)

@neetikasinghal neetikasinghal added bug Something isn't working untriaged labels Sep 8, 2023
@jed326 jed326 added flaky-test Random test failure that succeeds on second run and removed bug Something isn't working untriaged labels Sep 8, 2023
@jed326
Collaborator

jed326 commented Sep 8, 2023

This should be related to the soft termination behavior of terminate_after in concurrent search. See:

Will follow up on this issue after we complete that one.

@andrross
Member

andrross commented Oct 5, 2023

Another failure here: #10375 (comment)

I'm going to reopen. @sohami or @jed326 can you take a look?

@andrross andrross reopened this Oct 5, 2023
@github-project-automation github-project-automation bot moved this from Done to In Progress in Concurrent Search Oct 5, 2023
@jed326
Collaborator

jed326 commented Oct 5, 2023

🙁 That's interesting... we disabled concurrent search for the terminate_after workflow (#10200) (#10329), so this shouldn't be related to that feature. Will take a look.

@jed326
Collaborator

jed326 commented Oct 5, 2023

I am able to reproduce this using the test seed:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT" -Dtests.method="testSimpleTerminateAfterCount {p0={"search.concurrent_segment_search.enabled":"false"}}" -Dtests.seed=C37FD3C5A66D02BC -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=he-IL -Dtests.timezone=Asia/Qostanay -Druntime.java=20

org.opensearch.search.simple.SimpleSearchIT > testSimpleTerminateAfterCount {p0={"search.concurrent_segment_search.enabled":"false"}} FAILED
    java.lang.AssertionError: Count is 7 hits but 3 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:
        at __randomizedtesting.SeedInfo.seed([C37FD3C5A66D02BC:39F52133EBA6EF38]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.test.hamcrest.OpenSearchAssertions.assertHitCount(OpenSearchAssertions.java:303)
        at org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterCount(SimpleSearchIT.java:311)

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT" -Dtests.method="testSimpleTerminateAfterCount {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=C37FD3C5A66D02BC -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=he-IL -Dtests.timezone=Asia/Qostanay -Druntime.java=20

org.opensearch.search.simple.SimpleSearchIT > testSimpleTerminateAfterCount {p0={"search.concurrent_segment_search.enabled":"true"}} FAILED
    java.lang.AssertionError: Count is 4 hits but 3 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:
        at __randomizedtesting.SeedInfo.seed([C37FD3C5A66D02BC:39F52133EBA6EF38]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.test.hamcrest.OpenSearchAssertions.assertHitCount(OpenSearchAssertions.java:303)
        at org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterCount(SimpleSearchIT.java:311)


Suite: Test class org.opensearch.search.simple.SimpleSearchIT

However, it looks like there is flakiness for both concurrent and non-concurrent cases (which makes sense since concurrent search is disabled for this path).

Whenever I remove the size parameter from the search request, the flakiness goes away, but that points to a bug in the service because TotalHits should not depend on size. Specifically, the issue seems to arise only when size=0. The puzzling part is that it happens for only some of the size=0 queries, not all of them.

In short:
The following search request will sometimes, but not always, fail the hit count assertion when run in a loop with size=0. This is not related to concurrent segment search: that is disabled for terminate_after workflows, and the test is flaky with concurrent search both enabled and disabled.

for (int i = 1; i < max; i++) {
    size = randomIntBetween(0, max);
    searchResponse = client().prepareSearch("test")
        .setQuery(QueryBuilders.rangeQuery("field").gte(1).lte(max))
        .setTerminateAfter(i)
        .setSize(size)
        .setTrackTotalHits(true)
        .get();
    assertHitCount(searchResponse, i);
    assertTrue(searchResponse.isTerminatedEarly());
    assertEquals(Math.min(i, size), searchResponse.getHits().getHits().length);
}
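
For readers following along, here is a rough plain-Java model of the contract those three assertions encode. It is not OpenSearch code: the class `TerminateAfterModel`, its `search` method, and the `Result` type are hypothetical names made up for illustration, and it assumes a single shard where the range query matches exactly `max` documents and terminate_after is at least 1. The point is only that, in this model, the hit count is decided by terminate_after and never by size, so size=0 should behave like any other size.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT OpenSearch code, just the behaviour the test expects.
final class TerminateAfterModel {

    // Hypothetical stand-in for the parts of a search response the test checks.
    static final class Result {
        final long totalHits;
        final boolean terminatedEarly;
        final List<Integer> hits;

        Result(long totalHits, boolean terminatedEarly, List<Integer> hits) {
            this.totalHits = totalHits;
            this.terminatedEarly = terminatedEarly;
            this.hits = hits;
        }
    }

    // Assumes one shard and terminateAfter >= 1 (the test loop starts at 1).
    static Result search(int matchingDocs, int terminateAfter, int size) {
        List<Integer> hits = new ArrayList<>();
        long collected = 0;
        boolean terminatedEarly = false;
        for (int doc = 1; doc <= matchingDocs; doc++) {
            if (collected == terminateAfter) {
                // More matching docs remain, so collection stops early.
                terminatedEarly = true;
                break;
            }
            collected++;               // counted regardless of size
            if (hits.size() < size) {
                hits.add(doc);         // size only limits what is returned
            }
        }
        return new Result(collected, terminatedEarly, hits);
    }

    public static void main(String[] args) {
        int max = 10;
        for (int i = 1; i < max; i++) {
            for (int size = 0; size <= max; size++) {
                Result r = search(max, i, size);
                if (r.totalHits != i || !r.terminatedEarly || r.hits.size() != Math.min(i, size)) {
                    throw new AssertionError("i=" + i + " size=" + size);
                }
            }
        }
        System.out.println("model satisfies the test's assertions for every size, including 0");
    }
}
```

The failures reported here are cases where the real service returns a larger total than this model would for the same terminate_after value.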

@jed326 jed326 added Search Search query, autocomplete ...etc and removed untriaged labels Oct 5, 2023
@jed326
Collaborator

jed326 commented Oct 5, 2023

To confirm that this problem predates the concurrent search changes, I checked out the 2.7 branch, applied setSize(0), and saw the same failure:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterCount" -Dtests.seed=C37FD3C5A66D02BC -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=he-IL -Dtests.timezone=Asia/Qostanay -Druntime.java=17

org.opensearch.search.simple.SimpleSearchIT > testSimpleTerminateAfterCount FAILED
    java.lang.AssertionError: Count is 8 hits but 3 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:
        at __randomizedtesting.SeedInfo.seed([C37FD3C5A66D02BC:39F52133EBA6EF38]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.test.hamcrest.OpenSearchAssertions.assertHitCount(OpenSearchAssertions.java:306)
        at org.opensearch.search.simple.SimpleSearchIT.testSimpleTerminateAfterCount(SimpleSearchIT.java:284)

@jed326
Collaborator

jed326 commented Oct 6, 2023

Created a new issue to track this fix since this looks like it needs a deep dive into the search request path code: #10435

In the meantime I'll open a PR to avoid the size=0 case here so we don't continue to see build failures.
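
As a rough idea of what avoiding size=0 could look like, assuming the workaround is simply to raise the lower bound of the random size from 0 to 1 (the actual PR may do something different). The class `NonZeroSize` and the helper `pickSize` are hypothetical, and `randomIntBetween` here is a plain `java.util.Random` stand-in for the test framework method of the same name used in the loop.

```java
import java.util.Random;

// Sketch under stated assumptions; not the code from the actual PR.
final class NonZeroSize {

    // Stand-in for the test framework's randomIntBetween; both bounds inclusive.
    static int randomIntBetween(Random random, int min, int max) {
        return min + random.nextInt(max - min + 1);
    }

    // Hypothetical helper: choose a size in [1, max] so size=0 is never requested.
    static int pickSize(Random random, int max) {
        return randomIntBetween(random, 1, max);
    }

    public static void main(String[] args) {
        Random random = new Random();
        int max = 10;
        for (int n = 0; n < 5; n++) {
            System.out.println("size=" + pickSize(random, max));
        }
    }
}
```

This only keeps the test out of the failing case; it does not explain why size=0 changes the reported total, which is what #10435 tracks.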

@dblock
Member

dblock commented Oct 6, 2023

#10388 (comment)

Projects
Status: Done
Archived in project
4 participants