Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] InternalEngineTests testMergeThreadLogging failing #90071

Closed
DaveCTurner opened this issue Sep 14, 2022 · 2 comments · Fixed by #90709 or #102640
Closed

[CI] InternalEngineTests testMergeThreadLogging failing #90071

DaveCTurner opened this issue Sep 14, 2022 · 2 comments · Fixed by #90709 or #102640
Assignees
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. low-risk An open issue or test failure that is a low risk to future releases Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI

Comments

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Sep 14, 2022

Also seen in a local run of ./gradlew :server:test but doesn't easily reproduce, even when running many iterations in a loop.

Build scan:
https://gradle-enterprise.elastic.co/s/kbbpmnbim32ck/tests/:server:test/org.elasticsearch.index.engine.InternalEngineTests/testMergeThreadLogging

Reproduction line:
./gradlew ':server:test' --tests "org.elasticsearch.index.engine.InternalEngineTests.testMergeThreadLogging" -Dtests.seed=A7B12EB0B151C9B5 -Dtests.locale=es-PA -Dtests.timezone=Europe/Tiraspol -Druntime.java=17

Applicable branches:
main

Reproduces locally?:
No

Edit to add: eventually this failed for me after a couple of thousand iterations on a machine that was concurrently running stress-ng --timeout 0 --cpu 16

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.index.engine.InternalEngineTests&tests.test=testMergeThreadLogging

Failure excerpt:

java.lang.AssertionError: 
Expected: (an empty collection or iterable containing [a string starting with "JNDI lookup class is not available because this JRE does not support JNDI. JNDI string lookups will not be available, continuing configuration.", a string starting with "JMX runtime input lookup class is not available because this JRE does not support JMX. JMX lookups will not be available, continuing configuration. "] or iterable containing [a string starting with "JMX runtime input lookup class is not available because this JRE does not support JMX. JMX lookups will not be available, continuing configuration. "])
     but: was <[Attempted to append to non-started appender testMergeThreadLogging]>

  at __randomizedtesting.SeedInfo.seed([A7B12EB0B151C9B5:62EC5C98C993D8A4]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.junit.Assert.assertThat(Assert.java:956)
  at org.junit.Assert.assertThat(Assert.java:923)
  at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:658)
  at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:464)
  at jdk.internal.reflect.GeneratedMethodAccessor7.invoke(null:-1)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:568)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

@DaveCTurner DaveCTurner added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI labels Sep 14, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 14, 2022
@tlrx tlrx self-assigned this Sep 20, 2022
tlrx added a commit to tlrx/elasticsearch that referenced this issue Oct 6, 2022
tlrx added a commit to tlrx/elasticsearch that referenced this issue Oct 10, 2022
@davidkyle
Copy link
Member

Reopening with a new failure on the main branch (8.10)

https://gradle-enterprise.elastic.co/s/zc4eqtipakrfk/tests/:server:test/org.elasticsearch.index.engine.InternalEngineTests/testMergeThreadLogging?expanded-stacktrace=WyIwIl0&top-execution=1

java.lang.AssertionError:     
Expected: (an empty collection or iterable containing [a string starting with "JNDI lookup class is not available because this JRE does not support JNDI. JNDI string lookups will not be available, continuing configuration.", a string starting with "JMX runtime input lookup class is not available because this JRE does not support JMX. JMX lookups will not be available, continuing configuration. "] or iterable containing [a string starting with "JMX runtime input lookup class is not available because this JRE does not support JMX. JMX lookups will not be available, continuing configuration. "]) 
     but: was <[Attempted to append to non-started appender testMergeThreadLogging]> 
at __randomizedtesting.SeedInfo.seed([D7244769E56C300:C82F365EE694D211]:0) 
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)  
at org.junit.Assert.assertThat(Assert.java:956)   
at org.junit.Assert.assertThat(Assert.java:923)   
at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:674)

@davidkyle davidkyle reopened this Jul 17, 2023
@volodk85 volodk85 added the low-risk An open issue or test failure that is a low risk to future releases label Oct 23, 2023
tlrx added a commit to tlrx/elasticsearch that referenced this issue Nov 27, 2023
The test has to wait for all merge thread log messages (include Lucene ones) 
to be seen before reset the log level and stop the appender.

Previous attempt wasn't enough: the test failure in elastic#90071 (comment) shows 
that Lucene's merge scheduler thread can log after the Elasticsearch one, and 
if the appender has been closed in the meanwhile it can fail the test.

This change ensure that the appender saw the Merge Scheduler end message 
before resetting the log level and close the engine.

Closes elastic#90071
leemthompo pushed a commit to leemthompo/elasticsearch that referenced this issue Nov 27, 2023
The test has to wait for all merge thread log messages (include Lucene ones) 
to be seen before reset the log level and stop the appender.

Previous attempt wasn't enough: the test failure in elastic#90071 (comment) shows 
that Lucene's merge scheduler thread can log after the Elasticsearch one, and 
if the appender has been closed in the meanwhile it can fail the test.

This change ensure that the appender saw the Merge Scheduler end message 
before resetting the log level and close the engine.

Closes elastic#90071
elasticsearchmachine pushed a commit that referenced this issue Nov 27, 2023
The test has to wait for all merge thread log messages (include Lucene ones) 
to be seen before reset the log level and stop the appender.

Previous attempt wasn't enough: the test failure in #90071 (comment) shows 
that Lucene's merge scheduler thread can log after the Elasticsearch one, and 
if the appender has been closed in the meanwhile it can fail the test.

This change ensure that the appender saw the Merge Scheduler end message 
before resetting the log level and close the engine.

Closes #90071
timgrein pushed a commit to timgrein/elasticsearch that referenced this issue Nov 30, 2023
The test has to wait for all merge thread log messages (include Lucene ones) 
to be seen before reset the log level and stop the appender.

Previous attempt wasn't enough: the test failure in elastic#90071 (comment) shows 
that Lucene's merge scheduler thread can log after the Elasticsearch one, and 
if the appender has been closed in the meanwhile it can fail the test.

This change ensure that the appender saw the Merge Scheduler end message 
before resetting the log level and close the engine.

Closes elastic#90071
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. low-risk An open issue or test failure that is a low risk to future releases Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test-failure Triaged test failures from CI
Projects
None yet
5 participants