Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix InternalEngineTests.testMergeThreadLogging #102640

Merged
merged 2 commits into from
Nov 27, 2023

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Nov 27, 2023

The test has to wait for all merge thread log messages (include Lucene ones) to be seen before reset the log level and stop the appender.

Previous attempt wasn't enough: the test failure in #90071 (comment) shows that Lucene's merge scheduler thread can log after the Elasticsearch one, and if the appender has been closed in the meanwhile it can fail the test.

1> [2023-07-17T11:33:12,559][TRACE][o.e.i.e.Engine           ] [testMergeThreadLogging] [index][0] rollback indexWriter done
  1> [2023-07-17T11:33:12,559][DEBUG][o.e.i.e.Engine           ] [testMergeThreadLogging] [index][0] engine closed [api]
  1> [2023-07-17T11:33:12,562][DEBUG][o.e.i.s.Store            ] [testMergeThreadLogging] [index][0] store reference count on close: 0
  1> [2023-07-17T11:33:12,562][TRACE][o.e.i.e.I.EngineMergeScheduler] [[index] [index][0] merge thread elasticsearch[[index][0]: Lucene Merge Thread #0] end
  1> [2023-07-17T11:33:12,562][TRACE][o.e.i.e.E.MS             ] [[index] [index][0] elasticsearch[[index][0]: Lucene Merge Thread #0] MS: merge thread elasticsearch[[index][0]: Lucene Merge Thread #0] end
  1> 2023-07-17 11:33:12,568 elasticsearch[[index][0]: Lucene Merge Thread #0] ERROR Attempted to append to non-started appender testMergeThreadLogging
  1> [2023-07-17T11:33:12,589][WARN ][o.e.i.e.Engine           ] [org.elasticsearch.index.engine.InternalEngineTests] [index][0] failed engine [merge failed]
  1> org.apache.lucene.index.MergePolicy$MergeException: org.apache.logging.log4j.core.appender.AppenderLoggingException: Attempted to append to non-started appender testMergeThreadLogging
  1> 	at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2780) ~[main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983) ~[main/:?]
  1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1> 	at java.lang.Thread.run(Thread.java:833) ~[?:?]
  1> Caused by: org.apache.logging.log4j.core.appender.AppenderLoggingException: Attempted to append to non-started appender testMergeThreadLogging

This pull request ensure that the appender saw the Merge Scheduler end message before resetting the log level and close the engine.

Closes #90071

The test has to wait for all merge thread log messages
(include Lucene ones) to be seen before reset the log
level and stop the appender.
@tlrx tlrx added >test Issues or PRs that are addressing/adding tests :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v8.12.0 v8.11.2 labels Nov 27, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Nov 27, 2023
@tlrx tlrx requested a review from kingherc November 27, 2023 13:44
Copy link
Contributor

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finding!

[2023-07-17T11:33:12,562][TRACE][o.e.i.e.I.EngineMergeScheduler] [[index] [index][0] merge thread elasticsearch[[index][0]: Lucene Merge Thread #0] end
[2023-07-17T11:33:12,562][TRACE][o.e.i.e.E.MS ] [[index] [index][0] elasticsearch[[index][0]: Lucene Merge Thread #0] MS: merge thread elasticsearch[[index][0]: Lucene Merge Thread #0] end

For my knowledge, what's MS? And why are there two same messages? I guess the second one failed (because the logger was closed in the meantime)?

assertThat(mockAppender.mergeCompleted(), is(true));
});

Loggers.setLevel(rootLogger, savedLevel);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit are these needed? Because I see the .setLevel also in the finally(), but I do not see the engine.close() there (but I assume it happens automatically somehow?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really needed, but wanted to reset the log level before closing the engine (which can also log more things that we are not interested in this test)

@tlrx
Copy link
Member Author

tlrx commented Nov 27, 2023

For my knowledge, what's MS?

This is Lucene's MergeScheduler class which logs messages with a MS: prefix

And why are there two same messages?

The purpose of this test is to verify that Lucene's merge scheduler message are correctly logged in Elasticsearch logs. By setting the TRACE level at the root appender level, we trace Elasticsearch + Lucene logs, and ElasticsearchConcurrentMergeScheduler prints both logs.

I guess the second one failed (because the logger was closed in the meantime)?

Yes, the second one failed and that's why the test now waits for it.

@tlrx tlrx merged commit 64f4d75 into elastic:main Nov 27, 2023
@tlrx tlrx deleted the 2023/11/27/fix-90071 branch November 27, 2023 14:19
@tlrx
Copy link
Member Author

tlrx commented Nov 27, 2023

Thanks Iraklis!

@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.11

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Nov 27, 2023
The test has to wait for all merge thread log messages (include Lucene ones) 
to be seen before reset the log level and stop the appender.

Previous attempt wasn't enough: the test failure in elastic#90071 (comment) shows 
that Lucene's merge scheduler thread can log after the Elasticsearch one, and 
if the appender has been closed in the meanwhile it can fail the test.

This change ensure that the appender saw the Merge Scheduler end message 
before resetting the log level and close the engine.

Closes elastic#90071
elasticsearchmachine pushed a commit that referenced this pull request Nov 27, 2023
The test has to wait for all merge thread log messages (include Lucene ones) 
to be seen before reset the log level and stop the appender.

Previous attempt wasn't enough: the test failure in #90071 (comment) shows 
that Lucene's merge scheduler thread can log after the Elasticsearch one, and 
if the appender has been closed in the meanwhile it can fail the test.

This change ensure that the appender saw the Merge Scheduler end message 
before resetting the log level and close the engine.

Closes #90071
timgrein pushed a commit to timgrein/elasticsearch that referenced this pull request Nov 30, 2023
The test has to wait for all merge thread log messages (include Lucene ones) 
to be seen before reset the log level and stop the appender.

Previous attempt wasn't enough: the test failure in elastic#90071 (comment) shows 
that Lucene's merge scheduler thread can log after the Elasticsearch one, and 
if the appender has been closed in the meanwhile it can fail the test.

This change ensure that the appender saw the Merge Scheduler end message 
before resetting the log level and close the engine.

Closes elastic#90071
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v8.11.2 v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] InternalEngineTests testMergeThreadLogging failing
3 participants