jre crash when org.apache.lucene.codecs.DocValuesConsumer$SortedNumericDocValuesSub.nextDoc() #67882

billhong-just · 2021-01-25T08:55:49Z

Elasticsearch version (bin/elasticsearch --version):

7.10.2

Plugins installed:

I installed Elasticsearch following this doc.
No other plugins are installed.

JVM version (java -version):

JRE version: OpenJDK Runtime Environment AdoptOpenJDK (15.0.1+9) (build 15.0.1+9)
Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (15.0.1+9, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)

OS version (uname -a if on a Unix-like system):

Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-42-generic x86_64)
Linux sw-vwordpress01 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

I am running a single-node Elasticsearch cluster for Elastic Observability.
After running for hours or days, the cluster crashes.

Steps to reproduce:

I use apm-agent-dotnet v1.6.1 to send APM transactions and metrics to APM Server.
APM Server runs on the same machine that hosts the single-node Elasticsearch cluster.
After running for hours or days, the cluster crashes.
It then produces an hs_err_pidXXXXX.log file in the /var/log/elasticsearch directory.

# Problematic frame:
# J 14564 c2 org.apache.lucene.codecs.DocValuesConsumer$SortedNumericDocValuesSub.nextDoc()I (8 bytes) @ 0x00007f650c700669 [0x00007f650c700620+0x0000000000000049]
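
For triage, the problematic frame can be pulled out of an hs_err file programmatically. The following is only a minimal sketch, assuming the usual HotSpot layout in which the frame sits on the line right after "# Problematic frame:". The function names problematic_frame and is_compiled_java_frame are hypothetical helpers, not part of any Elasticsearch or JDK tooling.

```python
from typing import Optional


def problematic_frame(hs_err_text: str) -> Optional[str]:
    """Return the line following '# Problematic frame:', or None if absent."""
    lines = hs_err_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "# Problematic frame:" and i + 1 < len(lines):
            # Drop the leading '#' and spaces that prefix header lines.
            return lines[i + 1].lstrip("# ").strip()
    return None


def is_compiled_java_frame(frame: str) -> bool:
    """In hs_err logs a leading 'J' marks JIT-compiled Java code."""
    return frame.startswith("J ")
```

Applied to the report above, the frame starts with "J 14564 c2", meaning the fault was raised while running C2-compiled Java code (here a Lucene doc-values merge method), not inside a separate native library.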

Provide logs (if relevant):

hs_err_pid13402.log
hs_err_pid1143.log

Running sudo systemctl status elasticsearch shows the following messages:

root@s-docker01:/var/log/elasticsearch# sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
     Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
     Active: failed (Result: signal) since Mon 2021-01-25 23:15:03 CST; 55min ago
       Docs: https://www.elastic.co
    Process: 1143 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=ABRT)
   Main PID: 1143 (code=killed, signal=ABRT)
      Tasks: 0 (limit: 38033)
     Memory: 398.3M
     CGroup: /system.slice/elasticsearch.service

Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # An error report file with more information is saved as:
Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # /var/log/elasticsearch/hs_err_pid1143.log
Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: #
Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # If you would like to submit a bug report, please visit:
Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: #   https://github.com/AdoptOpenJDK/openjdk-support/issues
Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # The crash happened outside the Java Virtual Machine in native code.
Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # See problematic frame for where to report the bug.
Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: #
Jan 25 23:15:03 s-docker01 systemd[1]: elasticsearch.service: Main process exited, code=killed, status=6/ABRT
Jan 25 23:15:03 s-docker01 systemd[1]: elasticsearch.service: Failed with result 'signal'.


elasticmachine · 2021-01-25T09:41:45Z

Pinging @elastic/es-core-features (Team:Core/Features)

nik9000 · 2021-01-25T18:26:46Z

Generally this kind of thing is caused by bad storage. The JVM will abort with SIGABRT when reading a memory-mapped file fails in a nasty way. I'd check dmesg and the like for hardware failure messages.
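
One way to go through a long dmesg dump is to filter it for lines that commonly accompany storage or memory faults. This is a rough sketch only: the pattern list is illustrative and far from exhaustive, and suspicious_lines is a hypothetical helper name, so an empty result does not prove the hardware is healthy.

```python
import re
from typing import Iterable, List

# Illustrative patterns only (an assumption, not a complete list);
# real kernels emit many other failure messages.
SUSPECT_PATTERNS = [
    r"I/O error",
    r"Hardware Error",
    r"EXT4-fs error",
    r"Machine check",
]
_SUSPECT = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the kernel log lines that match any suspect pattern."""
    return [line.rstrip("\n") for line in lines if _SUSPECT.search(line)]
```

It can be fed the lines of a saved dump, for example the output of dmesg redirected to a file, and the matches reviewed by hand.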

billhong-just · 2021-01-27T01:51:09Z

@nik9000

Hi, Nik.
Today my single-node Elasticsearch cluster crashed again, so I dumped the relevant logs.
However, I cannot tell from the dmesg output which hardware caused it.
Could you please take a look and help me pinpoint the problem?

Provide logs (if relevant):

hs_err_pid1144.log
dmesg_2021-01-27.log

Running sudo systemctl status elasticsearch shows the following messages:

● elasticsearch.service - Elasticsearch
     Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
     Active: failed (Result: signal) since Wed 2021-01-27 09:14:35 CST; 24min ago
       Docs: https://www.elastic.co
    Process: 1144 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=ABRT)
   Main PID: 1144 (code=killed, signal=ABRT)
      Tasks: 0 (limit: 38033)
     Memory: 4.0G
     CGroup: /system.slice/elasticsearch.service

Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]:  scopes data    [0x00007ffa1cbd3e88,0x00007ffa1cbd3e98] = 16
Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]:  scopes pcs     [0x00007ffa1cbd3e98,0x00007ffa1cbd3ec8] = 48
Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]:  dependencies   [0x00007ffa1cbd3ec8,0x00007ffa1cbd3ed0] = 8
Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]:  handler table  [0x00007ffa1cbd3ed0,0x00007ffa1cbd3ee8] = 24
Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]: #
Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]: # If you would like to submit a bug report, please visit:
Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]: #   https://github.com/AdoptOpenJDK/openjdk-support/issues
Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]: #
Jan 27 09:14:35 s-docker01 systemd[1]: elasticsearch.service: Main process exited, code=killed, status=6/ABRT
Jan 27 09:14:35 s-docker01 systemd[1]: elasticsearch.service: Failed with result 'signal'.

DaveCTurner · 2021-01-28T08:18:15Z

This is a cross-post from the forums and, as Nik says, this is almost always flaky hardware. I don't think there's any specific action to take on the Elasticsearch side (yet), so it would be better to keep the discussion on the forums. We prefer to restrict GitHub to just verified bug reports, feature requests, and pull requests. Therefore I am closing this.

sebelga added the needs:triage and Team:Data Management labels Jan 25, 2021

DaveCTurner closed this as completed Jan 28, 2021
