jre crash when org.apache.lucene.codecs.DocValuesConsumer$SortedNumericDocValuesSub.nextDoc() #67882

Closed
billhong-just opened this issue Jan 25, 2021 · 4 comments
Labels
needs:triage Requires assignment of a team area label Team:Data Management Meta label for data/management team

Comments

@billhong-just

Elasticsearch version (bin/elasticsearch --version):

7.10.2

Plugins installed:

I installed Elasticsearch following this doc.
No other plugin installed.

JVM version (java -version):

  • JRE version: OpenJDK Runtime Environment AdoptOpenJDK (15.0.1+9) (build 15.0.1+9)
  • Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (15.0.1+9, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)

OS version (uname -a if on a Unix-like system):

Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-42-generic x86_64)
Linux sw-vwordpress01 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

I am running a single-node Elasticsearch cluster for Elastic Observability.
After running for hours or days, the cluster crashes.

Steps to reproduce:

I use apm-agent-dotnet v1.6.1 to send APM transactions and metrics to APM Server.
APM Server runs on the same machine that hosts the single-node Elasticsearch cluster.
After running for hours or days, the cluster crashes.
The JVM then writes an hs_err_pidXXXXX.log file to the /var/log/elasticsearch directory.

# Problematic frame:
# J 14564 c2 org.apache.lucene.codecs.DocValuesConsumer$SortedNumericDocValuesSub.nextDoc()I (8 bytes) @ 0x00007f650c700669 [0x00007f650c700620+0x0000000000000049]
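
For readers triaging a similar report, the frame line above can be pulled out of an hs_err file and classified by its leading letter. The following is a minimal sketch, not part of the original report: the function name `problematic_frame` is hypothetical, and the letter meanings follow the legend that HotSpot prints in its crash reports (J = compiled Java code, j = interpreted, V/v = VM code, C = native code).

```python
# Hypothetical helper: locate the "Problematic frame" entry in an hs_err report.
FRAME_TYPES = {
    "J": "compiled Java code",
    "j": "interpreted Java code",
    "V": "VM code",
    "v": "VM code (generated stub)",
    "C": "native code",
}


def problematic_frame(report_text):
    """Return (frame_kind, frame_line), or None if no frame header is found."""
    lines = report_text.splitlines()
    for i, line in enumerate(lines):
        # The frame itself is on the line right after the header.
        if line.strip() == "# Problematic frame:" and i + 1 < len(lines):
            frame = lines[i + 1].lstrip("#").strip()
            kind = FRAME_TYPES.get(frame[:1], "unknown")
            return kind, frame
    return None
```

Applied to the two lines above, this reports the frame as compiled Java code, that is, a method compiled by the JIT (here by C2) rather than a third-party native library.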

Provide logs (if relevant):

  • hs_err_pid13402.log

  • hs_err_pid1143.log

  • Running sudo systemctl status elasticsearch shows the following messages:

    root@s-docker01:/var/log/elasticsearch# sudo systemctl status elasticsearch
    ● elasticsearch.service - Elasticsearch
         Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
         Active: failed (Result: signal) since Mon 2021-01-25 23:15:03 CST; 55min ago
           Docs: https://www.elastic.co
        Process: 1143 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=ABRT)
       Main PID: 1143 (code=killed, signal=ABRT)
          Tasks: 0 (limit: 38033)
         Memory: 398.3M
         CGroup: /system.slice/elasticsearch.service
    
    Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # An error report file with more information is saved as:
    Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # /var/log/elasticsearch/hs_err_pid1143.log
    Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: #
    Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # If you would like to submit a bug report, please visit:
    Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: #   https://github.com/AdoptOpenJDK/openjdk-support/issues
    Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # The crash happened outside the Java Virtual Machine in native code.
    Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: # See problematic frame for where to report the bug.
    Jan 25 23:15:02 s-docker01 systemd-entrypoint[1143]: #
    Jan 25 23:15:03 s-docker01 systemd[1]: elasticsearch.service: Main process exited, code=killed, status=6/ABRT
    Jan 25 23:15:03 s-docker01 systemd[1]: elasticsearch.service: Failed with result 'signal'.
@sebelga sebelga added needs:triage Requires assignment of a team area label Team:Data Management Meta label for data/management team labels Jan 25, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@nik9000
Member

nik9000 commented Jan 25, 2021

Generally this kind of thing is bad storage. The JVM will abort with SIGABRT when reading a memory-mapped file fails in a nasty way. I'd check dmesg and the like for hardware failure messages.
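
As a rough illustration of the dmesg check suggested here, the sketch below filters saved kernel log text for lines that often accompany storage or memory faults. The pattern list and the name `hardware_hints` are assumptions made for this example; the list is not exhaustive, and an empty result does not rule out a hardware problem.

```python
import re

# Assumed, illustrative patterns only; real kernel messages vary by driver and device.
HINT_PATTERNS = [
    r"I/O error",
    r"Hardware Error",
    r"Machine check",
    r"\bEDAC\b",
    r"Medium Error",
    r"EXT4-fs error",
]
HINT_RE = re.compile("|".join(HINT_PATTERNS), re.IGNORECASE)


def hardware_hints(dmesg_text):
    """Return the lines of dmesg output that match any of the hint patterns."""
    return [line for line in dmesg_text.splitlines() if HINT_RE.search(line)]
```

It can be run against a saved dump, for example `hardware_hints(open("dmesg.log").read())`, where `dmesg.log` is a placeholder file name.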

@billhong-just
Author

billhong-just commented Jan 27, 2021

@nik9000

Hi, Nik.
Today my single-node Elasticsearch cluster crashed again, so I dumped the relevant logs.
However, I cannot tell from the dmesg output which hardware caused it.
Could you please have a look and help me pinpoint the problem?

Provide logs (if relevant):

  • hs_err_pid1144.log

  • dmesg_2021-01-27.log

  • Running sudo systemctl status elasticsearch shows the following messages:

    ● elasticsearch.service - Elasticsearch
         Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
         Active: failed (Result: signal) since Wed 2021-01-27 09:14:35 CST; 24min ago
           Docs: https://www.elastic.co
        Process: 1144 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=ABRT)
       Main PID: 1144 (code=killed, signal=ABRT)
          Tasks: 0 (limit: 38033)
         Memory: 4.0G
         CGroup: /system.slice/elasticsearch.service
    
    Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]:  scopes data    [0x00007ffa1cbd3e88,0x00007ffa1cbd3e98] = 16
    Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]:  scopes pcs     [0x00007ffa1cbd3e98,0x00007ffa1cbd3ec8] = 48
    Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]:  dependencies   [0x00007ffa1cbd3ec8,0x00007ffa1cbd3ed0] = 8
    Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]:  handler table  [0x00007ffa1cbd3ed0,0x00007ffa1cbd3ee8] = 24
    Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]: #
    Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]: # If you would like to submit a bug report, please visit:
    Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]: #   https://github.com/AdoptOpenJDK/openjdk-support/issues
    Jan 27 09:14:34 s-docker01 systemd-entrypoint[1144]: #
    Jan 27 09:14:35 s-docker01 systemd[1]: elasticsearch.service: Main process exited, code=killed, status=6/ABRT
    Jan 27 09:14:35 s-docker01 systemd[1]: elasticsearch.service: Failed with result 'signal'.

@DaveCTurner
Contributor

This is a cross-post from the forums and, as Nik says, this is almost always flaky hardware. I don't think there's any specific action to take on the Elasticsearch side (yet), so it would be better to keep the discussion on the forums. We prefer to restrict GitHub to verified bug reports, feature requests, and pull requests. Therefore I am closing this.
