
Startup errors disappear in docker compose up, make it difficult to debug #3121

Closed
pszabop opened this issue Jan 15, 2023 · 10 comments

@pszabop

pszabop commented Jan 15, 2023

Describe the bug
If a startup error such as insufficient memory occurs, the error is never shown, which makes it difficult to debug.

For example, the following error appears when using docker run but doesn't show up with docker-compose up:

[Native memory allocation (mmap) failed to map 1073741824 bytes for committing reserved memory](https://forum.opensearch.org/t/native-memory-allocation-mmap-failed-to-map-1073741824-bytes-for-committing-reserved-memory/4258)

To Reproduce
Steps to reproduce the behavior:

  1. Use the standard docker-compose.yml from the documentation
  2. Use a machine with only 1 GB of memory, such as an AWS t2.micro
  3. Run docker-compose up opensearch-node1 to start only one of the nodes
  4. The process exits with no useful error message
  5. Compare with the following command, which does show a useful error:

docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:latest

Expected behavior
docker-compose up should surface the same errors as docker run, so that issues with docker-compose.yml can be debugged.

Plugins
none

Screenshots
see error messages noted above

Host/Environment:
AWS t2.micro using recent wizard
Linux 5.10.157-139.675.amzn2.x86_64 #1 SMP Thu Dec 8 01:29:11 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Docker version 20.10.17, build 100c701
Docker Compose version v2.15.1

Additional context
docker-compose can't be installed with yum on Amazon Linux, so I used the instructions from here.

@pszabop pszabop added the bug (Something isn't working) and untriaged (Issues that have not yet been triaged) labels Jan 15, 2023
@dblock
Member

dblock commented Jan 17, 2023

I've run into this many times. I think showing the error in RED is probably helpful, but maybe there are better mechanisms to abort a docker run with an error? Ideas? I also wonder whether this issue belongs in opensearch-devops?

@saratvemulapalli
Member

@dblock Yup, it looks like the errors the author is interested in are being wiped by Docker, but we usually do have all the logs in opensearch.log. Let's start with opensearch-build and, if needed, we can come back to opensearch.
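Even after a container has exited, its files can still be copied out with docker cp, which works on stopped containers. A minimal sketch, assuming the image writes its logs under /usr/share/opensearch/logs (the JVM message later in this thread points to a relative logs/hs_err_pid286.log) and that the container is named opensearch-node1 as in the standard docker-compose.yml. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for pulling logs out of an exited OpenSearch container.

# Assumed log directory inside the official image (not confirmed in this thread).
OPENSEARCH_LOG_DIR="/usr/share/opensearch/logs"

# copy_container_logs CONTAINER DEST
# docker cp also works on stopped containers, so this can run after a failed start.
# The trailing "/." copies the directory's contents into DEST.
copy_container_logs() {
  local container="$1" dest="$2"
  mkdir -p "$dest"
  docker cp "${container}:${OPENSEARCH_LOG_DIR}/." "$dest"
}

# newest_hs_err DIR
# Prints the most recently modified JVM crash report (hs_err_pid*.log) in DIR, if any.
newest_hs_err() {
  local dir="$1"
  ls -1t "$dir"/hs_err_pid*.log 2>/dev/null | head -n 1
}
```

For example: copy_container_logs opensearch-node1 ./node1-logs, then less "$(newest_hs_err ./node1-logs)".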

@saratvemulapalli saratvemulapalli transferred this issue from opensearch-project/OpenSearch Jan 20, 2023
@zelinh zelinh removed the untriaged (Issues that have not yet been triaged) label Feb 3, 2023
@peterzhuamazon
Member

At first glance, this might be related to the docker-compose log level settings.

@jordarlu
Contributor

jordarlu commented Apr 4, 2023

When I tried this on a t2.micro (Amazon Linux 2023 AMI), no error was shown, but the container exited right away.

# docker -v
Docker version 20.10.17, build 100c701

# uname -a
Linux ip-172-31-93-37.ec2.internal 6.1.19-30.43.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 15 14:44:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

# sudo service docker start
Redirecting to /bin/systemctl start docker.service

# ps -ef | grep -i docker
root       26530       1  0 18:37 ?        00:00:00 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --default-ulimit nofile=32768:65536
root       26747    2655  0 18:38 pts/0    00:00:00 grep --color=auto -i docker
[root@ip-172-31-93-37 ~]# docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:latest
Unable to find image 'opensearchproject/opensearch:latest' locally
latest: Pulling from opensearchproject/opensearch
07e4d356f367: Pull complete 
22651def3ff4: Pull complete 
c4a1df115c60: Pull complete 
272f76639b2e: Pull complete 
4f4fb700ef54: Pull complete 
fbc0efae03e1: Pull complete 
4d4f1680fcb1: Pull complete 
Digest: sha256:40f5bb40a543f7ea6458bc0ecffc7679c9df6f8836a9e3781a78829d587ef552
Status: Downloaded newer image for opensearchproject/opensearch:latest
707460e7c24406ee27ec7f256161161726674ec7e55b340f52a0f6d900197bd8
[root@ip-172-31-93-37 ~]# 
[root@ip-172-31-93-37 ~]# 
[root@ip-172-31-93-37 ~]# docker ps -a
CONTAINER ID   IMAGE                                 COMMAND                  CREATED          STATUS                      PORTS     NAMES
707460e7c244   opensearchproject/opensearch:latest   "./opensearch-docker…"   33 seconds ago   Exited (1) 26 seconds ago             beautiful_euler
[root@ip-172-31-93-37 ~]# 

However, I can see the memory error with the docker logs command. Maybe we can use this command to check?

# docker logs 707460e7c24406ee27ec7f256161161726674ec7e55b340f52a0f6d900197bd8
Enabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** https://github.com/opensearch-project/security/issues/1755           **
**************************************************************************
OpenSearch Security Demo Installer
 ** Warning: Do not use on production or public reachable systems **
Basedir: /usr/share/opensearch
OpenSearch install type: rpm/deb on NAME="Amazon Linux"
OpenSearch config dir: /usr/share/opensearch/config
OpenSearch config file: /usr/share/opensearch/config/opensearch.yml
OpenSearch bin dir: /usr/share/opensearch/bin
OpenSearch plugins dir: /usr/share/opensearch/plugins
OpenSearch lib dir: /usr/share/opensearch/lib
Detected OpenSearch Version: x-content-2.6.0
Detected OpenSearch Security Version: 2.6.0.0

### Success
### Execute this script now on all your nodes and then start all nodes
### OpenSearch Security will be automatically initialized.
### If you like to change the runtime configuration 
### change the files in ../../../config/opensearch-security and execute: 
"/usr/share/opensearch/plugins/opensearch-security/tools/securityadmin.sh" -cd "/usr/share/opensearch/config/opensearch-security" -icl -key "/usr/share/opensearch/config/kirk-key.pem" -cert "/usr/share/opensearch/config/kirk.pem" -cacert "/usr/share/opensearch/config/root-ca.pem" -nhnv
### or run ./securityadmin_demo.sh
### To use the Security Plugin ConfigurationGUI
### To access your secured cluster open https://<hostname>:<HTTP port> and log in with admin/admin.
### (Ignore the SSL certificate warning because we installed self-signed demo certificates)
Enabling OpenSearch Security Plugin
Enabling execution of OPENSEARCH_HOME/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
Exception in thread "main" java.lang.RuntimeException: starting java failed with [1]
output:
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1073741824 bytes for committing reserved memory.
# An error report file with more information is saved as:
# logs/hs_err_pid286.log
error:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c0000000, 1073741824, 0) failed; error='Not enough space' (errno=12)
        at org.opensearch.tools.launchers.JvmErgonomics.flagsFinal(JvmErgonomics.java:125)
        at org.opensearch.tools.launchers.JvmErgonomics.finalJvmOptions(JvmErgonomics.java:87)
        at org.opensearch.tools.launchers.JvmErgonomics.choose(JvmErgonomics.java:70)
        at org.opensearch.tools.launchers.JvmOptionsParser.jvmOptions(JvmOptionsParser.java:150)
        at org.opensearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:108)
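The docker logs output above contains the actual cause. As a sketch of what surfacing it automatically could look like, the hypothetical functions below read a container log, report which known failure signature it matches, and print the tail of the log for a container that is no longer running. The patterns are copied from the messages in this thread, so the list is illustrative rather than complete.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for surfacing why an OpenSearch container stopped.

# classify_startup_log: reads a container log on stdin and prints a coarse category.
# The patterns come from the log excerpts in this issue and are not exhaustive.
classify_startup_log() {
  local log
  log="$(cat)"
  if grep -qF 'There is insufficient memory for the Java Runtime Environment' <<<"$log" \
     || grep -qF '(errno=12)' <<<"$log"; then
    echo "jvm-native-memory"
  elif grep -qE 'line [0-9]+: +[0-9]+ Killed' <<<"$log"; then
    # bash's status message when the entrypoint's child process was killed by a signal
    echo "killed"
  else
    echo "unknown"
  fi
}

# show_failed_container_logs CONTAINER
# If CONTAINER is not running, print its status, exit code, category, and last log lines.
show_failed_container_logs() {
  local container="$1" status code
  status="$(docker inspect --format '{{.State.Status}}' "$container")" || return
  [ "$status" = "running" ] && return 0
  code="$(docker inspect --format '{{.State.ExitCode}}' "$container")"
  echo "container ${container} is ${status} (exit code ${code})"
  # docker logs replays the container's stderr on stderr, so merge both streams.
  echo "category: $(docker logs "$container" 2>&1 | classify_startup_log)"
  docker logs --tail 40 "$container" 2>&1
}
```

For example, after docker-compose up opensearch-node1 exits, show_failed_container_logs opensearch-node1 would print the JVM memory error, assuming the container_name from the standard compose file.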

@peterzhuamazon
Member

peterzhuamazon commented Apr 4, 2023

@jordarlu The Docker distribution requires at least 4 GB of memory for OSD on the server, as well as setting vm.max_map_count=262144.
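Both prerequisites can be checked on a Linux host before running docker compose up. A minimal sketch: the 4 GB and 262144 thresholds come from the comment above, the memory slack value is an assumption, and the file paths are parameters only so the function can be tested (defaults are the usual /proc locations). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the OpenSearch Docker distribution (Linux hosts).
# "4 GB" is approximated as 3800000 KiB because MemTotal in /proc/meminfo is usually a
# little below the nominal RAM size; this slack value is an assumption, adjust as needed.
REQUIRED_MEM_KIB="${REQUIRED_MEM_KIB:-3800000}"
REQUIRED_MAX_MAP_COUNT=262144

# check_opensearch_host [MEMINFO_FILE] [MAX_MAP_COUNT_FILE]
# Prints one line per failed check and returns 1 if any check failed, else 0.
check_opensearch_host() {
  local meminfo="${1:-/proc/meminfo}" mapcount_file="${2:-/proc/sys/vm/max_map_count}"
  local mem_kib map_count rc=0

  # The MemTotal line has the form "MemTotal:   <number> kB".
  mem_kib="$(awk '/^MemTotal:/ {print $2}' "$meminfo")"
  if [ "${mem_kib:-0}" -lt "$REQUIRED_MEM_KIB" ]; then
    echo "insufficient memory: ${mem_kib:-unknown} KiB < ${REQUIRED_MEM_KIB} KiB"
    rc=1
  fi

  map_count="$(cat "$mapcount_file")"
  if [ "${map_count:-0}" -lt "$REQUIRED_MAX_MAP_COUNT" ]; then
    echo "vm.max_map_count too low: ${map_count:-unknown} < ${REQUIRED_MAX_MAP_COUNT}"
    rc=1
  fi
  return "$rc"
}
```

If the second check fails, the setting can be raised on the host with sudo sysctl -w vm.max_map_count=262144 (add it to /etc/sysctl.conf to persist it across reboots). A t2.micro would fail the memory check regardless.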

@jordarlu
Contributor

jordarlu commented Apr 4, 2023

Thanks, @peterzhuamazon. This issue was forwarded to our team back in January, so I tried to reproduce it with insufficient memory, as described in the issue, and also wanted to get ideas from the folks here on how to proceed next. :)
As I understand it, the intention is to provide helpful information when there is a memory issue using 'docker-compose' (in my case, even when using plain 'docker'). Maybe we can just output 'docker logs'?

@jordarlu
Contributor

Hi @pszabop, docker-compose provides an option, --abort-on-container-exit, which gives the user more useful information when starting up the containers. For example (I deliberately used an EC2 instance with insufficient memory so that the OpenSearch containers fail to start, as in your case description):

ubuntu@ip-172-31-20-140:~$ sudo docker-compose -f docker-compose.yml up --abort-on-container-exit
Starting opensearch-node2      ... done
Starting opensearch-dashboards ... done
Starting opensearch-node1      ... done
Attaching to opensearch-node2, opensearch-dashboards, opensearch-node1
opensearch-node1         | Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
opensearch-node1         | Disabling OpenSearch Security Plugin
opensearch-node2         | Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
opensearch-node2         | Disabling OpenSearch Security Plugin
opensearch-node1         | Enabling execution of OPENSEARCH_HOME/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
opensearch-node2         | Enabling execution of OPENSEARCH_HOME/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
opensearch-dashboards    | {"type":"log","@timestamp":"2023-05-11T00:39:28Z","tags":["info","plugins-service"],"pid":453,"message":"Plugin \"dataSourceManagement\" has been disabled since the following direct or transitive dependencies are missing or disabled: [dataSource]"}
opensearch-dashboards    | {"type":"log","@timestamp":"2023-05-11T00:39:28Z","tags":["info","plugins-service"],"pid":453,"message":"Plugin \"dataSource\" is disabled."}
opensearch-dashboards    | {"type":"log","@timestamp":"2023-05-11T00:39:28Z","tags":["info","plugins-service"],"pid":453,"message":"Plugin \"visTypeXy\" is disabled."}
opensearch-dashboards    | {"type":"log","@timestamp":"2023-05-11T00:39:28Z","tags":["info","plugins-service"],"pid":453,"message":"Plugin \"mlCommonsDashboards\" is disabled."}
opensearch-dashboards    | {"type":"log","@timestamp":"2023-05-11T00:39:28Z","tags":["warning","config","deprecation"],"pid":453,"message":"\"cpu.cgroup.path.override\" is deprecated and has been replaced by \"ops.cGroupOverrides.cpuPath\""}
opensearch-dashboards    | {"type":"log","@timestamp":"2023-05-11T00:39:28Z","tags":["warning","config","deprecation"],"pid":453,"message":"\"cpuacct.cgroup.path.override\" is deprecated and has been replaced by \"ops.cGroupOverrides.cpuAcctPath\""}
opensearch-dashboards    | {"type":"log","@timestamp":"2023-05-11T00:39:28Z","tags":["warning","config","deprecation"],"pid":453,"message":"\"opensearch.requestHeadersWhitelist\" is deprecated and has been replaced by \"opensearch.requestHeadersAllowlist\""}
opensearch-dashboards    | {"type":"log","@timestamp":"2023-05-11T00:39:29Z","tags":["info","plugins-system"],"pid":453,"message":"Setting up [48] plugins: [alertingDashboards,usageCollection,opensearchDashboardsUsageCollection,opensearchDashboardsLegacy,mapsLegacy,share,expressions,data,securityAnalyticsDashboards,home,console,apmOss,management,indexPatternManagement,advancedSettings,savedObjects,searchRelevanceDashboards,anomalyDetectionDashboards,queryWorkbenchDashboards,notificationsDashboards,indexManagementDashboards,opensearchUiShared,reportsDashboards,embeddable,dashboard,visualizations,visTypeVega,visTypeTimeline,timeline,visTypeTable,visTypeMarkdown,visBuilder,tileMap,regionMap,customImportMapDashboards,inputControlVis,ganttChartDashboards,visualize,legacyExport,bfetch,charts,visTypeTagcloud,visTypeVislib,visTypeTimeseries,visTypeMetric,observabilityDashboards,discover,savedObjectsManagement]"}
opensearch-node2         | ./opensearch-docker-entrypoint.sh: line 70:    11 Killed                  "$@" "${opensearch_opts[@]}"
opensearch-node1         | ./opensearch-docker-entrypoint.sh: line 70:    10 Killed                  "$@" "${opensearch_opts[@]}"
opensearch-node2 exited with code 137
Aborting on container exit...
Stopping opensearch-dashboards ... done
ubuntu@ip-172-31-20-140:~$ 

Although it did not specifically mention insufficient memory or space, it did report exit code 137, which we can look up and link to a possible OOM issue (https://stackoverflow.com/questions/59296801/docker-compose-exit-code-is-137-when-there-is-no-oom-exception).
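By shell convention, an exit code above 128 means the process was terminated by signal (code - 128), so 137 is 128 + 9, i.e. SIGKILL, which is the signal the kernel OOM killer sends. The sketch below decodes such codes and also prints Docker's State.OOMKilled flag; that flag may not be set in every out-of-memory situation, so treat it as a hint rather than proof. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a container exit code into a human-readable hint.
describe_exit_code() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exit 0: success"
  elif [ "$code" -gt 128 ] && [ "$code" -lt 160 ]; then
    # 128 + N means "terminated by signal N"; kill -l maps N to its name (9 -> KILL).
    local sig=$((code - 128))
    echo "exit ${code}: terminated by signal ${sig} (SIG$(kill -l "$sig"))"
  else
    echo "exit ${code}: process exited with an error"
  fi
}

# report_container_exit CONTAINER
# Prints the decoded exit code plus Docker's OOMKilled flag (a hint, not proof).
report_container_exit() {
  local container="$1" code oom
  code="$(docker inspect --format '{{.State.ExitCode}}' "$container")" || return
  oom="$(docker inspect --format '{{.State.OOMKilled}}' "$container")"
  describe_exit_code "$code"
  echo "OOMKilled flag: ${oom}"
}
```

For the run above, report_container_exit opensearch-node2 would start with "exit 137: terminated by signal 9 (SIGKILL)".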

Could you give it a try? It may provide an easier way to troubleshoot than digging into 'docker logs'.
Thanks,

@dblock @peterzhuamazon

@peterzhuamazon
Member

Is this error the root cause?

failed; error='Not enough space' (errno=12)

@jordarlu
Contributor

I believe so, @peterzhuamazon. Looking at the entire line, "OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c0000000, 1073741824, 0) failed; error='Not enough space' (errno=12)", I think it means there is not enough memory: the JVM failed to commit 1073741824 bytes (1 GiB).
It is also consistent with @pszabop's setup of running the OpenSearch cluster on a t2.micro, which I reproduced on purpose to trigger the memory error.
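The errno value supports this reading: whatever the accompanying text says, on Linux errno 12 is ENOMEM (out of memory), not ENOSPC (28, no space left on device), so "Not enough space" here refers to memory rather than disk. A small sketch that extracts the errno from such a warning line and names it; the function is hypothetical and its lookup table is a deliberately small subset of Linux errno values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the errno from a JVM warning line and name it.
# Only a few Linux errno values are mapped; this is not a complete table.
errno_from_jvm_warning() {
  local line="$1" num name
  # Capture the digits inside "(errno=NN)" (BRE: "\(" groups, "(" is literal).
  num="$(sed -n 's/.*(errno=\([0-9][0-9]*\)).*/\1/p' <<<"$line")"
  [ -n "$num" ] || { echo "no errno found"; return 1; }
  case "$num" in
    11) name="EAGAIN (resource temporarily unavailable)" ;;
    12) name="ENOMEM (out of memory)" ;;
    28) name="ENOSPC (no space left on device)" ;;
    *)  name="unmapped errno" ;;
  esac
  echo "errno ${num}: ${name}"
}
```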

@rishabh6788
Collaborator

@pszabop Hopefully you were able to fix the issue; if not, please check out the solutions and methods mentioned in the thread above.
Closing for now. Please reopen if you are still facing this issue.
