Exec/Java: Default OOMScoreAdjust causes memory limit to be ignored #10663
Comments
Hi @MikeN123! Thanks for raising it. This is indeed bad behavior and should be fixed. We'll aim to ship a fix soon.
Thanks! I also noted that the PR for memory oversubscription said something on
Explicitly set the `oom_score_adj` value for `exec` and `java` tasks. We recommend that the Nomad service have a low `oom_score_adj` value (e.g. -1000) to avoid having the Nomad agent OOM-killed if the node is oversubscribed. However, Nomad's workloads should not inherit the Nomad process's `oom_score_adj`, which is the default behavior. Fixes #10663
Nomad version
Nomad v1.1.0 (2678c3604bc9530014208bc167415e167fd440fc)
Operating system and Environment details
Debian Linux 10
Issue
The default setting of `OOMScoreAdjust=-1000` in the systemd unit file makes sure Nomad will never be killed by the OOM killer. However, this setting is inherited by child executors and processes, so the cgroup memory limit is not enforced. Instead, the OOM killer floods syslog with error messages, filling up the disk.
Reproduction steps
Expected Result
OOM killer should kick in and kill the process.
I believe the drivers should reset the `OOMScoreAdjust` for child processes to some sensible value. The only other workaround (at least for the java driver) is to not set an `OOMScoreAdjust` on the nomad process, but that has the disadvantage that nomad may be killed if the host experiences memory pressure.
Actual Result
The task is not killed, as the OOM killer is unable to kill any process due to the `OOMScoreAdjust`.
This was also noted in the issue where the default was suggested, but apparently never picked up: #6672 (comment)
The message logged to syslog looks like this: