Handling memory contention and OOM Killer #10414

Open
notnoop opened this issue Apr 20, 2021 · 0 comments
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/restart/reschedule theme/scheduling type/enhancement

Comments


notnoop commented Apr 20, 2021

When a host is running with contended memory, Nomad needs to take extra care to avoid exacerbating the situation. If a workload is OOM-killed due to contended memory, it should be rescheduled aggressively rather than restarted. Restarting OOM-killed tasks may cause further memory contention and further OOM activity.
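As a rough illustration of the restart-versus-reschedule choice described above, here is a minimal sketch. It is not Nomad's implementation: `HostMemory`, `is_contended`, `action_after_oom`, and the 10% free-memory threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RESTART = "restart"        # retry the task on the same host
    RESCHEDULE = "reschedule"  # place the workload on a different host


@dataclass
class HostMemory:
    """Hypothetical snapshot of host memory, in megabytes."""
    total_mb: int
    available_mb: int


def is_contended(host: HostMemory, min_free_fraction: float = 0.10) -> bool:
    # Assumed heuristic: the host counts as contended when less than
    # min_free_fraction of its total memory is still available.
    return host.available_mb < host.total_mb * min_free_fraction


def action_after_oom(host: HostMemory) -> Action:
    # Restarting on a contended host would add to the memory pressure,
    # so prefer moving the workload elsewhere in that case.
    if is_contended(host):
        return Action.RESCHEDULE
    return Action.RESTART
```

For example, `action_after_oom(HostMemory(total_mb=16000, available_mb=800))` selects `Action.RESCHEDULE`, while the same host with 8000 MB available selects `Action.RESTART`. A real policy would likely need a better contention signal than a single free-memory threshold.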

Memory contention can arise due to the memory oversubscription feature introduced in #10247. It's also possible that host system services that aren't managed by Nomad may spike their memory usage beyond the configured reserved memory flag.
Memory contention may occur thorough

Nomad must distinguish tasks that exceed their own memory limit and are OOM-killed from bystander tasks that are killed only because they were chosen as a victim on an oversubscribed host.
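One possible way to draw that distinction, sketched below, is to compare a task's cgroup v2 `memory.events` counters before and after the kill. In cgroup v2, `oom` counts the times the cgroup reached its own limit and `oom_kill` counts processes in the cgroup killed by any OOM killer. Treating "`oom_kill` increased but `oom` did not" as a bystander kill is an assumption of this sketch, not something the issue proposes, and `OOMCause`, `parse_memory_events`, and `classify_oom` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class OOMCause(Enum):
    NONE = "none"              # no process in the task was OOM-killed
    OVER_LIMIT = "over_limit"  # the task hit its own memory limit
    BYSTANDER = "bystander"    # killed without hitting its own limit


def parse_memory_events(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def classify_oom(before: Dict[str, int], after: Dict[str, int]) -> OOMCause:
    # Deltas between two snapshots of the same task's counters.
    killed = after.get("oom_kill", 0) - before.get("oom_kill", 0)
    hit_limit = after.get("oom", 0) - before.get("oom", 0)
    if killed <= 0:
        return OOMCause.NONE
    if hit_limit > 0:
        # Assumed signal: the cgroup reached its own limit.
        return OOMCause.OVER_LIMIT
    # Assumed signal: killed by host-level memory pressure.
    return OOMCause.BYSTANDER
```

A caller would snapshot the counters when the task starts and again when it exits, then use the result to decide whether the task itself misbehaved or was simply chosen as a victim.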

notnoop added the type/enhancement, theme/scheduling, theme/restart/reschedule, and stage/accepted labels Apr 20, 2021