-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce CPU usage running large numbers of clients #5951
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the right approach but I have few suggestions.
It might be good to rely on memory/cpu cgroup to get aggregate usage from kernel directly, rather than get all pids and then query and aggregate for each pid usage in user space; but that can be done in a follow up PR.
configureBasicCgroups(cfg) | ||
err := configureBasicCgroups(cfg) | ||
if err != nil { | ||
e.logger.Debug("cgroup create", "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a warning or an error - maybe something like. Also would be nice to wrap the error.
e.logger.Debug("cgroup create", "error", err) | |
e.logger.Warn("failed to create cgroup", "error", err) | |
return fmt.Errorf("failed to create cgroups: %v", err) |
drivers/shared/executor/executor.go
Outdated
if e.resConCtx.isEmpty() { | ||
go e.pidCollector.collectPids(e.processExited, getAllPids) | ||
} else { | ||
go e.pidCollector.collectPids(e.processExited, e.getAllPids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reusing the same function name to refer to a executor method as well as a package function with different implementations make it a bit harder to follow code. Apparently, we do that already with libcontainer executor :(.
Also, l.getAllPids
and resConCtx.isEmpty
are only defined on linux, so compilation fails on Windows and macOS, e.g. https://ci.appveyor.com/project/hashicorp/nomad/builds/25919504 . See the pattern of executor_universal_linux.go
, executor_windows.go
, and executor_unix.go
.
I would suggest refactoring this to have a universal executor getAllPids
method that calls helpers to look up pids (e.g. getPidsByCgroup
vs getPidsByScanningProcesses
)?
return rc.groups == nil | ||
} | ||
|
||
func (e *UniversalExecutor) getAllPids() (map[int]*nomadPid, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move method to executor_universal_linux.go
file as it's a UniversalExecutor linux method.
paths = rc.groups.Paths | ||
} else { | ||
paths[""] = rc.groups.Path | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For cgroups, we want to use a single cgroup hierarchy. Checking multiple hierarchy would lead to duplicate values in the case of multiple cgroups attached.
In Nomad, we always create the freezer
cgroup only in configureBasicCgroups
[1], so that's an appropriate one to use. For comparison, docker/runc
requires devices
cgroup to be mounted, so they only check the devices
subsystem[2]
[1]
nomad/drivers/shared/executor/executor_linux.go
Lines 715 to 728 in 88c03d0
// Manually create freezer cgroup | |
cfg.Cgroups.Paths = map[string]string{} | |
root, err := cgroups.FindCgroupMountpointDir() | |
if err != nil { | |
return err | |
} | |
if _, err := os.Stat(root); err != nil { | |
return err | |
} | |
freezer := cgroupFs.FreezerGroup{} | |
subsystem := freezer.Name() | |
path, err := cgroups.FindCgroupMountpoint("", subsystem) |
[2] https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/fs/apply_raw.go#L286-L289
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in configureBasicCgroups
, we make an empty Freezer
and ask it it's name, which constantly returns "freezer". Should I do that here? or use just the string over there?
dba750e
to
ab3e625
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If theres a good way to get a test for resourceContainerContext.getAllPidsByCgroup
that would be nice. That can be looked into post merge.
I agree with @notnoop that there may be more efficient means to get some of these stats if the mem/cpu cgroups are being utilized but that can be filed and done later too.
Nice work!
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
raw_exec uses the universal executor, not the executor linux. on linux, this still executes the job in a cgroup, which gives us a handle to use the cgroup process list rather than scanning all processes on the client machine.
fixes Reduce CPU utilization in executors when scanning for PIDs. #5832