WIP/RFC: libcontainer/cgroups: add mountinfo cache #2265
There are a lot of public functions that end up parsing the `/proc/self/mountinfo` file over and over. For example, a simple `runc run` call results in anywhere from 65 (with `--systemd-cgroup`) to 108 (without systemd cgroups) calls to `FindCgroupMountpointAndRoot`.

A few other public functions rely on `FindCgroupMountpointAndRoot`, and therefore on parsing `/proc/self/mountinfo`:

* FindCgroupMountpoint
* GetOwnCgroupPath
* GetInitCgroupPath

Ideally, we should only parse mountinfo once, collecting all the information needed, and then use it.

Realistically, since these functions are public, and some are also used outside of runc, changing their usage patterns or results is out of the question. We have to stay compatible.

Let's introduce a semi-lazy mountinfo cache, which is per-prefix and per-subsystem. Every time a new prefix (i.e. the cgroupPath argument to `FindCgroupMountpointAndRoot`) is used, we add a new cache entry and parse mountinfo once for all known subsystems, populating the cache.

This reduces the number of `openat(AT_FDCWD, "/proc/self/mountinfo", ...` lines in strace output from 116 to 10 (I think now only two of them are from FindCgroupMountpointAndRoot).

All of the above is for cgroup v1.

I have profiled the FindCgroupMountpointAndRoot calls by adding the following code at the beginning of the function:

```go
start := time.Now()
defer func() {
	fmt.Printf("FindCgroupMountpointAndRoot: %11.9f s\n", time.Now().Sub(start).Seconds())
}()
```

and executing `runc run` in two environments:

* normal (mostly idle) system
* system running 200 mounts/unmounts in parallel

On a mostly idle system, before this patch:

> FindCgroupMountpointAndRoot: 0.000173743 s
> FindCgroupMountpointAndRoot: 0.000098944 s
> FindCgroupMountpointAndRoot: 0.000129914 s
> FindCgroupMountpointAndRoot: 0.000089391 s
> FindCgroupMountpointAndRoot: 0.000108485 s
> FindCgroupMountpointAndRoot: 0.000094416 s
> FindCgroupMountpointAndRoot: 0.000063170 s
> FindCgroupMountpointAndRoot: 0.000058595 s
> ...

After this patch:

> FindCgroupMountpointAndRoot: 0.000133898 s
> FindCgroupMountpointAndRoot: 0.000129865 s
> FindCgroupMountpointAndRoot: 0.000000324 s
> FindCgroupMountpointAndRoot: 0.000000171 s
> FindCgroupMountpointAndRoot: 0.000001616 s
> FindCgroupMountpointAndRoot: 0.000000552 s
> FindCgroupMountpointAndRoot: 0.000000164 s
> FindCgroupMountpointAndRoot: 0.000000180 s
> FindCgroupMountpointAndRoot: 0.000000359 s
> FindCgroupMountpointAndRoot: 0.000000361 s
> FindCgroupMountpointAndRoot: 0.000000151 s
> ...

Summary: the cache gives roughly 100x speed increase, or, in absolute time, saves up to 0.01 seconds per `runc run` (about half of that for the systemd case).

On a system busy with mounts, before this patch:

> FindCgroupMountpointAndRoot: 0.027318287 s
> FindCgroupMountpointAndRoot: 0.011800278 s
> FindCgroupMountpointAndRoot: 0.003406432 s
> FindCgroupMountpointAndRoot: 0.007535358 s
> FindCgroupMountpointAndRoot: 0.008702834 s
> FindCgroupMountpointAndRoot: 0.023832925 s
> FindCgroupMountpointAndRoot: 0.004116992 s
> FindCgroupMountpointAndRoot: 0.002849313 s
> FindCgroupMountpointAndRoot: 0.002142763 s
> FindCgroupMountpointAndRoot: 0.003587968 s
> FindCgroupMountpointAndRoot: 0.012451438 s
> FindCgroupMountpointAndRoot: 0.010582860 s
> FindCgroupMountpointAndRoot: 0.018709190 s
> ...

After this patch:

> FindCgroupMountpointAndRoot: 0.009186349 s
> FindCgroupMountpointAndRoot: 0.010484656 s
> FindCgroupMountpointAndRoot: 0.000000681 s
> FindCgroupMountpointAndRoot: 0.000000805 s
> FindCgroupMountpointAndRoot: 0.000001982 s
> FindCgroupMountpointAndRoot: 0.000000767 s
> FindCgroupMountpointAndRoot: 0.000000352 s
> FindCgroupMountpointAndRoot: 0.000000319 s
> FindCgroupMountpointAndRoot: 0.000000558 s
> FindCgroupMountpointAndRoot: 0.000000752 s
> FindCgroupMountpointAndRoot: 0.000000233 s
> ...

Summary: the cache gives roughly 1000x improvement over reading the mountinfo, or, in absolute time, saves up to 1 second per `runc run` (or up to 0.6 seconds in the systemd case).

Signed-off-by: Kir Kolyshkin <[email protected]>
I have optimized the mounts parsing itself a bit, too, but ideally I'd like to reuse moby/sys/mountinfo as well (once #2256 is merged).

On a busy cloud server this will have a huge positive impact on IOPS as well.

Is there somewhere that removes an entry in the cache?

No. First, it's only ~20 entries in most cases, and maybe 100 entries in some exotic cgroup setups like in Bedrock Linux (#1817), provided someone looks up entries with these prefixes. If you are asking about cache invalidation, I was thinking about it, but these entries are pretty static, meaning no one ever unmounts those or moves those mounts to a different directory. In case there are such cases, it's easy to add code to validate if a cache entry is still valid and fall back to parsing mountinfo, but it will complicate things and slow them down a bit.
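
To illustrate the "validate and fall back" idea mentioned in the reply above, here is a rough, Linux-only sketch. It is an assumption about how such a check could look, not something this PR implements: `validatedLookup`, `isCgroupV1Mount` and the `reparse` callback are hypothetical names, and the `statfs`-based check only confirms that the cached path is still on a cgroup v1 filesystem (it would not notice a different hierarchy mounted at the same path).

```go
package main

import "syscall"

// cgroupSuperMagic is the statfs f_type value of a cgroup v1 filesystem on Linux.
const cgroupSuperMagic = 0x27e0eb

// isCgroupV1Mount reports whether path currently lives on a cgroup v1 filesystem.
// Any statfs error (e.g. the directory is gone) counts as "not valid".
func isCgroupV1Mount(path string) bool {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false
	}
	return int64(st.Type) == cgroupSuperMagic
}

// validatedLookup returns the cached mountpoint if it still passes valid;
// otherwise it drops the stale entry and falls back to reparse, which stands
// in for the slow path that reads mountinfo again.
func validatedLookup(
	cache map[string]string,
	subsystem string,
	valid func(mountpoint string) bool,
	reparse func(subsystem string) (string, error),
) (string, error) {
	if mp, ok := cache[subsystem]; ok {
		if valid(mp) {
			return mp, nil
		}
		delete(cache, subsystem)
	}
	mp, err := reparse(subsystem)
	if err != nil {
		return "", err
	}
	cache[subsystem] = mp
	return mp, nil
}
```

Passing `isCgroupV1Mount` as `valid` costs one `statfs` call per lookup, which is the "slow them down a bit" trade-off mentioned above.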
This is not really a WIP anymore, except I'm blocked by #2256. Once merged, I will rebase this one on top of it.

Also, I do realize that the current code will have two sets of entries in the cache, one for the empty ("") prefix and one for the "/sys/fs/cgroup" prefix, and those two sets are identical. This is not ideal but OK from my POV, not worth optimizing out.
 }

-func findCgroupMountpointAndRootFromReader(reader io.Reader, cgroupPath, subsystem string) (string, string, error) {
+func findCgroupMountpointsForPrefix(reader io.Reader, cgroupPath string, subs map[string]string) (cgMountCacheForPrefix, error) {
needs UT for v2
I'm afraid I still haven't eliminated all its uses from the v2 code, so I'll add it later.
Closing in favor of #2438, in which I am finally finding ways to reduce the number of times we parse mountinfo.