WIP/RFC: libcontainer/cgroups: add mountinfo cache #2265

Closed
wants to merge 1 commit into from

@kolyshkin kolyshkin commented Mar 21, 2020

There are a few public functions in libcontainer/cgroups
that end up parsing the /proc/self/mountinfo file on every call.

For example, a simple `runc run` results in anywhere from 65
(with `--systemd-cgroup`) to 108 (without systemd cgroups)
calls to FindCgroupMountpointAndRoot.

There are a few other public functions that rely on
FindCgroupMountpointAndRoot and therefore also parse
/proc/self/mountinfo:

 * FindCgroupMountpoint
 * GetOwnCgroupPath
 * GetInitCgroupPath

Ideally, we should only parse mountinfo once, collecting all the
information needed, and then use it.

Realistically, since these functions are public, and some are
also used outside of runc, changing their usage patterns or result
is out of the question. We have to stay compatible.

Let's introduce a semi-lazy mountinfo cache, which is per-prefix
and per-subsystem. Every time a new prefix (i.e. the cgroupPath
argument to FindCgroupMountpointAndRoot) is used, we add a new
cache entry and parse mountinfo once for all known subsystems,
populating the cache.

This reduces the number of `openat(AT_FDCWD, "/proc/self/mountinfo", ...`
lines in strace output from 116 to 10 (I think only two of them
now come from FindCgroupMountpointAndRoot).

All of the above is for cgroup v1.

I have profiled the FindCgroupMountpointAndRoot calls
by adding the following code at the beginning of the function:

```go
	// Record the start time and print the elapsed time when the function returns.
	start := time.Now()
	defer func() {
		fmt.Printf("FindCgroupMountpointAndRoot: %11.9f s\n", time.Since(start).Seconds())
	}()
```

and executing `runc run` in two environments:

 * normal (mostly idle) system
 * system running 200 mounts/unmounts in parallel

On a mostly idle system, before this patch:

> FindCgroupMountpointAndRoot: 0.000173743 s
> FindCgroupMountpointAndRoot: 0.000098944 s
> FindCgroupMountpointAndRoot: 0.000129914 s
> FindCgroupMountpointAndRoot: 0.000089391 s
> FindCgroupMountpointAndRoot: 0.000108485 s
> FindCgroupMountpointAndRoot: 0.000094416 s
> FindCgroupMountpointAndRoot: 0.000063170 s
> FindCgroupMountpointAndRoot: 0.000058595 s
> ...

After this patch:

> FindCgroupMountpointAndRoot: 0.000133898 s
> FindCgroupMountpointAndRoot: 0.000129865 s
> FindCgroupMountpointAndRoot: 0.000000324 s
> FindCgroupMountpointAndRoot: 0.000000171 s
> FindCgroupMountpointAndRoot: 0.000001616 s
> FindCgroupMountpointAndRoot: 0.000000552 s
> FindCgroupMountpointAndRoot: 0.000000164 s
> FindCgroupMountpointAndRoot: 0.000000180 s
> FindCgroupMountpointAndRoot: 0.000000359 s
> FindCgroupMountpointAndRoot: 0.000000361 s
> FindCgroupMountpointAndRoot: 0.000000151 s
> ...

Summary: the cache gives a roughly 100x speedup,
or, in absolute time, saves up to 0.01 seconds per `runc run`
(about half of that in the systemd case).

On a system busy with mounts, before this patch:

> FindCgroupMountpointAndRoot: 0.027318287 s
> FindCgroupMountpointAndRoot: 0.011800278 s
> FindCgroupMountpointAndRoot: 0.003406432 s
> FindCgroupMountpointAndRoot: 0.007535358 s
> FindCgroupMountpointAndRoot: 0.008702834 s
> FindCgroupMountpointAndRoot: 0.023832925 s
> FindCgroupMountpointAndRoot: 0.004116992 s
> FindCgroupMountpointAndRoot: 0.002849313 s
> FindCgroupMountpointAndRoot: 0.002142763 s
> FindCgroupMountpointAndRoot: 0.003587968 s
> FindCgroupMountpointAndRoot: 0.012451438 s
> FindCgroupMountpointAndRoot: 0.010582860 s
> FindCgroupMountpointAndRoot: 0.018709190 s
> ...

After this patch:

> FindCgroupMountpointAndRoot: 0.009186349 s
> FindCgroupMountpointAndRoot: 0.010484656 s
> FindCgroupMountpointAndRoot: 0.000000681 s
> FindCgroupMountpointAndRoot: 0.000000805 s
> FindCgroupMountpointAndRoot: 0.000001982 s
> FindCgroupMountpointAndRoot: 0.000000767 s
> FindCgroupMountpointAndRoot: 0.000000352 s
> FindCgroupMountpointAndRoot: 0.000000319 s
> FindCgroupMountpointAndRoot: 0.000000558 s
> FindCgroupMountpointAndRoot: 0.000000752 s
> FindCgroupMountpointAndRoot: 0.000000233 s
> ...

Summary: the cache gives a roughly 1000x improvement over reading
the mountinfo, or, in absolute time, saves up to 1 second per
`runc run` (or up to 0.6 seconds in the systemd case).

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Contributor Author

I have also optimized the mount parsing itself a bit, but ideally I'd like to reuse moby/sys/mountinfo as well (once #2256 is merged).

@rphillips

On a busy cloud server this will have a huge positive impact on IOPS as well.

@rphillips

Is there somewhere that removes an entry in the cache?

@kolyshkin
Contributor Author

> Is there somewhere that removes an entry in the cache?

No.

First, it's only ~20 entries in most cases, and maybe 100 entries in some exotic cgroup setups like Bedrock Linux (#1817), provided someone looks up entries with these prefixes.

If you are asking about cache invalidation, I thought about it, but these entries are pretty static: no one ever unmounts those mounts or moves them to a different directory. If such cases do exist, it's easy to add code that checks whether a cache entry is still valid and falls back to parsing mountinfo, but it will complicate things and slow them down a bit.

@kolyshkin
Contributor Author

This is not really a WIP anymore, except I'm blocked by #2256. Once merged, I will rebase this one on top of it.

@kolyshkin
Contributor Author

Also, I do realize that the current code will have two sets of entries in the cache -- for empty ("") prefix and for "/sys/fs/cgroup" prefix, and those two sets are identical. This is not ideal but OK from my POV, not worth optimizing out.

}

func findCgroupMountpointAndRootFromReader(reader io.Reader, cgroupPath, subsystem string) (string, string, error) {
func findCgroupMountpointsForPrefix(reader io.Reader, cgroupPath string, subs map[string]string) (cgMountCacheForPrefix, error) {
Member

@AkihiroSuda AkihiroSuda Mar 27, 2020

needs UT for v2

Contributor Author

I'm afraid I still haven't eliminated all its uses from the v2 code, so I'll add it later.

@kolyshkin
Contributor Author

Closing in favor of #2438, in which I am finally finding ways to reduce the number of times we parse mountinfo.
