-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Another nail to mountinfo coffin #2599
Another nail to mountinfo coffin #2599
Conversation
529dd48
to
aaba1e9
Compare
aaba1e9
to
3c4c97d
Compare
3c4c97d
to
47563e3
Compare
Rebased; @AkihiroSuda @mrunalp PTAL |
498e533
to
0df9654
Compare
@AkihiroSuda @mrunalp PTAL |
@cyphar PTAL |
0df9654
to
a7e7750
Compare
Rebased on top of #2666 just in case |
@cyphar WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping @cyphar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a suggestion (let me know what you think)
libcontainer/cgroups/v1_utils.go
Outdated
@@ -91,6 +95,16 @@ func tryDefaultPath(cgroupPath, subsystem string) string { | |||
return path | |||
} | |||
|
|||
func ReadCgroupMountinfo() ([]*mountinfo.Info, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func ReadCgroupMountinfo() ([]*mountinfo.Info, error) { | |
// ReadCgroupMountinfo returns a list of cgroup mounts for the current running | |
// process. ReadCgroupMountinfo caches the results in memory, and assumes that | |
// no new cgroup mounts appear after it was first called. | |
func ReadCgroupMountinfo() ([]*mountinfo.Info, error) { |
(perhaps "a list of cgroup v1 mounts"?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually; the name ReadCgroupMountinfo
is now a bit ambiguous; Read....
would indicate it actually reads (not caches) mountinfo
.
Looking a bit further; instead of introducing a new, exported, function;
- This same file has a
getCgroupMountsV1()
, which is called byGetCgroupMounts()
getCgroupMountsV1()
performs additional conversion (mappingmountinfo.Info
to alibcontainer/cgroups.Mount
usinggetCgroupMountsHelper()
)- this conversion will be done every time
GetCgroupMounts()
is called
- this conversion will be done every time
- side-note: I see `` and
getCgroupMountsV1()
have an `all` argument, that seems to be always `false` (perhaps it's no longer needed) - All current uses of
ReadCgroupMountinfo()
outside of this package (libcontainer/cgroups/fs/cpuset.go
andlibcontainer/cgroups/fs/fs.go
are only interested inMountpoint
andRoot
, both of which are also preserved inlibcontainer/cgroups.Mount
(returned bygetCgroupMountsV1 ()
Perhaps:
- un-export
ReadCgroupMountinfo()
- remove the
sync.Once()
fromReadCgroupMountinfo()
- export
getCgroupMountsV1
(perhaps rename toGetCgroupV1Mounts()
- This makes it a nice parallel to
GetCgroupMounts()
(but only returningv1
mounts, and thus clearer on what it does)
- This makes it a nice parallel to
- remove the
all
argument (if it's indeed no longer needed) - add the
sync.Once
toGetCgroupV1Mounts()
, so that the conversions are only done once (if no new cgroups are expected to be added, there should be no need to perform the conversion each time either?)
This also makes the interface somewhat cleaner, as we're not returning an "external" type (mountinfo.Info
, but instead a type that's defined in this package (cgroups.Mount
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all
was introduced by Add flag to allow getting all mounts for cgroups subsystems #1049, and it very vividly demonstrates the major problem with runc: it's a tool as well as a set of packages with lots of users. I have changed runc/libcontainer public API in the past and in some cases the users had to modify their code, and in some cases I needed to revert back.
I will look into cadvisor code wrt using all
argument.
-
The idea of reusing GetCgroupMounts (and adding sync.Once to it) looks interesting. The problem is, it's somewhat harder to do with
all
argument as it is now. I tried it in the past, and the result was lots of code. What I like about this PR is it is simple, it removes lots of code making things faster at the same time. -
I don't like yet another exported function either. I think the solution to that is to use
internal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look into cadvisor code wrt using all argument.
google/cadvisor#1461
google/cadvisor#1476
I read all that and thought for some time and I am still not sure how to handle this :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I have added a comment describing ReadCgroupMountinfo
as per your earlier suggestion, and I'll take a look at internalizing it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with doing this API cleanup later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I took a fresh look and found a way to unexport ReadCgroupMountinfo
by not using in the other two commits. The core of this PR stays the same, but the last two commits (getCgroupRoot and cpusets ones) changed a lot. PR description updated with "v2 changes", please re-review @thaJeztah @cyphar
a7e7750
to
0c845b9
Compare
libcontainer/cgroups/v1_utils.go
Outdated
if numFound >= len(ss) { | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this new check necessary? It's also not correct if all
is true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't changed the original logic here, which was
for scanner.Scan() && numFound < len(ss) {
but it's good that you asked, as now I see that this function is not tested with all
set to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK, the all
logic was added by f557996 (done for google/cadvisor#1476) and it correctly avoided incrementing numFound
in case all == true
....
.... until 5ee0648 (PR #1817) broke it (since v1.0.0-rc6 and until now).
I am not sure why e.g. google/cadvisor#2309 (that bumps runc to a version with this bug) haven't reintroduced the google/cadvisor#1461. There is still an open issue that looks very similar to it though: google/cadvisor#1843.
Anyway, I am fixing this in #2689,
and rebasing this PR on top of that one.
Aside from my nit, this LGTM. |
a0e0e62
to
75a9015
Compare
75a9015
to
08caa9b
Compare
CI failing |
08caa9b
to
dc17476
Compare
Hmm, failure on rootless + idmap on Fedora 33:
This is the timeout being hit. Does not look related, and I haven't ever seen this before. Maybe CI is exceptionally slow today? :) update: #2692 |
CI seems to have got stuck, could you push to retrigger CI? |
Yes, basically all old PRs (that don't have #2690) needs to be rebased :-\ Alternatively, I can unmark those checks as required. |
dc17476
to
2c1eacd
Compare
Rebased to include #2690 |
needs rebase again sorry |
Apparently it is inevitable that we have to read mountinfo multiple times when dealing with cgroup v1. It seems we can only do it once and reuse the data, without major modifications to the code. This commit does a few things. 1. Drop our custom mountinfo parser implementation in favor of moby/sys/mountinfo. While the custom parser is faster (about 2x according to benchmark) for this particular case, the one from the package is more correct and future-proof. 2. Read mountinfo only once, caching all the entries with fstype of cgroup. With this, there's no need to worry about performance degradation introduced above. 3. Drop "isSubsystemAvailable" optimization (introduced by commit 2a1a6cd) because now with the cache it is probably slowing things down. 4. Modify the tests accordingly. Signed-off-by: Kir Kolyshkin <[email protected]>
Drop the custom mountinfo parser, reuse the cgroups.GetCgroupMounts() to get the cgroup root. Faster, and much less code. Signed-off-by: Kir Kolyshkin <[email protected]>
In cgroup v1, the parent of a cgroup mount is on tmpfs (or maybe any other fs but definitely not cgroupfs). Use this to traverse up the tree until we reach non-cgroupfs. This should work in any setups (nested container, non-standard cgroup mounts) so issue [1] won't happen. Theoretically, the only problematic case is when someone mounts cpuset cgroupfs into a directory on another cgroupfs. Let's assume people don't do that -- if they do, they will get other error (e.g. inability to read cpuset.cpus file). [1] opencontainers#1367 Signed-off-by: Kir Kolyshkin <[email protected]>
Here, we always create a parent directory, so using MkdirAll is redundant. Use Mkdir instead. One difference between MkdirAll and Mkdir is the former ignores EEXIST, and since we sometimes try to create a directory that already exists, we need to explicitly ignore that. Signed-off-by: Kir Kolyshkin <[email protected]>
2c1eacd
to
9f2153c
Compare
rebased (for new CI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@cyphar still LGTY? |
@cyphar PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Kir!
This is a set of patches for libcontainer/cgroups that
moby/sys/mountinfo
, another to reusecgroups/utils
);cgroups/fs/cpuset
code.The idea is roughly the same as #2265 but the implementation is way simpler.
The end result is smaller code (
98 insertions(+), 147 deletions(-)
), less bugs (I hope) and overall higher performance.Tested it with strace shows that the number of times
runc run
opens/proc/self/mountinfo
has decreased from 19 to 6 times (note it was 108 half a year ago). Of which 5 times must be out of cgroups code (haven't checked), so it's 14 -> 1 improvement. In absolute terms, it can be up to 0.023*14 ~= 0.3 seconds savings on a container start (on a system very busy with mounts).v2 changes:
readCgroupMountinfo
(by removing its two external users, see below);cgroups.GetCgroupMounts
;statfs
);NOTE this currently depends on (and includes) #2689 -- will rebase once that one is merged.