-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config-linux: add more info about hugetlb page size #1011
Conversation
config-linux.md
Outdated
@@ -394,6 +394,8 @@ For more information, see the kernel cgroups documentation about [HugeTLB][cgrou | |||
Each entry has the following structure: | |||
|
|||
* **`pageSize`** *(string, REQUIRED)* - hugepage size | |||
The value has the format "<size><unit-prefix>B" (64KB, 2MB, 1GB), and must match the <hugepagesize> of the | |||
corresponding control file found in "/sys/fs/cgroup/hugetlb/hugetlb.<hugepagesize>.limit_in_bytes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also specify explicitly that these are intended to be 1024-based units?
I think https://github.com/opencontainers/runc/blob/e4fa8a457544ca646e02e60d124aebb0bb7f52ad/libcontainer/cgroups/utils.go#L411-L429 is also related -- the kernel appears to use a lowercase k
but uppercase for the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tianon! 😄
Should we also specify explicitly that these are intended to be 1024-based units?
Smart! Any suggestion about how to write it?
I think https://github.com/opencontainers/runc/blob/e4fa8a457544ca646e02e60d124aebb0bb7f52ad/libcontainer/cgroups/utils.go#L411-L429 is also related -- the kernel appears to use a lowercase k but uppercase for the rest?
The kernel does indeed use "KB" (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349), and that has been the case from day 1. To avoid the mistake, I posted a patch adding a note about it in the kernel docs too. https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v1/hugetlb.txt?h=for-5.2-fixes It was applied to the cgroup-tree, and will most likely be a part of the 5.2
release 😄
I have a PR in runc to fix it here: opencontainers/runc#2065, together with this PR kubernetes/kubernetes#78495 in k8s, blocked by runc. 😄 Both PRs have some background info about the issue. Even containerd
use , patched here: containerd/cgroups#85kB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think the lowercase k
is simply a result of copy-paste from docker/go-units into runc, and further into k8s etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is -- when I do an ls
on the path that bit of code is looking at, I get lowercase k
values:
$ ls /sys/kernel/mm/hugepages/
hugepages-1048576kB hugepages-2048kB
Smart! Any suggestion about how to write it?
Technically, what we're talking about is officially "MiB", etc, but still commonly used both ways (which is why I figure it's probably worth calling out, since MB
is still considered ambiguous).
Here's the best idea I've got, maybe someone else can do / adapt this better:
Values of
<unit-prefix>
are intended to be parsed using base 1024 ("1KB" = 1024, "1MB" = 1048576, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is -- when I do an ls on the path that bit of code is looking at, I get lowercase k values:
You are looking at the normal kernel memory section (idk. what they are calling it). There, all the sizes are rendered with kB
(yes, lowercase). The cgroup
control files (v1) we are talking about, are located in /sys/fs/cgroup/hugetlb/
. When setting the huge page limits we use the cgroup control files for the given container (further down in the /sys/fs/cgroup/hugetlb
dir), and not the files inside /sys/kernel/mm/hugepages/
.
I totally agree that this is strange, and that it would be easier to just use the same format both places. Maybe they does something else when they implement huge pages in cgroup v2 (I dont think they have yet)..
Technically, what we're talking about is officially "MiB", etc, but still commonly used both ways (which is why I figure it's probably worth calling out, since MB is still considered ambiguous).
Here's the best idea I've got, maybe someone else can do / adapt this better:
Yee. MiB is the correct. When dealing with memory, the kernel people (and more or less all of us) always use 1 << 10
, 1 << 20
etc. when talking about sizes, and in my opinion they (the kernel people) do a lot of strange and inconsistent stuff when naming these things. They never (almost never) break userspace, so I guess that is the reason stuff like this is kept the way it is.
Values of are intended to be parsed using base 1024 ("1KB" = 1024, "1MB" = 1048576, etc).
Nice! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thanks for the additional background!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the pageSize
is used to write to the cgroup control file: https://github.com/opencontainers/runc/blob/e4fa8a457544ca646e02e60d124aebb0bb7f52ad/libcontainer/cgroups/fs/hugetlb.go#L31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the description here according to the discussion now @tianon! 😄
Currently the docs don't say anything about what the "pageSize" is other than the fact that it is a string. This makes it easier for developers to understand how it works, and may help avoiding mistakes which are hard to spot. Signed-off-by: Odin Ugedal <[email protected]>
CC: @crosbymichael |
Currently the docs don't say anything about what the "pageSize" is other
than the fact that it is a string. This makes it easier for developers
to understand how it works, and may help avoiding mistakes which are
hard to spot.
This can help other doing the same mistake as
runc
and k8s when using hugepages, used<size>kB
instead of the correct<size>KB
. (opencontainers/runc#2065)Signed-off-by: Odin Ugedal [email protected]