limit of file descriptors inside a container always is 1024 #2532
Comments
great catch cc @BenTheElder
This is pretty frustrating, is there a way to get the actual default base spec?
@AkihiroSuda can you advise here?
I filed containerd/containerd#6262 to discuss being able to obtain the CRI base spec upstream. In the meantime we should probably carefully inspect where CRI deviates from the OCI base spec and duplicate this behavior, I guess ...
So currently the base spec looks like this:
{
"ociVersion": "1.0.2-dev",
"process": {
"user": {
"uid": 0,
"gid": 0
},
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"effective": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"inheritable": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
],
"permitted": [
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE"
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"noNewPrivileges": true
},
"root": {
"path": "rootfs"
},
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/run",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
}
],
"linux": {
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
]
},
"cgroupsPath": "/default",
"namespaces": [
{
"type": "pid"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "network"
}
],
"maskedPaths": [
"/proc/acpi",
"/proc/asound",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
This doesn't contain SELinux or AppArmor profiles, so we can just delete the rlimits for now and continue discussing long-term options in containerd/containerd#6262.
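For illustration, deleting the rlimits from a generated spec could look roughly like the sketch below. This is not the actual change made in kind; the helper name `strip_rlimits` and the trimmed-down `base_spec` are assumptions made up for this example, and only the `process.rlimits` key is taken from the spec shown above.

```python
import copy
import json


def strip_rlimits(spec: dict) -> dict:
    """Return a copy of an OCI spec with process.rlimits removed.

    Hypothetical helper for illustration only; it leaves every other
    field of the spec untouched.
    """
    cleaned = copy.deepcopy(spec)
    # pop() with a default is a no-op if the key is already absent.
    cleaned.get("process", {}).pop("rlimits", None)
    return cleaned


# Trimmed-down stand-in for the full spec quoted in this thread.
base_spec = {
    "ociVersion": "1.0.2-dev",
    "process": {
        "cwd": "/",
        "rlimits": [{"type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024}],
        "noNewPrivileges": True,
    },
}

cleaned_spec = strip_rlimits(base_spec)
print(json.dumps(cleaned_spec, indent=2))
```

With no `rlimits` entry in the base spec, nothing in the spec itself pins the limit to 1024.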
should be fixed in the latest image listed in v0.11.1 and in kind @ HEAD
What happened:
Since #2321 was merged (done via #2465), the file descriptor limit inside a container is always 1024.
What you expected to happen:
The same behavior as before the merge: the limit is inherited from the containerd process.
How to reproduce it (as minimally and precisely as possible):
- kind build from branch `v0.11.1`
- kind build from current `main`
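To compare the two builds, the limit can be read from inside a container. A minimal sketch, assuming a Python interpreter is available in the container image (in a plain shell, `ulimit -n` reports the soft limit instead); the function name `nofile_limits` is made up for this example:

```python
import resource
from typing import Tuple


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NOFILE values of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
print(f"RLIMIT_NOFILE soft={soft} hard={hard}")
```

According to this report, a container on a build with the base spec should show 1024 for both values.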
Anything else we need to know?:
I assume this is because #2321 introduced a base spec.
The base spec is configured here[0] and generated during the docker image build here[1].
`ctr oci spec` generates a default spec, which for unix is defined here[2] and has `RLIMIT_NOFILE` set to 1024.
Containerd honors the base spec if one is set[3] and clears the settings if none is set.
So if there is no base spec, `WithoutDefaultSecuritySettings` is called, which clears the limit[4].
If the limit is cleared, containerd falls back to the limit from the containerd process[5].
Due to this systemd setting[6], containerd itself fell back to the host system's value.
This was the case before the merge.
But since there is now a base spec, we never reach this code path, which means the value from the base spec (1024) is always used.
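The behavior described above can be summarized in a toy model. This is only a sketch of my understanding, not containerd's code; the function name `effective_nofile` and the host value `1048576` are made up for illustration.

```python
from typing import Optional


def effective_nofile(base_spec_limit: Optional[int], containerd_limit: int) -> int:
    """Toy model of which RLIMIT_NOFILE a container ends up with.

    base_spec_limit:  the limit in the configured base spec, or None if
                      no base spec is configured (the limit is cleared).
    containerd_limit: the limit of the containerd process itself.
    """
    if base_spec_limit is not None:
        # A base spec is configured, so its value is honored as-is.
        return base_spec_limit
    # No base spec: the limit is cleared and the container inherits
    # the limit of the containerd process.
    return containerd_limit


# Hypothetical limit of the containerd process on the host.
host_limit = 1048576

print(effective_nofile(None, host_limit))  # before: no base spec
print(effective_nofile(1024, host_limit))  # after: base spec with 1024
```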
I have at least one use case where I need a higher value than 1024.
My current workaround is to manipulate the base spec in the running containers:
I currently don't know what the best way to fix this would be, but I'm happy to help if someone has an idea.
Environment:
kind v0.12.0-alpha+40cca930158358 go1.17.2 linux/amd64
1.22.2
20.10.9
5.13.19-2-MANJARO
[0] https://github.com/kubernetes-sigs/kind/blob/main/images/base/files/etc/containerd/config.toml#L21
[1] https://github.com/kubernetes-sigs/kind/blob/main/images/base/Dockerfile#L152
[2] https://github.com/containerd/containerd/blob/main/oci/spec.go#L132
[3] https://github.com/containerd/containerd/blob/main/pkg/cri/server/container_create_linux.go#L127
[4] https://github.com/containerd/containerd/blob/main/pkg/cri/opts/spec_linux.go#L89
[5] containerd/cri#515 (comment)
[6] https://github.com/kubernetes-sigs/kind/blob/main/images/base/files/etc/systemd/system/containerd.service#L23