fake_xattr wrong exit code #1

Closed
5kt opened this issue Oct 15, 2024 · 1 comment

5kt commented Oct 15, 2024

What happened:
Building Gardenlinux fails with no real error being logged.
After some debugging it seems that fake_xattr is no longer working as expected.
I am running the ghcr.io/gardenlinux/builder:357fbe01a5a95854ccb5065c28fc6eca13466a46 image.
Podman version: 5.2.3
Kernel version: Linux debian 6.10.11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1 (2024-09-22) x86_64 GNU/Linux

What you expected to happen:
fake_xattr should ideally return the same exit code as the provided command.

How to reproduce it (as minimally and precisely as possible):

$ podman -v
podman version 5.2.3
$ uname -a
Linux debian 6.10.11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1 (2024-09-22) x86_64 GNU/Linux
$ podman run -ti ghcr.io/gardenlinux/builder:357fbe01a5a95854ccb5065c28fc6eca13466a46 fake_xattr ls
bin  boot  builder  dev  etc  home  lib  lib64	media  mnt  opt  proc  root  run  sbin	srv  sys  tmp  usr  var
$ echo $?
255
# FAKE_XATTR_DEBUG=1 fake_xattr ls
debug: main.c:261 (main) [51]: fake_xattr debug mode
debug: zalloc.c:12 (zalloc) [52]: allocating 16 bytes
debug: seccomp_unotify.c:380 (target_vfork) [53]: vforked
debug: seccomp_unotify.c:279 (supervisor) [52]: listening for seccomp notify events
bin  boot  builder  dev  etc  home  lib  lib64	media  mnt  opt  proc  root  run  sbin	srv  sys  tmp  usr  var
debug: seccomp_unotify.c:306 (supervisor) [52]: seccomp_notify_fd polled (0x0010)
debug: seccomp_unotify.c:310 (supervisor) [52]: seccomp_notify_fd POLLHUP event recieved
debug: main.c:285 (main) [52]: max heap memory usage: 24 bytes
debug: main.c:237 (on_sigchld) [51]: SIGCHLD recieved

debug: main.c:241 (on_sigchld) [51]: reaped zombie process 53

#
@nkraetzschmar
Contributor

Error reproduced on new kernel in the unit tests:

test_seccomp_unotify.c
debug: seccomp_unotify.c:380 (target_vfork) [6043]: vforked
debug: seccomp_unotify.c:279 (supervisor) [6042]: listening for seccomp notify events
debug: seccomp_unotify.c:433 (on_sigchld) [6042]: SIGCHLD recieved

debug: seccomp_unotify.c:306 (supervisor) [6042]: seccomp_notify_fd polled (0x0010)
debug: seccomp_unotify.c:310 (supervisor) [6042]: seccomp_notify_fd POLLHUP event recieved
[1/9] no_opt: failed

Compared to the intended behaviour on older kernels:

test_seccomp_unotify.c
debug: seccomp_unotify.c:380 (target_vfork) [3810]: vforked
debug: seccomp_unotify.c:279 (supervisor) [3809]: listening for seccomp notify events
debug: seccomp_unotify.c:433 (on_sigchld) [3809]: SIGCHLD recieved

debug: seccomp_unotify.c:293 (supervisor) [3809]: target 3810 exited with status 0
debug: seccomp_unotify.c:306 (supervisor) [3809]: seccomp_notify_fd polled (0x0010)
debug: seccomp_unotify.c:310 (supervisor) [3809]: seccomp_notify_fd POLLHUP event recieved
[1/9] no_opt: passed

So it looks like poll receives the POLLHUP event before the child process has been waited on. The main supervisor loop therefore breaks before the child's exit code can be retrieved, and the initial value of the return code variable is treated as the child's exit code. That variable is initialized to -1, which the shell reports as 255, matching the exit code in the original report.
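To illustrate why an uninitialized-looking -1 surfaces as 255, here is a minimal sketch. It is not taken from fake_xattr; `observed_exit_status` is a hypothetical helper that forks a child, has it call `_exit(code)`, and returns what the parent sees through `waitpid`. Only the low 8 bits of the exit value survive, so -1 becomes 255.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only (not part of fake_xattr).
 * Forks a child that calls _exit(code) and returns the exit status the
 * parent observes, or -1 if fork/waitpid fails or the child did not exit
 * normally. */
int observed_exit_status(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code); /* child: exit immediately with the requested code */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    /* WEXITSTATUS yields only the low 8 bits of the value passed to _exit */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `observed_exit_status(-1)` yields 255, which is what `echo $?` printed in the reproduction, while `observed_exit_status(0)` yields 0.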

Root cause analysis:

This behaviour was introduced by the following change to the kernel: torvalds/linux@bfafe5e

The kernel keeps a reference counter for the number of users of a given seccomp filter. Once this reference counter reaches zero the POLLHUP event is triggered.

Prior to torvalds/linux@bfafe5e this reference counter was only decremented once a task had exited and had been waited for.

That change moved the seccomp_filter_release call from the release_task function to the do_exit function, so the reference counter is already decremented once a task has exited, even if it has not yet been waited for.

Therefore the POLLHUP event is also received before the child process has been waited for. This matches the output we are seeing and explains the issue.
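One possible way to make the supervisor independent of this ordering is to treat POLLHUP as "the target is gone, now wait for it" rather than as the end of the loop. The sketch below is an assumption about how that could look, not the actual fix in fake_xattr. A pipe stands in for the seccomp notify fd (its write end is closed when the child exits, which raises POLLHUP on the read end before the child is reaped), and `reap_exit_code` and `supervise_child` are hypothetical names.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until `pid` has been reaped and return a
 * shell-style exit code (128 + signal number if killed by a signal). */
int reap_exit_code(pid_t pid)
{
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0); /* blocking wait, retried on EINTR */
    } while (r == -1 && errno == EINTR);

    if (r == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/* Hypothetical supervisor loop. The pipe read end is only a stand-in for
 * the seccomp notify fd; the child exits with `code`. */
int supervise_child(int code)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        _exit(code); /* write end closes on exit, parent sees POLLHUP */
    }
    close(fds[1]); /* parent keeps only the read end */

    struct pollfd pfd = { .fd = fds[0], .events = 0 };
    for (;;) {
        int n = poll(&pfd, 1, -1);
        if (n == -1 && errno == EINTR)
            continue; /* e.g. interrupted by SIGCHLD */
        if (n == -1 || (pfd.revents & (POLLHUP | POLLERR)))
            break;
    }
    close(fds[0]);

    /* Do not trust any previously stored value: wait explicitly, so the
     * result is the same whether POLLHUP arrives before or after the
     * child could have been reaped. */
    return reap_exit_code(pid);
}
```

With this shape, `supervise_child(42)` returns 42 regardless of whether the hang-up or the child's exit is noticed first. The real supervisor would additionally have to handle seccomp notifications inside the loop and coordinate with any existing SIGCHLD handler that already reaps the child.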
