interaction between seccomp-notify code and signal restarting looks broken #526

thejh · 2020-10-26T10:37:03Z

tl;dr:

Please handle spurious ENOENT errors from SECCOMP_IOCTL_NOTIF_RECV correctly in libcrun_seccomp_notify_plugins so that containers don't crash because of spurious errors that can happen legitimately.
Please set O_NONBLOCK on the seccomp_notify_fd at the same time to make life easier for us kernel folks.

As far as I can tell, crun treats any error return values from SECCOMP_IOCTL_NOTIF_RECV as fatal in libcrun_seccomp_notify_plugins: libcrun_seccomp_notify_plugins returns that error value up, then wait_for_process returns it up further, libcrun_container_run_internal again returns it up, and then e.g. libcrun_container_run will call force_delete_container_status to nuke the container.

However, SECCOMP_IOCTL_NOTIF_RECV can return an error code in normal usage in the following scenario (where "target" refers to a process running inside the container):

crun: starts epoll_wait() to wait for events on the seccomp_notify_fd (and other fds)
target: performs a syscall that's filtered with SECCOMP_RET_USER_NOTIF; this inserts metadata about this syscall invocation to a kernel-internal list of pending syscalls
crun: epoll_wait() returns and signals readiness of seccomp_notify_fd (because the kernel-internal list of pending syscalls is not empty)
target: receives signal SIGUSR1
target:* syscall aborts, enters signal handler (which removes the pending syscall from the kernel-internal list)
crun: calls SECCOMP_IOCTL_NOTIF_RECV (expecting to receive information about a pending syscall; but that pending syscall has disappeared in the meantime)
crun: SECCOMP_IOCTL_NOTIF_RECV returns -1 with errno set to ENOENT

The same kind of thing can also happen if someone kills the target process in the meantime.
(SECCOMP_IOCTL_NOTIF_SEND can also fail in the same way, but there you seem to be handling it correctly already.)

In addition to fixing this, it would be helpful for the kernel if you could set the O_NONBLOCK flag on seccomp_notify_fd at the same time (to explicitly signal that you don't want seccomp_notify_fd to block when no events are pending). That isn't currently necessary, but it would allow us to clean up the kernel code around this feature a bit; and if you fix these two things at the same time, we can argue that a UAPI-breaking change that makes SECCOMP_IOCTL_NOTIF_RECV always block when no syscalls are currently available would not have negative consequences for any current users (because current crun is completely broken in this case anyway).

For more details, see the current draft of a manual page for the seccomp_user_notif mechanism and the discussion thread on the first draft.

The text was updated successfully, but these errors were encountered:

giuseppe · 2020-10-26T14:25:01Z

thanks for the report.

1. Please handle spurious `ENOENT` errors from `SECCOMP_IOCTL_NOTIF_RECV` correctly in `libcrun_seccomp_notify_plugins` so that containers don't crash because of spurious errors that can happen legitimately.

Is ENOENT the only error to ignore or are there other possible ones?

giuseppe · 2020-10-26T14:28:43Z

opened a PR: #527

thejh · 2020-10-26T14:48:35Z

Yes; if you use it correctly, ENOENT should be the only error.

thejh · 2020-10-26T15:52:21Z

Actually, there's another thing that can happen: If all the target processes have died, the seccomp fd will constantly signal an error condition (EPOLLERR). In that case, you'll probably want to do something like stopping listening on the seccomp fd, or something like that.

For nonblocking users, options for the kernel patch are that we can either keep returning ENOENT (so that you have to disambiguate through the epoll event type) or we can instead return ENOTCONN (which would mean that you could cleanly handle that in libcrun_seccomp_notify_plugins). I'm not entirely sure which one looks better yet, but I'm tending towards returning ENOTCONN.

Maybe I filed this bug a bit prematurely and it'd be better to hold off on actually patching it in LXC until we have a decision on how exactly we want to amend the UAPI...

I'll CC you on the patch submission in case you want to give input there.

brauner · 2020-11-02T15:46:09Z

cc @giuseppe @brauner @kees @mkerrisk

tl;dr:

Please handle spurious ENOENT errors from SECCOMP_IOCTL_NOTIF_RECV correctly in libcrun_seccomp_notify_plugins so that containers don't crash because of spurious errors that can happen legitimately.

Please set O_NONBLOCK on the seccomp_notify_fd at the same time to make life easier for us kernel folks.

As far as I can tell, crun treats any error return values from SECCOMP_IOCTL_NOTIF_RECV as fatal in libcrun_seccomp_notify_plugins: libcrun_seccomp_notify_plugins returns that error value up, then wait_for_process returns it up further, libcrun_container_run_internal again returns it up, and then e.g. libcrun_container_run will call force_delete_container_status to nuke the container.

However, SECCOMP_IOCTL_NOTIF_RECV can return an error code in normal usage in the following scenario (where "target" refers to a process running inside the container):

crun: starts epoll_wait() to wait for events on the seccomp_notify_fd (and other fds)

target: performs a syscall that's filtered with SECCOMP_RET_USER_NOTIF; this inserts metadata about this syscall invocation to a kernel-internal list of pending syscalls

crun: epoll_wait() returns and signals readiness of seccomp_notify_fd (because the kernel-internal list of pending syscalls is not empty)

target: receives signal SIGUSR1

target:* syscall aborts, enters signal handler (which removes the pending syscall from the kernel-internal list)

crun: calls SECCOMP_IOCTL_NOTIF_RECV (expecting to receive information about a pending syscall; but that pending syscall has disappeared in the meantime)

crun: SECCOMP_IOCTL_NOTIF_RECV returns -1 with errno set to ENOENT

The same kind of thing can also happen if someone kills the target process in the meantime.
(SECCOMP_IOCTL_NOTIF_SEND can also fail in the same way, but there you seem to be handling it correctly already.)

In addition to fixing this, it would be helpful for the kernel if you could set the O_NONBLOCK flag on seccomp_notify_fd at the same time (to explicitly signal that you don't want seccomp_notify_fd to block when no events are pending). That isn't currently necessary, but it would allow us to clean up the kernel code around this feature a bit; and if you fix these two things at the same time, we can argue that a UAPI-breaking change that makes SECCOMP_IOCTL_NOTIF_RECV always block when no syscalls are currently available would not have negative consequences for any current users (because current crun is completely broken in this case anyway).

For more details, see the current draft of a manual page for the seccomp_user_notif mechanism and the discussion thread on the first draft.

We're already handling the ENOENT case correctly, i.e. we don't abort but continue. But I'll add a log entry for it.

thejh · 2020-11-02T15:47:59Z

We're already handling the ENOENT case correctly, i.e. we don't abort but continue. But I'll add a log entry for it.

Where is that code?

brauner · 2020-11-02T15:50:54Z

@thejh I've pinged you on the PR.

rhatdan closed this as completed in b9172d0 Nov 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

interaction between seccomp-notify code and signal restarting looks broken #526

interaction between seccomp-notify code and signal restarting looks broken #526

thejh commented Oct 26, 2020

giuseppe commented Oct 26, 2020

giuseppe commented Oct 26, 2020

thejh commented Oct 26, 2020

thejh commented Oct 26, 2020

brauner commented Nov 2, 2020

thejh commented Nov 2, 2020

brauner commented Nov 2, 2020

interaction between seccomp-notify code and signal restarting looks broken #526

interaction between seccomp-notify code and signal restarting looks broken #526

Comments

thejh commented Oct 26, 2020

giuseppe commented Oct 26, 2020

giuseppe commented Oct 26, 2020

thejh commented Oct 26, 2020

thejh commented Oct 26, 2020

brauner commented Nov 2, 2020

thejh commented Nov 2, 2020

brauner commented Nov 2, 2020