Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interaction between seccomp-notify code and signal restarting looks broken #526

Closed
thejh opened this issue Oct 26, 2020 · 7 comments
Closed

Comments

@thejh
Copy link

thejh commented Oct 26, 2020

cc @giuseppe @brauner @kees @mkerrisk

tl;dr:

  1. Please handle spurious ENOENT errors from SECCOMP_IOCTL_NOTIF_RECV correctly in libcrun_seccomp_notify_plugins so that containers don't crash because of spurious errors that can happen legitimately.
  2. Please set O_NONBLOCK on the seccomp_notify_fd at the same time to make life easier for us kernel folks.

As far as I can tell, crun treats any error return values from SECCOMP_IOCTL_NOTIF_RECV as fatal in libcrun_seccomp_notify_plugins: libcrun_seccomp_notify_plugins returns that error value up, then wait_for_process returns it up further, libcrun_container_run_internal again returns it up, and then e.g. libcrun_container_run will call force_delete_container_status to nuke the container.

However, SECCOMP_IOCTL_NOTIF_RECV can return an error code in normal usage in the following scenario (where "target" refers to a process running inside the container):

  1. crun: starts epoll_wait() to wait for events on the seccomp_notify_fd (and other fds)
  2. target: performs a syscall that's filtered with SECCOMP_RET_USER_NOTIF; this inserts metadata about this syscall invocation to a kernel-internal list of pending syscalls
  3. crun: epoll_wait() returns and signals readiness of seccomp_notify_fd (because the kernel-internal list of pending syscalls is not empty)
  4. target: receives signal SIGUSR1
  5. target:* syscall aborts, enters signal handler (which removes the pending syscall from the kernel-internal list)
  6. crun: calls SECCOMP_IOCTL_NOTIF_RECV (expecting to receive information about a pending syscall; but that pending syscall has disappeared in the meantime)
  7. crun: SECCOMP_IOCTL_NOTIF_RECV returns -1 with errno set to ENOENT

The same kind of thing can also happen if someone kills the target process in the meantime.
(SECCOMP_IOCTL_NOTIF_SEND can also fail in the same way, but there you seem to be handling it correctly already.)

In addition to fixing this, it would be helpful for the kernel if you could set the O_NONBLOCK flag on seccomp_notify_fd at the same time (to explicitly signal that you don't want seccomp_notify_fd to block when no events are pending). That isn't currently necessary, but it would allow us to clean up the kernel code around this feature a bit; and if you fix these two things at the same time, we can argue that a UAPI-breaking change that makes SECCOMP_IOCTL_NOTIF_RECV always block when no syscalls are currently available would not have negative consequences for any current users (because current crun is completely broken in this case anyway).

For more details, see the current draft of a manual page for the seccomp_user_notif mechanism and the discussion thread on the first draft.

@giuseppe
Copy link
Member

thanks for the report.

1. Please handle spurious `ENOENT` errors from `SECCOMP_IOCTL_NOTIF_RECV` correctly in `libcrun_seccomp_notify_plugins` so that containers don't crash because of spurious errors that can happen legitimately.

Is ENOENT the only error to ignore or are there other possible ones?

@giuseppe
Copy link
Member

opened a PR: #527

@thejh
Copy link
Author

thejh commented Oct 26, 2020

Yes; if you use it correctly, ENOENT should be the only error.

@thejh
Copy link
Author

thejh commented Oct 26, 2020

Actually, there's another thing that can happen: If all the target processes have died, the seccomp fd will constantly signal an error condition (EPOLLERR). In that case, you'll probably want to do something like stopping listening on the seccomp fd, or something like that.

For nonblocking users, options for the kernel patch are that we can either keep returning ENOENT (so that you have to disambiguate through the epoll event type) or we can instead return ENOTCONN (which would mean that you could cleanly handle that in libcrun_seccomp_notify_plugins). I'm not entirely sure which one looks better yet, but I'm tending towards returning ENOTCONN.

Maybe I filed this bug a bit prematurely and it'd be better to hold off on actually patching it in LXC until we have a decision on how exactly we want to amend the UAPI...

I'll CC you on the patch submission in case you want to give input there.

@rhatdan rhatdan closed this as completed in b9172d0 Nov 2, 2020
@brauner
Copy link

brauner commented Nov 2, 2020

cc @giuseppe @brauner @kees @mkerrisk

tl;dr:

  1. Please handle spurious ENOENT errors from SECCOMP_IOCTL_NOTIF_RECV correctly in libcrun_seccomp_notify_plugins so that containers don't crash because of spurious errors that can happen legitimately.
  2. Please set O_NONBLOCK on the seccomp_notify_fd at the same time to make life easier for us kernel folks.

As far as I can tell, crun treats any error return values from SECCOMP_IOCTL_NOTIF_RECV as fatal in libcrun_seccomp_notify_plugins: libcrun_seccomp_notify_plugins returns that error value up, then wait_for_process returns it up further, libcrun_container_run_internal again returns it up, and then e.g. libcrun_container_run will call force_delete_container_status to nuke the container.

However, SECCOMP_IOCTL_NOTIF_RECV can return an error code in normal usage in the following scenario (where "target" refers to a process running inside the container):

  1. crun: starts epoll_wait() to wait for events on the seccomp_notify_fd (and other fds)
  2. target: performs a syscall that's filtered with SECCOMP_RET_USER_NOTIF; this inserts metadata about this syscall invocation to a kernel-internal list of pending syscalls
  3. crun: epoll_wait() returns and signals readiness of seccomp_notify_fd (because the kernel-internal list of pending syscalls is not empty)
  4. target: receives signal SIGUSR1
  5. target:* syscall aborts, enters signal handler (which removes the pending syscall from the kernel-internal list)
  6. crun: calls SECCOMP_IOCTL_NOTIF_RECV (expecting to receive information about a pending syscall; but that pending syscall has disappeared in the meantime)
  7. crun: SECCOMP_IOCTL_NOTIF_RECV returns -1 with errno set to ENOENT

The same kind of thing can also happen if someone kills the target process in the meantime.
(SECCOMP_IOCTL_NOTIF_SEND can also fail in the same way, but there you seem to be handling it correctly already.)

In addition to fixing this, it would be helpful for the kernel if you could set the O_NONBLOCK flag on seccomp_notify_fd at the same time (to explicitly signal that you don't want seccomp_notify_fd to block when no events are pending). That isn't currently necessary, but it would allow us to clean up the kernel code around this feature a bit; and if you fix these two things at the same time, we can argue that a UAPI-breaking change that makes SECCOMP_IOCTL_NOTIF_RECV always block when no syscalls are currently available would not have negative consequences for any current users (because current crun is completely broken in this case anyway).

For more details, see the current draft of a manual page for the seccomp_user_notif mechanism and the discussion thread on the first draft.

We're already handling the ENOENT case correctly, i.e. we don't abort but continue. But I'll add a log entry for it.

@thejh
Copy link
Author

thejh commented Nov 2, 2020

We're already handling the ENOENT case correctly, i.e. we don't abort but continue. But I'll add a log entry for it.

Where is that code?

@brauner
Copy link

brauner commented Nov 2, 2020

@thejh I've pinged you on the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants