runsc inside of default docker seccomp policy #4371
Comments
Hi, prattmic invited me here to explain a bit further. Thanks for opening the issue. It's still in the experimental stages, but I'm working on a media server that will run transcodes in a locked-down environment, and I've been investigating which approaches I can adopt without adding a portability burden. For wide use I would really want the resulting project to run in a Docker container, and adding extra privileges to that container for the promise of better sandboxing inside it is a non-starter. So that has me looking at either forking gVisor and collapsing it to the bare minimum, or writing a tiny seccomp and LD_PRELOAD thing just for this purpose.
That's an interesting use case. gVisor sets up many layers of defense around the sandbox to reduce access to the host in case the sandbox is compromised. You can find more details in the Containing a Real Vulnerability blog post. Configuring some of these layers requires [...] So you can remove some of the layers that require high privilege (e.g. namespaces, pivot_root), similar to the way [...] Another consideration is that the gofer requires these capabilities to function correctly. Otherwise, some operations will not be allowed, like creating files with different owners. A few other options to consider are:
This is the POC patch which adds the --unprivileged flag. With this flag, gVisor can be executed in a default Docker container, with the limitations that Fabricio described in the previous comment.
If the unprivileged flag is specified, the Sentry and Gofer processes run in the current set of namespaces. In other words, we remove one level of isolation. But in your case, a Docker container provides this extra level, so I think your use case can still be valid.
This is interesting! How do filesystem and PID isolation work in this case? It looks from my untrained reading that this would allow access to send signals to other processes owned by the same user, or open files accessible to the current user. Does the emulated kernel provide that level of isolation?
Yes, the userspace kernel still provides isolation, because it is emulating the OS based on the provided configuration. These capabilities are providing defense-in-depth in the event that the kernel is compromised. In the case of signals, all thread IDs within the sandbox are entirely internal to the userspace kernel, with no relation to the host. Signal syscalls sent by the sandboxed application can only target other threads in the sandbox (implementing a signal syscall in the userspace kernel may not even send a host signal at all). In fact, using a PID namespace is really a third layer of defense, since the userspace kernel can't send signals to other processes anyways. Similarly, the userspace kernel can't directly open host files. That is mediated by another process called the gofer. The gofer won't grant access to files not allowed by the configuration, but were it to be compromised, the mount namespace containing only the configured files provides an additional layer of protection.
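To make the signal point above concrete, here is a toy sketch in Go. It is not gVisor code: `toyKernel`, `ThreadID`, `Spawn`, and `Kill` are hypothetical names invented for illustration. It only shows the idea that a signal syscall is resolved against a table the userspace kernel owns, so a target that is not in that table simply does not exist from the application's point of view, and no host signal is ever sent.

```go
package main

import (
	"errors"
	"fmt"
)

// ThreadID is a sandbox-internal identifier (hypothetical type). In this toy
// model it has no relation to any host PID or TID.
type ThreadID int

// ErrNoSuchThread plays the role of ESRCH: the target is unknown to the sandbox.
var ErrNoSuchThread = errors.New("no such thread in sandbox")

// toyKernel is an illustrative stand-in for a userspace kernel's task table.
type toyKernel struct {
	nextID  ThreadID
	pending map[ThreadID][]int // signals queued per internal thread
}

func newToyKernel() *toyKernel {
	return &toyKernel{nextID: 1, pending: make(map[ThreadID][]int)}
}

// Spawn registers a new sandboxed thread and returns its internal ID.
func (k *toyKernel) Spawn() ThreadID {
	id := k.nextID
	k.nextID++
	k.pending[id] = nil // register the thread with an empty signal queue
	return id
}

// Kill emulates a signal syscall. It only consults the internal table and
// queues the signal there; it never calls the host's kill(2).
func (k *toyKernel) Kill(target ThreadID, sig int) error {
	if _, ok := k.pending[target]; !ok {
		return ErrNoSuchThread
	}
	k.pending[target] = append(k.pending[target], sig)
	return nil
}

func main() {
	k := newToyKernel()
	a := k.Spawn()
	// A signal to a thread the sandbox created is queued internally.
	fmt.Println("signal to internal thread:", k.Kill(a, 15))
	// An ID that merely looks like a host PID is not in the table.
	fmt.Println("signal to unknown ID:", k.Kill(4242, 9))
}
```

In this model a PID namespace would only matter if the kernel process itself were compromised and started issuing real host syscalls, which matches the "third layer of defense" description above.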
This is interesting. I have a slightly different use case: I was also trying to use runsc inside a container, in an attempt to get a dockerised (proprietary) application to work on a hardened K8s platform. The application requires some privileged capabilities to work, but the container platform drops all privileged capabilities for tenants for security reasons. The idea was to use gVisor in a pod to (in a crude sense) pick these capabilities back up as a compatibility layer for the app. Would such a thing be feasible?
It's not a generic solution that will work with all containers, but it may work in your case depending on what the container does at runtime. In gVisor, all file system operations are handled by an external file proxy, called the Gofer, that is isolated from the sandbox for security purposes. The gofer requires capabilities to function correctly. For example, when the container creates a file running as a user that exists inside the sandbox, the gofer requires
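As a rough illustration of what "file operations are mediated by a proxy" means, here is a toy path check in Go. It is a sketch under stated assumptions, not the Gofer's actual logic: `toyGofer`, `newToyGofer`, and `Resolve` are hypothetical names, the configured roots are made up, and the sketch ignores symlinks, ownership, and capabilities entirely.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// ErrNotAllowed is returned for any path outside the configured roots.
var ErrNotAllowed = errors.New("path outside configured mounts")

// toyGofer is a hypothetical file proxy that only serves configured roots.
type toyGofer struct {
	roots []string // cleaned absolute paths; "/" itself is not supported here
}

func newToyGofer(roots ...string) *toyGofer {
	g := &toyGofer{}
	for _, r := range roots {
		g.roots = append(g.roots, path.Clean(r))
	}
	return g
}

// Resolve normalizes the requested path and returns it only if it is one of
// the configured roots or lies underneath one. Symlinks are NOT resolved,
// which a real proxy would have to deal with.
func (g *toyGofer) Resolve(p string) (string, error) {
	if !path.IsAbs(p) {
		return "", ErrNotAllowed
	}
	c := path.Clean(p) // collapses "." and ".." components
	for _, r := range g.roots {
		if c == r || strings.HasPrefix(c, r+"/") {
			return c, nil
		}
	}
	return "", ErrNotAllowed
}

func main() {
	// Assumed example configuration: one input and one output directory.
	g := newToyGofer("/media/in", "/media/out")
	for _, p := range []string{"/media/in/a.mkv", "/media/in/../../etc/passwd"} {
		resolved, err := g.Resolve(p)
		fmt.Printf("%q -> %q, err=%v\n", p, resolved, err)
	}
}
```

The point of the sketch is only that the sandboxed side never opens host paths itself: it asks the proxy, and the proxy decides based on configuration.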
Is there any possibility the
For my project I must also run gVisor inside a Docker container for integration-testing purposes. It is not possible to do this outside of a Docker container, as the environment has other requirements, like specific file system mounting and Linux-specific tools, while the test environment must be triggered locally from systems that don't have gVisor installed, don't have the file systems mounted, or aren't even Linux. Is there any way the unprivileged flag could be merged or updated?
@scanlime on Twitter is trying to run runsc inside a Docker container with the standard seccomp policy enabled. This is similar to rootless mode (#311), but a little bit more strict.
The immediate issue is that we exec into empty namespaces, which the profile does not allow. It is not clear if there would be more issues if that were resolved, though I didn't see any glaring issues comparing our seccomp filters to Docker's.
It's also not clear if the defense-in-depth features we'd have to disable to make this work would make it a bad idea. But in general, it is very reasonable to want to run a sandbox as a subprocess in an existing container.
cc @fvoznika @nlacasse