-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: fix several issues with namespace path handling #4124
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are minor (can be addressed in other PRs, if there is something to address) and mostly questions. This LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
func usage() { | ||
fmt.Println("usage: remap-rootfs <bundle>") | ||
os.Exit(1) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs more documentation, as markdown or godoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll switch to urfave/cli
and add some docs, though I sometimes wonder if we need a separate place for programs that we only use for test purposes. fs-idmap
is basically useless to anyone outside of our tests, and while remap-rootfs
is theoretically useful I doubt anyone will use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same too :)
We had ignore lines for these warnings, but it turns out this is most likely a bug in shellcheck and we can work around it by moving the helper function definition before any of the functions that use the helper function. Hopefully it'll be fixed soon. Ref: koalaman/shellcheck#2873 Signed-off-by: Aleksa Sarai <[email protected]>
Our handling for name space paths with user namespaces has been broken for a long time. In particular, the need to parse /proc/self/*id_map in quite a few places meant that we would treat userns configurations that had a namespace path as if they were a userns configuration without mappings, resulting in errors. The primary issue was down to the id translation helper functions, which could only handle configurations that had explicit mappings. Obviously, when joining a user namespace we need to map the ids but figuring out the correct mapping is non-trivial in comparison. In order to get the mapping, you need to read /proc/<pid>/*id_map of a process inside the userns -- while most userns paths will be of the form /proc/<pid>/ns/user (and we have a fast-path for this case), this is not guaranteed and thus it is necessary to spawn a process inside the container and read its /proc/<pid>/*id_map files in the general case. As Go does not allow us spawn a subprocess into a target userns, we have to use CGo to fork a sub-process which does the setns(2). To be honest, this is a little dodgy in regards to POSIX signal-safety(7) but since we do no allocations and we are executing in the forked context from a Go program (not a C program), it should be okay. The other alternative would be to do an expensive re-exec (a-la nsexec which would make several other bits of runc more complicated), or to use nsenter(1) which might not exist on the system and is less than ideal. Because we need to logically remap users quite a few times in runc (including in "runc init", where joining the namespace is not feasable), we cache the mapping inside the libcontainer config struct. A future patch will make sure that we stop allow invalid user configurations where a mapping is specified as well as a userns path to join. Finally, add an integration test to make sure we don't regress this again. Signed-off-by: Aleksa Sarai <[email protected]>
While we do cache the mappings when using userns paths, there's no need to do this in this particular case, since we are in the namespace and set[ug]id() give unambiguous EINVAL error codes if the id is unmapped. This appears to also be the only code which does Host[UG]ID calculations from inside "runc init". Ref: 1a5fdc1 ("init: support setting -u with rootless containers") Signed-off-by: Aleksa Sarai <[email protected]>
If a user has misconfigured their userns mappings, they need to know which id specifically is not mapped. There's no need to be vague. Signed-off-by: Aleksa Sarai <[email protected]>
For userns and timens, the mappings (and offsets, respectively) cannot be changed after the namespace is first configured. Thus, configuring a container with a namespace path to join means that you cannot also provide configuration for said namespace. Previously we would silently ignore the configuration (and just join the provided path), but we really should be returning an error (especially when you consider that the configuration userns mappings are used quite a bit in runc with the assumption that they are the correct mapping for the userns -- but in this case they are not). In the case of userns, the mappings are also required if you _do not_ specify a path, while in the case of the time namespace you can have a container with a timens but no mappings specified. It should be noted that the case checking that the user has not specified a userns path and a userns mapping needs to be handled in specconv (as opposed to the configuration validator) because with this patchset we now cache the mappings of path-based userns configurations and thus the validator can't be sure whether the mapping is a cached mapping or a user-specified one. So we do the validation in specconv, and thus the test for this needs to be an integration test. Signed-off-by: Aleksa Sarai <[email protected]>
The owner of /proc/self/timens_offsets doesn't change after creating a userns, meaning that we need to request stage-0 to write our timens mappings for us. Before this patch, attempting to use timens with a proper userns resulted in: FATA[0000] nsexec-1[18564]: failed to update /proc/self/timens_offsets: Permission denied FATA[0000] nsexec-0[18562]: failed to sync with stage-1: next state: Success ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF Fixes: ebc2e7c ("Support time namespace") Signed-off-by: Aleksa Sarai <[email protected]>
Given we've had several bugs in this behaviour that have now been fixed, add an integration test that makes sure that you can start a container that joins all of the namespaces of a second container. The only namespace we do not join is the mount namespace, because joining a namespace that has been pivot_root'd leads to a bunch of errors. In principle, removing everything from config.json that requires a mount _should_ work, but the root.path configuration is mandatory and we cannot just ignore setting up the rootfs in the namespace joining scenario (if the user has configured a different rootfs, we need to use it or error out, and there's no reasonable way of checking if if the rootfs paths are the same that doesn't result in spaghetti logic). Signed-off-by: Aleksa Sarai <[email protected]>
Previously, all of our userns tests worked around the remapping issue by creating the paths that runc would attempt to create (like /proc). However, this isn't really accurate to how real userns containers are created, so it's much better to actually remap the rootfs. Signed-off-by: Aleksa Sarai <[email protected]>
1.1 backport: #4144 |
@cyphar kinda sad that we've recently removed a bunch of C code, only to add another bunch of C code just because Go can't do setns in the middle of ForkExec. That reminded me of work that @cpuguy83 is doing; PTAL https://go-review.googlesource.com/c/go/+/476095?usp=dashboard |
@kolyshkin I agree, and yeah it would be nice if we could do it with |
This fixes #4122 along with several other issues I found while writing integration tests for joining namespace paths. In particular, this fixes the following issues:
NOTE This requires calling thensenter
binary. I've taken care to ensure that we don't do this in any context other than the host, but I'm not super happy with this. The other option would be to self-exec but that would require changing up howrunc init
works as well so we could have two self-exec entrypoints. I'm not sure it's worth it.setns(2)
there. Technically it's a little dodgy to be doing work after afork(2)
, and while we don't do anything too complicated (like heap allocations), I'm not sure I'm comfortable saying that it won't ever have any issues.mkdir -p rootfs/{proc,sys,dev}
hack.There is a test I want to add to #3985 which requires these changes (to verify that
idmap
uses the correct user namespace when joining a userns instead of creating a new one).Signed-off-by: Aleksa Sarai [email protected]