--userns=keep-id:uid=1000,gid=1000 also changes container init process user #24934

Open
edvgui opened this issue Jan 4, 2025 · 9 comments
Labels
documentation Issue or fix is in project documentation

Comments


edvgui commented Jan 4, 2025

Issue Description

I like to run multiple applications in podman containers and persist their storage using bind mounts. I run all these containers rootless, and I always make sure that the user of the process producing the data inside the container matches the user running the container on the host (using ID mapping).

I was recently happy to discover that --userns=keep-id accepts uid and gid options, which lets me stop writing my UID mappings by hand with --uidmap and --gidmap (repeating each three times to make sure the full subuid/subgid ranges are mapped).

It all looked good until I realized that this userns mode does more than I understood from the documentation: it also changes the user that starts the initial process in the container.

To me, that does not look like desired behavior. Is it?

Steps to reproduce the issue

To illustrate my point, let's start the fedora container with all the default options and check the user of the initial process. Unsurprisingly, it is root:

guillaume@framework$ podman run --rm fedora:latest whoami
root

Now let's use a manual uid/gid mapping. It is still root:

guillaume@framework:~$ podman run --rm --uidmap=0:1:1000 --uidmap=1000:0:1 --uidmap=1001:1001:64536 --gidmap=0:1:1000 --gidmap=1000:0:1 --gidmap=1001:1001:64536 fedora:latest whoami
root
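
The three ranges in the manual mapping above follow a fixed pattern, so they can be generated rather than typed. The sketch below is only an illustration: `manual_keep_id_flags` is a hypothetical helper, not part of podman, and it assumes a single contiguous subordinate range (65536 IDs, as in the `podman info` output of this report) and the same container ID for both uid and gid. In rootless mode the second field of each entry refers to the intermediate namespace, where 0 is the invoking user and 1 onward are the subordinate IDs.

```shell
# Hypothetical helper (not part of podman): print the --uidmap/--gidmap
# flags that map container ID $1 onto the invoking host user, given $2
# subordinate IDs. Assumes 0 < $1 < $2 and one contiguous subordinate range.
manual_keep_id_flags() {
    id=$1
    subids=$2
    out=""
    for kind in uidmap gidmap; do
        # 1) container IDs 0..id-1   -> intermediate IDs 1..id (subordinate IDs)
        # 2) container ID id         -> intermediate ID 0 (the host user)
        # 3) container IDs id+1..    -> the rest of the subordinate range
        out="$out --$kind=0:1:$id --$kind=$id:0:1"
        out="$out --$kind=$((id + 1)):$((id + 1)):$((subids - id))"
    done
    # Strip the leading space added by the first concatenation.
    printf '%s\n' "${out# }"
}

manual_keep_id_flags 1000 65536
```

With these arguments the helper prints the same six flags as the command above.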

Now let's use the userns option with keep-id. To my surprise, my host username suddenly exists inside the container (surprising but OK), and it is the user of the main process of the container, while I still expected root:

guillaume@framework:~$ podman run --rm --userns=keep-id:uid=1000,gid=1000 fedora:latest whoami
guillaume

If I use the --user option, things go back to "normal", but the annoyance is that I am then also overriding the default user specified by the container image I am using:

guillaume@framework:~$ podman run --rm --userns=keep-id:uid=1000,gid=1000 --user=root:root fedora:latest whoami
root
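
One possible way to avoid hard-coding root is to read the user configured in the image and pass it back explicitly. This is only a sketch under assumptions: `run_keep_id_image_user` is a hypothetical helper, the image must already be present locally, the configured user is assumed to be available as `.Config.User` from `podman image inspect`, and an empty value is assumed to mean root.

```shell
# Hypothetical helper: run IMAGE with keep-id but keep the image's own
# default user instead of the one keep-id would select.
# Usage: run_keep_id_image_user CONTAINER_UID CONTAINER_GID IMAGE [CMD...]
# Assumptions: the image is already pulled, and an empty .Config.User
# means the image runs as root (0:0).
run_keep_id_image_user() {
    container_uid=$1
    container_gid=$2
    image=$3
    shift 3
    # Read the user recorded in the image configuration (may be empty).
    image_user=$(podman image inspect --format '{{.Config.User}}' "$image") || return 1
    podman run --rm \
        --userns="keep-id:uid=${container_uid},gid=${container_gid}" \
        --user="${image_user:-0:0}" \
        "$image" "$@"
}
```

For example, `run_keep_id_image_user 1000 1000 fedora:latest whoami` would be expected to print root again, since the fedora image does not set a user.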

Describe the results you received

Using --userns=keep-id:uid=...,gid=... changes the default initial process user inside the container.

Describe the results you expected

I would expect that --userns=keep-id:uid=...,gid=... does not affect the default initial process user inside the container.

podman info output

guillaume@framework:~$ podman info
host:
  arch: amd64
  buildahVersion: 1.38.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-3.fc41.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 97.57
    systemPercent: 1.37
    userPercent: 1.06
  cpus: 12
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: workstation
    version: "41"
  eventLogger: journald
  freeLocks: 2048
  hostname: framework
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.12.5-200.fc41.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 3822657536
  memTotal: 16023031808
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.13.1-1.fc41.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.1
    package: netavark-1.13.1-1.fc41.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.1
  ociRuntime:
    name: crun
    package: crun-1.19.1-1.fc41.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.19.1
      commit: 3e32a70c93f5aa5fea69b50256cca7fd4aa23c80
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20241211.g09478d5-1.fc41.x86_64
    version: |
      pasta 0^20241211.g09478d5-1.fc41.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.fc41.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.8.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.5
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 1h 4m 26.00s (Approximately 0.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /home/guillaume/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/guillaume/.local/share/containers/storage
  graphRootAllocated: 498387124224
  graphRootUsed: 84419944448
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/guillaume/.local/share/containers/storage/volumes
version:
  APIVersion: 5.3.1
  Built: 1732147200
  BuiltTime: Thu Nov 21 01:00:00 2024
  GitCommit: ""
  GoVersion: go1.23.3
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

@edvgui edvgui added the kind/bug Categorizes issue or PR as related to a bug. label Jan 4, 2025
Author

edvgui commented Jan 4, 2025

Another way of illustrating this, by inspecting the running containers instead of the output of whoami:

guillaume@framework:~$ podman run --rm -d --name test fedora:latest sleep infinity
53123d4230e51a4d8ccee1f1267fd38d68baf7e8273fd934cc01e91b67984d32
guillaume@framework:~$ podman inspect -f "{{.Config.User}}" test

guillaume@framework:~$ podman inspect -f "{{.HostConfig.IDMappings}}" test
<nil>
guillaume@framework:~$ podman kill test
test
guillaume@framework:~$ podman run --rm -d --name test --userns=keep-id:uid=1000,gid=1000 fedora:latest sleep infinity
0eec56df2cffe69bf4590465b4b799774183be04eb4ca337abcef3b17611f4a9
guillaume@framework:~$ podman inspect -f "{{.Config.User}}" test
1000:1000
guillaume@framework:~$ podman inspect -f "{{.HostConfig.IDMappings}}" test
{[0:1:1000 1000:0:1 1001:1001:64536] [0:1:1000 1000:0:1 1001:1001:64536]}
guillaume@framework:~$ podman kill test
test
guillaume@framework:~$ podman run --rm -d --name test --uidmap=0:1:1000 --uidmap=1000:0:1 --uidmap=1001:1001:64536 --gidmap=0:1:1000 --gidmap=1000:0:1 --gidmap=1001:1001:64536 fedora:latest sleep infinity
a68187c2275a32b0b85785a1d09a8c413193e7c642cf331e6c6ff3998cbf5328
guillaume@framework:~$ podman inspect -f "{{.Config.User}}" test

guillaume@framework:~$ podman inspect -f "{{.HostConfig.IDMappings}}" test
{[0:1:1000 1000:0:1 1001:1001:64536] [0:1:1000 1000:0:1 1001:1001:64536]}
guillaume@framework:~$ podman kill test
test

@eriksjolund
Contributor

Sidenote:

Here is a workaround: use the + syntax provided by --uidmap and --gidmap.

The resulting /proc/self/uid_map and /proc/self/gid_map are the same:

$ host_uid=$(id -u)
$ host_gid=$(id -g)
$ container_uid=1000
$ container_gid=1000
$ podman run --rm --userns=keep-id:uid=${container_uid},gid=${container_gid} fedora:latest cat /proc/self/uid_map
         0          1       1000
      1000          0          1
      1001       1001      64536
$ podman run --rm --userns=keep-id:uid=${container_uid},gid=${container_gid} fedora:latest cat /proc/self/gid_map
         0          1       1000
      1000          0          1
      1001       1001      64536
$ podman run --rm  --uidmap +${container_uid}:@${host_uid}:1 --gidmap +${container_gid}:@${host_gid}:1 fedora:latest cat /proc/self/uid_map
         0          1       1000
      1000          0          1
      1001       1001      64536
$ podman run --rm  --uidmap +${container_uid}:@${host_uid}:1 --gidmap +${container_gid}:@${host_gid}:1 fedora:latest cat /proc/self/gid_map
         0          1       1000
      1000          0          1
      1001       1001      64536

The container process is running as root

$ podman run --rm  --uidmap +${container_uid}:@${host_uid}:1 --gidmap +${container_gid}:@${host_gid}:1 fedora:latest id
uid=0(root) gid=0(root) groups=0(root)
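
To compare the uid_map and gid_map listings above without eyeballing the columns, the output can be collapsed into the colon-separated triples that `podman inspect` prints for `.HostConfig.IDMappings`. A minimal sketch; `normalize_id_map` is a hypothetical helper, not a podman command:

```shell
# Hypothetical helper: read uid_map/gid_map text on stdin and print the
# rows as space-separated first:second:third triples on a single line.
normalize_id_map() {
    awk 'NF == 3 { printf "%s%s:%s:%s", sep, $1, $2, $3; sep = " " }
         END     { print "" }'
}

# Sample input copied from the listings above.
printf '%s\n' \
    '         0          1       1000' \
    '      1000          0          1' \
    '      1001       1001      64536' | normalize_id_map
```

Piping `cat /proc/self/uid_map` from both variants through the helper should give identical lines if the mappings really match.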

Author

edvgui commented Jan 6, 2025

Thanks a lot for the workaround @eriksjolund ! That works just like I wished 👌

Contributor

eriksjolund commented Jan 6, 2025

That works just like I wished 👌

I also prefer the + syntax. It's more predictable which container init process user is used.

Another thing: I noticed a small difference between keep-id and + syntax in the following example

$ host_uid=$(id -u)
$ host_gid=$(id -g)
$ container_uid=1000
$ container_gid=1000
$ podman run --rm  \
    --uidmap +${container_uid}:@${host_uid}:1 \
    --gidmap +${container_gid}:@${host_gid}:1 \
    --user 0:0 \
    fedora:latest id
uid=0(root) gid=0(root) groups=0(root)
$ podman run --rm \
     --userns=keep-id:uid=${container_uid},gid=${container_gid} \
     --user 0:0 \
     fedora:latest id
uid=0(root) gid=0(root) groups=0(root),1000

Result:

+ syntax gives

groups=0(root)

keep-id syntax gives

groups=0(root),1000

Author

edvgui commented Jan 6, 2025

It's more predictable which container init process user is used.

It is at the moment, but I think it shouldn't be, right? Or at least I hope that the --userns option should be as predictable as the manual mapping, because the --userns option also has some advantages over the manual mapping, such as being able to use it within a pod:

guillaume@fedora:~$ podman pod create --pod-id-file /tmp/test.pod-id test
e0f1d096c9fcbcb4887a603750618e5352033f5f874d385b695ce809cbf41a4d
guillaume@fedora:~$ podman run --rm --userns=keep-id:uid=1000,gid=1000 --pod-id-file /tmp/test.pod-id fedora:latest id
uid=1000(guillaume) gid=1000(1000) groups=1000(1000)
guillaume@fedora:~$ podman run --rm --uidmap +1000:0:1 --gidmap +1000:0:1 --pod-id-file /tmp/test.pod-id fedora:latest id
Error: cannot specify a new uid/gid map when entering a pod with an infra container: invalid argument

Yet another thing, it can be used with a pod, but only using the --pod-id-file option, not the --pod option, no idea why

guillaume@fedora:~$ podman run --rm --userns=keep-id:uid=1000,gid=1000 --pod test fedora:latest id
Error: --userns and --pod cannot be set together

Member

Luap99 commented Jan 6, 2025

It's more predictable which container init process user is used.

It is at the moment, but I think it shouldn't be, right? Or at least I hope that the --userns option should be as predictable as the manual mapping, because the --userns option also has some advantages over the manual mapping, such as being able to use it within a pod:

guillaume@fedora:~$ podman pod create --pod-id-file /tmp/test.pod-id test
e0f1d096c9fcbcb4887a603750618e5352033f5f874d385b695ce809cbf41a4d
guillaume@fedora:~$ podman run --rm --userns=keep-id:uid=1000,gid=1000 --pod-id-file /tmp/test.pod-id fedora:latest id
uid=1000(guillaume) gid=1000(1000) groups=1000(1000)
guillaume@fedora:~$ podman run --rm --uidmap +1000:0:1 --gidmap +1000:0:1 --pod-id-file /tmp/test.pod-id fedora:latest id
Error: cannot specify a new uid/gid map when entering a pod with an infra container: invalid argument

Yet another thing, it can be used with a pod, but only using the --pod-id-file option, not the --pod option, no idea why

guillaume@fedora:~$ podman run --rm --userns=keep-id:uid=1000,gid=1000 --pod test fedora:latest id
Error: --userns and --pod cannot be set together

These are just bugs, as the restriction was implemented poorly. You should not be able to nest user namespace options when in a pod, as this will mess up a ton of things. Mostly permissions on the various shared file systems.
From a kernel POV most of this is valid, but it really does not work very well in practice, which is why such limitations were added.


The behavior of userns=keep-id to change the user is intentional. It will even add an /etc/passwd entry for your local user name, as the goal of this option was to sort of "leak" the current user with the same uid into the container so you could treat it like being on the real host.
Given that there are most likely a ton of users depending on that behavior, I don't see us changing that.

We should definitely update the docs though to mention the user overwrite behavior.

@Luap99 Luap99 added documentation Issue or fix is in project documentation and removed kind/bug Categorizes issue or PR as related to a bug. labels Jan 6, 2025
Author

edvgui commented Jan 6, 2025

Okay, thanks for the clarification, so the desired option for me stays the manual mapping (made more handy with the + feature).


Mostly permissions on the various shared file systems.

Regarding the weak restrictions on nested user namespaces, could you be more specific about which shared file systems could have issues? I managed to start my containers using an option that shouldn't be available, and they all seem to be working fine, so what type of issues should I expect? The only thing I can think about is the shared network stack, and the /proc/net folder that should be owned by root in all containers, but as long as all containers have the "same root user" (so all of them do use the keep-id for any other uid than 0) I should be free of troubles? Is there something else? And what about the future, will this restriction become more strict? (Then there is no point even leaving it as it is)

Member

Luap99 commented Jan 6, 2025

The only thing I can think about is the shared network stack, and the /proc/net folder that should be owned by root in all containers, but as long as all containers have the "same root user" (so all of them do use the keep-id for any other uid than 0) I should be free of troubles? Is there something else?

Yes, that is one. I think if the ipc namespace is shared we share /dev/shm, and the permissions there would also be different.
Another possible problem is capabilities: https://blog.podman.io/2023/12/interaction-between-user-namespaces-and-capabilities/

I don't have time to dig through the history right now to find the exact reasons the check was added so there might be more.

And what about the future, will this restriction become more strict? (Then there is no point even leaving it as it is)

Well, I would argue the current way we restrict this is wrong: either we fully restrict it (a breaking change, so not without a major 6.0 release), or we loosen that restriction and leave the possible problems to the user to figure out and deal with.

I can be convinced of either one, but the current limitation of blocking only a few flag combinations is just silly, and I would rather get this fixed properly at some point.

Author

edvgui commented Jan 6, 2025

Yes, that is one. I think if the ipc namespace is shared we share /dev/shm, and the permissions there would also be different.
Another possible problem is capabilities: https://blog.podman.io/2023/12/interaction-between-user-namespaces-and-capabilities/

Okay, these things I clearly didn't suspect, it still feels a bit over my head atm, but this is good to know. And this is enough to convince me that what I was trying to do in my pod was not right. That being said, I think I would be in favor of allowing it because it sounds like it can be used correctly and safely, if one knows what they are doing (I didn't, but there must be some people out there who do) and blocking it correctly just sounds too difficult if even the kernel is perfectly fine with it.

Anyway, thanks again for the bits of explanation, that was an interesting lesson about an issue I never considered

EDIT: Found the original PR introducing the check: #10589 (comment) but I don't really understand the rationale that is given there, it only seems to be about complexity for the end user, but complexity that it is bringing upon itself so what is the issue?
