Bug: Containerd.io 1.7.24-1 | unix opening TUN device file: operation not permitted #2606

Open
Phil57 opened this issue Nov 30, 2024 · 29 comments
Labels
Category: Documentation ✒️ A problem with the readme or a code comment. Status: 🟡 Nearly resolved This might be resolved or is about to be resolved

Comments

@Phil57

Phil57 commented Nov 30, 2024

Is this urgent?

Yes

Host OS

Ubuntu 24.04

CPU arch

x86_64

VPN service provider

AirVPN

What are you using to run the container

docker-compose

What is the version of Gluetun

3.39.1

What's the problem 🤔

Hello,

Following an update of containerd.io from version 1.7.23-1 to 1.7.24-1, Gluetun doesn't start anymore.
Downgrading to 1.7.23-1 fixes the issue.

Someone has opened a bug report for containerd. In the meantime, could you work around the issue or post a warning somewhere?
containerd/containerd#11078

Share your logs (at least 10 lines)

gluetun_proxy  | 2024-11-30T10:47:17+01:00 INFO TUN device is not available: open /dev/net/tun: no such file or directory; creating it...
gluetun_proxy  | 2024-11-30T10:47:17+01:00 INFO [routing] routing cleanup...
gluetun_proxy  | 2024-11-30T10:47:17+01:00 INFO [routing] default route found: interface eth0, gateway 172.20.0.1, assigned IP 172.20.0.2 and family v4
gluetun_proxy  | 2024-11-30T10:47:17+01:00 INFO [routing] deleting route for 0.0.0.0/0
gluetun_proxy  | 2024-11-30T10:47:17+01:00 ERROR creating tun device: unix opening TUN device file: operation not permitted

Share your configuration

No response

Contributor

@qdm12 is more or less the only maintainer of this project and works on it in his free time.
Please:

@dojoca

dojoca commented Nov 30, 2024

Same problem here:
2024-11-30T20:07:06+08:00 ERROR creating tun device: unix opening TUN device file: operation not permitted

Seems to be related to a recent update - I installed containerd.io 1.7.24-1.

@dojoca

dojoca commented Nov 30, 2024

Interim fix - I downgraded containerd.io to 1.7.23-1, which resolved the issue.

sudo apt-get install containerd.io=1.7.23-1
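
To keep a later upgrade from pulling 1.7.24-1 back in, the downgraded package can also be held. This is a minimal sketch, assuming a Debian/Ubuntu host that installs containerd.io through apt; the function name `pin_containerd` and the `DRY_RUN` variable are illustrative and not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: install a specific containerd.io version and hold it.
# Set DRY_RUN=1 to print the commands instead of running them.
pin_containerd() {
  version=$1
  run=${DRY_RUN:+echo}
  # Same downgrade command as above, with the version passed in.
  $run sudo apt-get install "containerd.io=$version" || return 1
  # Hold the package so routine upgrades skip it until the hold is released.
  $run sudo apt-mark hold containerd.io
}

# Example (prints the two commands without touching the system):
# DRY_RUN=1 pin_containerd 1.7.23-1
```

`sudo apt-mark unhold containerd.io` releases the hold again, for example once the /dev/net/tun device mapping is in place.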

@saltydk

saltydk commented Nov 30, 2024

Or just mount /dev/net/tun into the container, that works fine with 1.7.24

@jumoog

jumoog commented Dec 1, 2024

related to containerd/containerd#11078

@dimikot

dimikot commented Dec 2, 2024

I have another error:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "proc" to rootfs at "/proc": mount src=proc, dst=/proc, dstFd=/proc/thread-self/fd/8, flags=0xe: no such file or directory: unknown
Error: failed to start containers: buildx_buildkit_container0

Repro:

docker buildx rm container
docker buildx create --name container --driver=docker-container --bootstrap

when running it in docker-in-docker mode (using sysbox-ce_0.6.4-0.linux).

Found this issue, downgraded to 1.7.23-1 - it fixed the issue.

@saltydk

saltydk commented Dec 2, 2024

> I have another error:
>
> Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "proc" to rootfs at "/proc": mount src=proc, dst=/proc, dstFd=/proc/thread-self/fd/8, flags=0xe: no such file or directory: unknown
> Error: failed to start containers: buildx_buildkit_container0
>
> Repro:
>
> docker buildx rm container
> docker buildx create --name container --driver=docker-container --bootstrap
>
> when running it in docker-in-docker mode (using sysbox-ce_0.6.4-0.linux).
>
> Found this issue, downgraded to 1.7.23-1 - it fixed the issue.

Was this comment meant for this repository? Doesn't seem like it at first glance at least.

@TylerReid

> Or just mount /dev/net/tun into the container, that works fine with 1.7.24

If you are using docker compose, adding this to your gluetun service should fix the issue:

    devices:
      - /dev/net/tun:/dev/net/tun

@mikemhenry

Can confirm that this fix #2606 (comment) works

@mutlow89

mutlow89 commented Dec 3, 2024

thanks for the advice, worked for me too 2606

@Raptor039

Raptor039 commented Dec 3, 2024

Hi,

It's not a bug in gluetun, see: containerd/containerd#11078 (comment)

@thueske

thueske commented Dec 5, 2024

When I add this, I only get:

Error response from daemon: error gathering device information while adding custom device "/dev/net/tun": no such file or directory

I am using the latest Alpine release: https://alpinelinux.org/posts/Alpine-3.21.0-released.html

@darthShadow

You are probably missing the tun module.

From https://wiki.alpinelinux.org/wiki/Setting_up_a_OpenVPN_server:

modprobe tun
echo "tun" >> /etc/modules-load.d/tun.conf
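
Before or after loading the module, it can help to confirm on the host that the device node actually exists, since Docker reports "no such file or directory" when it does not. A small sketch; the function name `check_tun` is made up for illustration, and the path argument exists only so the check can be pointed at another file:

```shell
#!/bin/sh
# Hypothetical helper: report whether the TUN device node is present.
# Defaults to /dev/net/tun; pass another path to check that instead.
check_tun() {
  dev=${1:-/dev/net/tun}
  if [ -c "$dev" ]; then
    # A character device is what Docker expects to map into the container.
    echo "ok: $dev is a character device"
  elif [ -e "$dev" ]; then
    echo "unexpected: $dev exists but is not a character device"
    return 1
  else
    echo "missing: $dev not found; the tun kernel module may not be loaded"
    return 1
  fi
}

# Example, run on the Docker host (not inside the container):
# check_tun
```

If it reports "missing", running the modprobe command above and checking again is a reasonable next step.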

@thueske

thueske commented Dec 5, 2024

> You are probably missing the tun module.
>
> From https://wiki.alpinelinux.org/wiki/Setting_up_a_OpenVPN_server:
>
> modprobe tun
> echo "tun" >> /etc/modules-load.d/tun.conf

THANKS! Works.

@BloodBlight

BloodBlight commented Dec 9, 2024

For me, adding this does NOT solve the issue:

    devices:
      - /dev/net/tun:/dev/net/tun

Nor did this:

modprobe tun

The only fix I found (other than rolling back) was this:

    privileged: true

@htristan

Ok so I had this issue too - thanks for the comments here. This is apparently an intended change in containerd and runc (see the comment linked below for details), so they closed their ticket as won't fix.
containerd/containerd#11078 (comment)

I confirmed that downgrading containerd (as suggested) fixed the issue, but then figured out (following the comment) how to fix it on the newer version. If you are using docker compose, like I am - you need to add a line similar to this to your yml file for gluetun:

    devices:
      - /dev/net/tun:/dev/net/tun

Adding that line does the trick for me, and this now runs smoothly on the latest 1.7.24 version. Hope that helps others!

@admireme

admireme commented Dec 10, 2024

> > Or just mount /dev/net/tun into the container, that works fine with 1.7.24
>
> If you are using docker compose, adding this to your gluetun service should fix the issue:
>
>     devices:
>       - /dev/net/tun:/dev/net/tun

you the man, thank you

@saltydk

saltydk commented Dec 10, 2024

> For me, adding this does NOT solve the issue:
>
>     devices:
>       - /dev/net/tun:/dev/net/tun
>
> Nor did this:
>
> modprobe tun
>
> The only fix I found (other than rolling back) was this:
>
>     privileged: true

It does if you correctly assigned NET_ADMIN as per the docs.

@the-hotmann

I still wonder why mapping the device in is suddenly necessary.
I already gave the container the permission/capability:

    cap_add:
      - NET_ADMIN

so it should (like before) be able to handle the rest automatically.

@saltydk

saltydk commented Dec 11, 2024

> I still wonder why mapping the device in is suddenly necessary. I already gave the container the permission/capability:
>
>     cap_add:
>       - NET_ADMIN
>
> so it should (like before) be able to handle the rest automatically.

Because of a change made in runc about two years ago, which only recently shipped in containerd as an updated dependency, as per the comments in this issue.

@Sharpie

Sharpie commented Dec 12, 2024

This also appears to affect k3s v1.31.3+k3s1. Downgrading to v1.31.2+k3s1 fixes the issue.

containerd/containerd#11078 (comment) handwaves at using a "generic device plugin", but I wasn't able to figure out how to draw the rest of the owl on that one over morning coffee.

@qdm12
Owner

qdm12 commented Dec 18, 2024

That was addressed a few months ago, I even added code to suggest adding --device /dev/net/tun in f1f3472 but that's not included in the v3.39.1 release unfortunately (it's in the latest image). I'm a bit overworked currently, but I'll do a new overdue v3.40.0 release sometime soon.

Also, the wiki has included --device /dev/net/tun everywhere for a few months, in anticipation of the change we now see happening. Unfortunately there was a large-scale typo where it read /dev/net/run, so that did not help; it was fixed today in qdm12/gluetun-wiki@bb23197

Please if your issue is NOT resolved by adding --device /dev/net/tun in your configuration, open a separate issue from here. I'll leave this open for a few more days for users to read this comment but it will get closed then.
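
Since both the device mapping and the NET_ADMIN capability are needed, and the /dev/net/run typo was easy to copy from the wiki, a quick text check of a compose file can catch the common mistakes. This is only a sketch: `check_gluetun_compose` is a made-up name, and it greps the file rather than parsing YAML, so commented-out lines or other services in the same file can fool it:

```shell
#!/bin/sh
# Hypothetical helper: look for the settings discussed in this thread in a
# compose file. Plain text matching only; it does not understand YAML.
check_gluetun_compose() {
  file=$1
  status=0
  # The wiki typo: /dev/net/run instead of /dev/net/tun.
  if grep -q '/dev/net/run' "$file"; then
    echo "typo: found /dev/net/run, expected /dev/net/tun"
    status=1
  fi
  # The device mapping that newer containerd/runc versions require.
  if ! grep -q '/dev/net/tun' "$file"; then
    echo "missing: no /dev/net/tun device mapping"
    status=1
  fi
  # The capability Gluetun needs alongside the device.
  if ! grep -q 'NET_ADMIN' "$file"; then
    echo "missing: NET_ADMIN capability"
    status=1
  fi
  return "$status"
}

# Example:
# check_gluetun_compose docker-compose.yml && echo "looks fine"
```

For plain `docker run`, the equivalent settings are the `--device /dev/net/tun` flag mentioned above together with `--cap-add=NET_ADMIN`.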

@qdm12 qdm12 added Category: Documentation ✒️ A problem with the readme or a code comment. Status: 🟡 Nearly resolved This might be resolved or is about to be resolved labels Dec 18, 2024
@oz-glenn

> This also appears to affect k3s v1.31.3+k3s1. Downgrading to v1.31.2+k3s1 fixes the issue.
>
> containerd/containerd#11078 (comment) handwaves at using a "generic device plugin", but I wasn't able to figure out how to draw the rest of the owl on that one over morning coffee.

Thanks! This explained my unexplainable problem that I'd spent a good few days trying to track down. The downgrade caused other issues but now the cluster is back to normal and gluetun is running!

@Sharpie

Sharpie commented Dec 19, 2024

> Thanks! This explained my unexplainable problem that I'd spent a good few days trying to track down.

Glad that helped! Kubernetes does not appear to have an equivalent of the --device flag, so the solution added to the error message and wiki won't apply to Pods.

I may take another run at upgrading my cluster over the holiday. I'll post another update if I figure something out and don't get lost shaving yaks.

@kldzj

kldzj commented Dec 19, 2024

@Sharpie any idea why this wouldn't work on v1.29.11+k3s1?

@Sharpie

Sharpie commented Dec 19, 2024

@kldzj Looks like v1.29.11+k3s1 was released on the same day as v1.31.3+k3s1, so the likely explanation is that it contains the same bump to containerd or runc that carries the breaking change of not sharing /dev/net/tun by default.

@Sharpie

Sharpie commented Dec 19, 2024

Yup, runc is the problem. k3s releases prior to 2 weeks ago were using v1.1.14 and then updated to v1.2.1 two weeks ago.

containerd/containerd#11078 (comment) states that the change of not sharing /dev/net/tun went into v1.2 of runc, so any k3s release using a runc version newer than v1.1.x will be impacted.

@Sharpie

Sharpie commented Dec 21, 2024

I did some research and testing with K3D and came up with 3 ways to deal with the following error on Kubernetes:

ERROR creating tun device: unix opening TUN device file: operation not permitted

TL;DR: Hold off on upgrading Kubernetes until runc v1.2.4 is in use, or run the Gluetun container with privileged: true.

Reproduction Case

I used the following Deployment which starts Gluetun as a SidecarContainer and then boots a netshoot container that retrieves connection info (NOTE: the proton-wireguard secret is something you will have to provide to complete the test case):

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: gluetun-example
  name: gluetun-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gluetun-example
  template:
    metadata:
      labels:
        app: gluetun-example
    spec:
      initContainers:
        - name: gluetun
          image: 'qmcgaw/gluetun:v3.39.1'
          restartPolicy: Always
          env:
            - name: VPN_SERVICE_PROVIDER
              value: custom
            - name: VPN_TYPE
              value: wireguard
            - name: WIREGUARD_ADDRESSES
              value: '10.2.0.2/32'
            - name: VPN_ENDPOINT_PORT
              value: '51820'
            - name: WIREGUARD_PRIVATE_KEY
              valueFrom:
                secretKeyRef:
                  name: proton-wireguard
                  key: wireguard-privatekey
            - name: VPN_ENDPOINT_IP
              valueFrom:
                secretKeyRef:
                  name: proton-wireguard
                  key: wireguard-peer-endpoint
            - name: WIREGUARD_PUBLIC_KEY
              valueFrom:
                secretKeyRef:
                  name: proton-wireguard
                  key: wireguard-peer-publickey
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
          startupProbe:
            exec:
              command:
                - /gluetun-entrypoint
                - healthcheck
            initialDelaySeconds: 10
            timeoutSeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            exec:
              command:
                - /gluetun-entrypoint
                - healthcheck
            timeoutSeconds: 5
            periodSeconds: 5
            failureThreshold: 3
      containers:
        - name: netshoot
          image: nicolaka/netshoot
          command:
            - /bin/sh
            - '-c'
            - |
              while true; do
                curl -sS https://am.i.mullvad.net/json | jq
                sleep 60
              done

When added to a cluster running k3s v1.31.2-k3s1:

k3d cluster create --image rancher/k3s:v1.31.2-k3s1 dot-2

Everything works:

# kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
gluetun-example-778cb664dc-h2l7k   2/2     Running   0          25s

# kubectl logs deployment/gluetun-example
Defaulted container "netshoot" out of: netshoot, gluetun (init)
{
  "ip": "149.88.27.182",
  "country": "Switzerland",
  "city": "Zurich",
  "longitude": 8.5671,
  "latitude": 47.3682,
  "mullvad_exit_ip": false,
  "blacklisted": {
    "blacklisted": false,
    "results": []
  },
  "organization": "Datacamp Limited"
}

When added to a cluster running k3s v1.31.4-k3s1:

k3d cluster create --image rancher/k3s:v1.31.4-k3s1 dot-4

The pod fails to complete initialization due to TUN device permissions:

# kubectl get pods
NAME                               READY   STATUS       RESTARTS   AGE
gluetun-example-778cb664dc-nd4qh   0/2     Init:Error   0          3s

# kubectl logs deployment/gluetun-example -c gluetun -p
...
2024-12-21T14:40:49Z ERROR creating tun device: unix opening TUN device file: operation not permitted
2024-12-21T14:40:49Z INFO Shutdown successful

Solutions

Don't upgrade Kubernetes until runc v1.2.4 is in use

The runc maintainers have reverted the removal of /dev/net/tun and this change is scheduled to go out in the v1.2.4 release:

opencontainers/runc#4556

Future Kubernetes releases that use runc v1.2.4 or newer should Just Work As They Used To™

  • Pros: No changes required.
  • Cons: runc v1.2.4 is not available yet and will likely take several months to show up in Kubernetes patch releases. You may not be able to delay patching, or may have already upgraded. Runc may stop sharing /dev/net/tun again in the distant future.
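
To decide whether a given release is in the affected window, the version range described in this thread (runc v1.2.0 up to, but not including, v1.2.4) can be checked with a few lines of shell. A rough sketch: `runc_drops_tun` is a hypothetical name, the range is taken from the comments here rather than from the runc changelog, and pre-release suffixes are simply ignored:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given runc version is in the
# range this thread describes as not sharing /dev/net/tun by default,
# i.e. 1.2.0 <= version < 1.2.4. Expects MAJOR.MINOR or MAJOR.MINOR.PATCH.
runc_drops_tun() {
  ver=${1#v}            # drop a leading "v"
  ver=${ver%%[-+]*}     # drop suffixes such as "-rc.1" or "+build"
  case $ver in
    *.*) ;;
    *) echo "unrecognized version: $1" >&2; return 2 ;;
  esac
  major=${ver%%.*}
  rest=${ver#*.}
  minor=${rest%%.*}
  case $rest in
    *.*) patch=${rest#*.}; patch=${patch%%.*} ;;
    *) patch=0 ;;       # treat "1.2" as "1.2.0"
  esac
  if [ "$major" -eq 1 ] && [ "$minor" -eq 2 ] && [ "$patch" -lt 4 ]; then
    return 0
  fi
  return 1
}

# Example:
# runc_drops_tun v1.2.1 && echo "expect the TUN permission error"
```

The two runc versions named earlier for k3s behave as expected here: v1.1.14 is reported as unaffected and v1.2.1 as affected.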

Run Gluetun in privileged mode

Update Gluetun containers to run in privileged mode (shoutout to @holysoles for adding this to the wiki):

diff --git a/gluetun-deployment.yaml b/gluetun-deployment.yaml
index c509daa..e1be491 100644
--- a/gluetun-deployment.yaml
+++ b/gluetun-deployment.yaml
@@ -43,6 +43,7 @@ spec:
                   name: proton-wireguard
                   key: wireguard-peer-publickey
           securityContext:
+            privileged: true
             capabilities:
               add:
                 - NET_ADMIN
  • Pros: Small change.
  • Cons: You now have more containers running with elevated privileges.

Use generic-device-plugin to manage access to /dev/net/tun

Deploy the Generic Device Plugin configured to manage access to /dev/net/tun on k8s nodes:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
      containers:
      - image: squat/generic-device-plugin
        # count: 1024 is arbitrary, but will limit each k8s node
        # to only running 1024 containers that use /dev/net/tun
        args:
        - --device
        - |
          name: tun
          groups:
            - count: 1024
              paths:
                - path: /dev/net/tun
        name: generic-device-plugin
        resources:
          requests:
            cpu: 50m
            memory: 10Mi
          limits:
            cpu: 50m
            memory: 20Mi
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate

Then, update Gluetun containers to request squat.ai/tun as a resource:

diff --git a/gluetun-deployment.yaml b/gluetun-deployment.yaml
index c509daa..5826d22 100644
--- a/gluetun-deployment.yaml
+++ b/gluetun-deployment.yaml
@@ -42,6 +42,9 @@ spec:
                 secretKeyRef:
                   name: proton-wireguard
                   key: wireguard-peer-publickey
+          resources:
+            limits:
+              squat.ai/tun: "1"
           securityContext:
             capabilities:
               add:
  • Pros: Avoids the need to run Gluetun in privileged mode. Can also be used to manage access to other devices (FUSE, USB, etc.)
  • Cons: Requires a 3rd-party container running in privileged mode on all k8s nodes.
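
After deploying the DaemonSet, one way to confirm that nodes actually advertise the resource is to list their allocatable squat.ai/tun count. A sketch, assuming `kubectl` is configured for the cluster and the plugin registers the resource under the squat.ai/tun name used above; the function name and column headers are arbitrary:

```shell
#!/bin/sh
# Hypothetical helper: list each node with the number of squat.ai/tun
# devices it advertises. The dot in the resource name is escaped so that
# kubectl does not treat it as a field separator.
list_tun_capacity() {
  kubectl get nodes \
    -o 'custom-columns=NODE:.metadata.name,TUN:.status.allocatable.squat\.ai/tun'
}

# Example:
# list_tun_capacity
```

A node showing `<none>` in the TUN column is not advertising the device, so Gluetun pods requesting squat.ai/tun would not be scheduled there.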

@qdm12
Owner

qdm12 commented Dec 25, 2024

FYI, v3.40.0 is released and contains the warning mentioned. I'll leave this open, especially so that the comment above ⬆️ can be added to the wiki temporarily (btw thanks for the investigation and sharing!)
