RFE: provide realistic runAsNonRoot security context values for fluent-bit #330

Open
joebowbeer opened this issue Mar 28, 2023 · 9 comments

Comments

@joebowbeer

Provide realistic values for running fluent-bit as a non-root user.

The security context comments in values.yaml are not usable:

podSecurityContext: {}
#   fsGroup: 2000

securityContext: {}
#   capabilities:
#     drop:
#     - ALL
#   readOnlyRootFilesystem: true
#   runAsNonRoot: true
#   runAsUser: 1000

Issues:

  1. The user and group ids do not exist in the fluent-bit image. AFAICT the image is based on distroless/cc-debian11 which runs as root - though it does define a nonroot user id (65532:65532).
  2. All the files in the image are owned by 0:0 (root), so runAsNonRoot probably won't suffice, at least not without some additional capabilities, such as FOWNER.
  3. Typical deployments will enable storage.path (e.g., /var/fluent-bit/state/flb-storage/), which appears to need a hostPath.
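
For reference, a values.yaml fragment that targets the distroless nonroot user might look like the sketch below. It is untested. It assumes 65532:65532 is the right identity for this image, and per issues 2 and 3 it may still fail because the image files are owned by root and storage.path needs a writable volume. `podSecurityContext` and `securityContext` are the chart keys quoted above; `runAsGroup` and `allowPrivilegeEscalation` are standard Kubernetes fields added here only for illustration.

```yaml
# Sketch only: untested against the fluent-bit image.
podSecurityContext:
  fsGroup: 65532                    # assumption: nonroot group from the distroless base
securityContext:
  runAsNonRoot: true
  runAsUser: 65532                  # assumption: nonroot user from the distroless base
  runAsGroup: 65532
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
    - ALL
```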

Related:

@joebowbeer joebowbeer changed the title RFE: provide realistic runAsNonRoot pod security policy values for fluent-bit RFE: provide realistic runAsNonRoot security context values for fluent-bit Mar 28, 2023
@razorsk8jz

razorsk8jz commented Mar 29, 2023

I was able to get aws-for-fluent-bit running with the following permissions. I have not seen any issues yet but will let you know if I do. I was unable to get it running as nonroot; it does not appear that fluent-bit can run unless it runs as user 0.

podSecurityContext:
  runAsUser: 0
  seccompProfile:
    type: RuntimeDefault
containerSecurityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  privileged: false
  capabilities:
    drop:
    - ALL

@maurosls

So running as a non-root user isn't an option at the moment? Can we confirm this?

@pentago

pentago commented Feb 29, 2024

I'd love to be able to tune securityContext to run the process as non-root, most importantly as the nonroot/nobody user present in the distroless image.

Would this prevent fluent-bit from reading log files, or is there additional complexity I'm not aware of?

@gsmith-sas

I think I have been able to get Fluent Bit running as a non-root user AND still use a hostPath volume for the tail database and buffering. But I'd like some feedback on my approach in case I'm missing something.

Implementing this required 3 sets of changes to the Fluent Bit Helm chart.

  • Added a securityContext
securityContext:
  runAsUser: 3301
  fsGroup: 3301
  readOnlyRootFilesystem: true
  privileged: false
  capabilities:
    drop: ["ALL"]
    add: ["FOWNER"]
  • Added an extra volume/mount
extraVolumeMounts:
##Existing volume mounts for parsers, etc. omitted
- mountPath: /var/log/fb-storage
  name: fb-storage
  readOnly: false

extraVolumes:
- hostPath:
    path: /var/log/fb-storage
    type: DirectoryOrCreate
  name: fb-storage
  • Added an initContainer to change owner/group on mounted volume
initContainers:
- name: chowner-fb-storage
  image: registry.hub.docker.com/library/alpine:3.12.0
  command: ["chown", "3301:3301", "/var/log/fb-storage"]
  securityContext:
    readOnlyRootFilesystem: true
    capabilities:
      drop: ["all"]
      add: ["CHOWN"]
    runAsUser: 0
    runAsNonRoot: false
  volumeMounts:
  - name: fb-storage
    mountPath: /var/log/fb-storage

In my Fluent Bit configuration, I just pointed to the mounted volume in the storage.path parameter in the [SERVICE] section and in the DB parameter of the [INPUT] definitions for the 'tail' inputs.
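
As a rough illustration of that configuration, the Helm values fragment below embeds classic-format Fluent Bit config. It assumes the chart's `config.service` and `config.inputs` keys; the `Path` glob is a hypothetical placeholder, and the `fb.db` name is taken from the directory listing further down. Setting these keys replaces the chart defaults, so a real override would need to carry over the other default settings too.

```yaml
# Sketch only: shows just the storage-related settings.
config:
  service: |
    [SERVICE]
        # Buffer chunks on the mounted hostPath volume
        storage.path /var/log/fb-storage/
  inputs: |
    [INPUT]
        Name         tail
        # Hypothetical log path; use whatever the deployment already tails
        Path         /var/log/containers/*.log
        # Offset database kept on the same mounted volume
        DB           /var/log/fb-storage/fb.db
        storage.type filesystem
```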

Fluent Bit has been running in this configuration for the last few hours without any problems as far as I can tell. Log messages are being collected and forwarded onto their destination (OpenSearch) with no obvious regression in the number of log messages processed. The Fluent Bit pod logs don't show any new ERROR or WARNING messages.

I've SSH'ed onto the Kubernetes nodes and things look "right":

[root@k8s-n1 /]# ps -ef|grep fluent
3301     1960622 1960497  2 17:46 ?        00:00:34 /fluent-bit/bin/fluent-bit --workdir=/fluent-bit/etc --config=/fluent-bit/etc/conf/fluent-bit.conf

[root@k8s-n1 /]# ls -l /var/log
{snip}
drwxr-xr-x   4   3301   3301        93 Apr  5 17:46 fb-storage

[root@k8s-n1 /]# ls -l /var/log/fb-storage/
total 4148
drwxr-xr-x 2 3301 root       6 Apr  5 18:11 tail.1
drwxr-xr-x 2 3301 root       6 Apr  5 18:11 tail.2
-rw-r--r-- 1 3301 root   20480 Apr  5 18:08 fb.db
-rw-r--r-- 1 3301 root   32768 Apr  5 18:11 fb.db-shm
-rw-r--r-- 1 3301 root 4120032 Apr  5 18:11 fb.db-wal

Hmmm, just noticed that the files within the FB storage directory are owned by user '3301' but the group is 'root'. I thought the fsGroup in the securityContext would have forced the group to '3301'. But I think I can live with that.
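
One possible explanation, offered as an assumption rather than a confirmed diagnosis: fsGroup is a pod-level field (it belongs under podSecurityContext, not the container securityContext), and Kubernetes does not change the ownership of hostPath volumes based on it. New files take the process's primary group, which is 0 when runAsGroup is unset. A sketch of values that would set the primary group explicitly follows. It is untested, and moving the primary group away from 0 could affect read access to host log files.

```yaml
# Sketch only: untested variation on the securityContext shown earlier.
podSecurityContext:
  fsGroup: 3301          # pod-level; adds 3301 as a supplementary group
securityContext:
  runAsUser: 3301
  runAsGroup: 3301       # primary group, so newly created files get group 3301
  readOnlyRootFilesystem: true
  privileged: false
  capabilities:
    drop: ["ALL"]
    add: ["FOWNER"]
```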

Anyone see something wrong about this approach? Any hidden things I may be missing?

NOTE: I'm working with Fluent Bit 2.2.2 and Fluent Bit Helm chart version 0.43.0.

@joebowbeer If you get some time, please give this a try and see if it works for you.
@PettitWesley Not sure if this would work with the AWS version of Fluent Bit and Helm chart. Let us know if you get a chance to try it out.

@onap4105

Hello,

Could you please confirm whether this solution has been tested and validated? Or are there any other solutions for this issue?

Thank you.

@PettitWesley

@onap4105 I think I've tried something equivalent to this before, except I ran the chown command via ssh/exec and it did not work.

@onap4105

Thank you @PettitWesley

@gsmith-sas

@PettitWesley I wonder if you ran into a timing issue: the pod has to be up and running before you can ssh/exec into it; wouldn't Fluent Bit have already come up and failed (due to file permissions) before you ssh'ed in and had a chance to change the file permissions? Or, is it possible that the issue was caused by differences between the AWS version of Fluent Bit and (non-AWS) Fluent Bit?

I continued to play around with my approach after posting this, and Fluent Bit continued to work as expected for several days. I believe I was even able to remove the FOWNER capability that I had added back in the securityContext. So, from my week or two of testing, this approach seems to work. I've held off on moving to this in a more production-like environment, hoping to get some feedback, preferably validation (or clear evidence of problems), from the wider Fluent Bit community. It's always helpful to have someone completely new try things out.
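
For anyone trying that variation, the securityContext without the FOWNER grant would presumably reduce to the sketch below. It is simply the earlier block with the `add` line removed and has not been validated separately here.

```yaml
# Sketch only: earlier securityContext minus the FOWNER capability.
securityContext:
  runAsUser: 3301
  fsGroup: 3301
  readOnlyRootFilesystem: true
  privileged: false
  capabilities:
    drop: ["ALL"]
```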

@onap4105 I'm just a Fluent Bit user so I can't offer official support or validation. Give it a try and let us know whether it works in your use-case. Thanks.

@onap4105

onap4105 commented May 1, 2024

@gsmith-sas Below are my changes and the initial results based on your suggestions. I am still verifying and understanding the outcomes. Please let me know if you have any advice.

I used https://github.com/fluent/fluent-operator/releases/tag/v2.8.0

  • changes in values.yaml
  # initContainers test run as non root user
  initContainers:
    - name: chowner-fb-storage
      image: registry.hub.docker.com/library/alpine:3.12.0
      command: ["chown", "3301:3301", "/fluent-bit"]
      securityContext:
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["all"]
          add: ["CHOWN"]
        runAsUser: 0
        runAsNonRoot: false
      volumeMounts:
      - name: positions
        mountPath: /fluent-bit

# Note: I think this is hardcoded in the fluent-bit image, I use it instead of creating a new fb-storage.
Volumes:
  positions:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/fluent-bit/
    HostPathType:
  • Helm install output and status
$ helm install fluent-operator -n fluentbit ./fluent-operator/
W0430 21:57:57.912852   19520 warnings.go:70] unknown field "spec.securityContext.capabilities"
W0430 21:57:57.912852   19520 warnings.go:70] unknown field "spec.securityContext.privileged"
W0430 21:57:57.912852   19520 warnings.go:70] unknown field "spec.securityContext.readOnlyRootFilesystem"
Error: INSTALLATION FAILED: failed to refresh resource information: fluentbits.fluentbit.fluent.io "fluent-bit" not found

$ helm list -n fluentbit
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
fluent-operator fluentbit       1               2024-04-30 21:57:43.0906769 -0400 EDT   failed          fluent-operator-2.8.0   2.8.0
  • fluent-operator and fluent-bit deployment/daemonset are up and running.
$ kubectl get all -n fluentbit
NAME                                             READY   STATUS    RESTARTS   AGE
pod/fluent-bit-8sdnh                             1/1     Running   0          9h
pod/fluent-bit-9xgm2                             1/1     Running   0          9h
pod/fluent-bit-dtqw9                             1/1     Running   0          9h
pod/fluent-bit-fdm9f                             1/1     Running   0          9h
pod/fluent-bit-g54tw                             1/1     Running   0          9h
pod/fluent-bit-t7dw9                             1/1     Running   0          9h
pod/fluent-bit-vk27g                             1/1     Running   0          9h
pod/fluent-bit-wlhvz                             1/1     Running   0          9h
pod/fluent-bit-xx5g4                             1/1     Running   0          9h
pod/fluent-operator-5d466549cb-s8cn6             1/1     Running   0          9h

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/fluent-bit   ClusterIP   x.x.x.x          <none>        2020/TCP   9h

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/fluent-bit   9         9         9       9            9           <none>          9h

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/fluent-operator             1/1     1            1           9h

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/fluent-operator-5d466549cb             1         1         1       9h
  • logs (I see kubernetes logs from [Output] stdout)
$ kubectl logs -n fluentbit fluent-bit-wlhvz
level=info time=2024-05-01T01:58:00Z msg="fluent-bit started"
Fluent Bit v2.2.2
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

____________________
< Fluent Bit v2.2.2 
 -------------------
          \
           \
            \          __---__
                    _-       /--______
               __--( /     \ )XXXXXXXXXXX\v.
             .-XXX(   O   O  )XXXXXXXXXXXXXXX-
            /XXX(       U     )        XXXXXXX\
          /XXXXX(              )--_  XXXXXXXXXXX\
         /XXXXX/ (      O     )   XXXXXX   \XXXXX\
         XXXXX/   /            XXXXXX   \__ \XXXXX
         XXXXXX__/          XXXXXX         \__---->
 ---___  XXX__/          XXXXXX      \__         /
   \-  --__/   ___/\  XXXXXX            /  ___--/=
    \-\    ___/    XXXXXX              '--- XXXXXX
       \-\/XXX\ XXXXXX                      /XXXXX
         \XXXXXXXXX   \                    /XXXXX/
          \XXXXXX      >                 _/XXXXX/
            \XXXXX--__/              __-- XXXX/
             -XXXXXXXX---------------  XXXXXX-
                \XXXXXXXXXXXXXXXXXXXXXXXXXX/
                  ""VXXXXXXXXXXXXXXXXXXV""

[2024/05/01 01:58:00] [ info] [fluent bit] version=2.2.2, commit=eeea396e88, pid=13
[2024/05/01 01:58:00] [ info] [storage] ver=1.5.1, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/05/01 01:58:00] [ info] [cmetrics] version=0.6.6
[2024/05/01 01:58:00] [ info] [ctraces ] version=0.4.0
[2024/05/01 01:58:00] [ info] [input:systemd:systemd.0] initializing
[2024/05/01 01:58:00] [ info] [input:systemd:systemd.0] storage_strategy='memory' (memory only)
[2024/05/01 01:58:00] [ info] [input:tail:tail.1] initializing
[2024/05/01 01:58:00] [ info] [input:tail:tail.1] storage_strategy='memory' (memory only)
[2024/05/01 01:58:00] [error] [input:tail:tail.1] parser 'cri' is not registered
[2024/05/01 01:58:00] [ info] [filter:kubernetes:kubernetes.1] https=1 host=kubernetes.default.svc port=443
[2024/05/01 01:58:00] [ info] [filter:kubernetes:kubernetes.1]  token updated
[2024/05/01 01:58:00] [ info] [filter:kubernetes:kubernetes.1] local POD info OK
[2024/05/01 01:58:00] [ info] [filter:kubernetes:kubernetes.1] testing connectivity with API server...
[2024/05/01 01:58:00] [ info] [filter:kubernetes:kubernetes.1] connectivity OK
[2024/05/01 01:58:00] [ info] [output:stdout:stdout.0] worker #0 started
[2024/05/01 01:58:00] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2024/05/01 01:58:00] [ info] [sp] stream processor started
  • inside fluent-bit pod
$ id
uid=3301 gid=0(root) groups=0(root),3301

$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
3301           1  0.0  0.0 711144 11944 ?        Ssl  01:58   0:00 /fluent-bit/bin/fluent-bit-watcher
3301          13  0.2  0.0 120000 45676 ?        Sl   01:58   1:24 /fluent-bit/bin/fluent-bit --enable-hot-reload -c /fluent-bit/etc/f3301 

$ ls -lrt / | grep fluent
drwxr-xr-x   1 root root 4096 May  1 01:57 fluent-bit

$ ls -lrt /fluent-bit
total 16
drwxr-xr-x 2 root root 4096 Jan 14 16:22 log
drwxr-xr-x 1 root root 4096 Feb 18 07:53 etc
drwxr-xr-x 1 root root 4096 Feb 18 07:53 bin
drwxrwsrwt 3 root 3301  180 May  1 01:57 config
drwxr-xr-x 2 3301 3301 4096 May  1 01:57 tail

$ ls -lrt ./tail
total 4084
-rw-r--r-- 1 3301 root    8192 May  1 01:58 systemd.db
-rw-r--r-- 1 3301 root   16384 May  1 11:22 pos.db
-rw-r--r-- 1 3301 root   32768 May  1 12:21 pos.db-shm
-rw-r--r-- 1 3301 root 4120032 May  1 12:21 pos.db-wal
  • VM status
/var/lib# ls -lrt | grep fluent
drwxr-xr-x  2  3301  3301 4096 May  1 01:57 fluent-bit


/var/lib/fluent-bit# ls -lrt
total 4088
-rw-r--r-- 1 3301 root    8192 May  1 01:57 systemd.db
-rw-r--r-- 1 3301 root   24576 May  1 02:04 pos.db
-rw-r--r-- 1 3301 root   32768 May  1 02:05 pos.db-shm
-rw-r--r-- 1 3301 root 4120032 May  1 02:05 pos.db-wal
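
The `unknown field` warnings in the Helm output above are consistent with `spec.securityContext` on the FluentBit resource being a pod-level security context: `capabilities`, `privileged`, and `readOnlyRootFilesystem` are container-level fields in Kubernetes. The sketch below shows how the settings might be split. The `containerSecurityContext` field name and the `v1alpha2` API version are assumptions that should be checked against the installed CRD (for example with `kubectl explain fluentbit.spec`) before use.

```yaml
# Hypothetical FluentBit resource; field names are unverified assumptions.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  name: fluent-bit
  namespace: fluentbit
spec:
  # Pod-level context: only pod-level fields are accepted here.
  securityContext:
    runAsUser: 3301
    fsGroup: 3301
  # Assumption: container-level settings live in a separate field.
  containerSecurityContext:
    readOnlyRootFilesystem: true
    privileged: false
    capabilities:
      drop: ["ALL"]
```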
