Fluentbit in_docker crash #5149

Closed
nitrogene opened this issue Mar 22, 2022 Discussed in #5092 · 8 comments
@nitrogene

Discussed in #5092

Originally posted by nitrogene March 16, 2022
Hello,

I have installed td-agent-bit on WSL2/Ubuntu and am having trouble getting the in_docker plugin to work.

Here's an extract of td-agent-bit.conf

[INPUT]
# https://docs.fluentbit.io/manual/pipeline/inputs/docker-metrics
    Name   docker
    Tag    docker.metrics

[INPUT]
# https://docs.fluentbit.io/manual/pipeline/inputs/docker-events
    Name   docker_events
    Tag    docker.events


[FILTER]
# https://docs.fluentbit.io/manual/pipeline/filters/record-modifier
    Name record_modifier
    Match *
    Record hostname ${HOSTNAME}

[OUTPUT]
    Name forward
    Match *
    Host 127.0.0.1
    Port 24224
    tls On
    tls.verify On
    tls.ca_file /etc/certs/graylog/certs/ca.crt.pem
    tls.crt_file /etc/certs/graylog/certs/client.crt.pem
    tls.key_file /etc/certs/graylog/private/client.key.pem
    tls.key_passwd ${TLS_PRIVATE_KEY_PASSPHRASE}
    Shared_Key  ${SHARED_KEY}

As soon as I start the agent, it crashes:

Fluent Bit v1.9.0
* Copyright (C) 2015-2021 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/03/16 16:01:55] [ info] [engine] started (pid=7746)
[2022/03/16 16:01:55] [ info] [storage] version=1.1.6, initializing...
[2022/03/16 16:01:55] [ info] [storage] in-memory
[2022/03/16 16:01:55] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/03/16 16:01:55] [ info] [cmetrics] version=0.3.0
[2022/03/16 16:01:55] [ warn] [input:thermal:thermal.4] thermal device file not found
[2022/03/16 16:01:55] [ info] [input:docker_events:docker_events.6] listening for events on /var/run/docker.sock
[2022/03/16 16:01:55] [ info] [sp] stream processor started
[2022/03/16 16:01:56] [ info] [output:forward:forward.0] worker #0 started
[2022/03/16 16:01:56] [ info] [output:forward:forward.0] worker #1 started
[2022/03/16 16:01:56] [error] [plugins/in_docker/docker.c:315 errno=2] No such file or directory
[2022/03/16 16:01:56] [engine] caught signal (SIGSEGV)
[2022/03/16 16:01:56] [error] [input:docker:docker.5] error gathering CPU data from /sys/fs/cgroup/cpu/docker/c6947781249f695c7811ce5da932b2aa79f28ea95b89c0a12b9eb841fa28302e/cpuacct.usage
#0  0x558ead4e587c      in  flush_snapshot() at plugins/in_docker/docker.c:701
#1  0x558ead4e5a03      in  flush_snapshots() at plugins/in_docker/docker.c:728
#2  0x558ead4e5c21      in  cb_docker_collect() at plugins/in_docker/docker.c:798
#3  0x558ead482141      in  flb_input_collector_fd() at src/flb_input.c:1203
#4  0x558ead4986da      in  flb_engine_handle_event() at src/flb_engine.c:439
#5  0x558ead4986da      in  flb_engine_start() at src/flb_engine.c:761
#6  0x558ead474023      in  flb_lib_worker() at src/flb_lib.c:626
#7  0x7f237eba2608      in  ???() at ???:0
#8  0x7f237e4cb162      in  ???() at ???:0
#9  0xffffffffffffffff  in  ???() at ???:0

If I disable the in_docker plugin by commenting out the relevant lines in the configuration file, everything works: I can see the host metrics (cpu, mem, etc.).

Here's the content of the /sys/fs/cgroup/cpu/docker/c6947781249f695c7811ce5da932b2aa79f28ea95b89c0a12b9eb841fa28302e/ folder:

$ ls /sys/fs/cgroup/cpu/docker/c6947781249f695c7811ce5da932b2aa79f28ea95b89c0a12b9eb841fa28302e/
cgroup.clone_children  cpu.cfs_period_us  cpu.rt_period_us   cpu.shares  notify_on_release
cgroup.procs           cpu.cfs_quota_us   cpu.rt_runtime_us  cpu.stat    tasks

Any ideas?

@nokute78
Collaborator

in_docker expects the kernel to support the CPU Accounting Controller:
https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt
The error log indicates that cpuacct.usage was not found.

Could you share the output of cat /proc/cgroups and grep CGROUP /boot/config-* ?
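
The missing-file condition can also be checked directly. Below is a minimal Python sketch, assuming the cgroup v1 layout seen in the error log (/sys/fs/cgroup/cpu/docker/<container id>/cpuacct.usage). The function name containers_missing_cpuacct is hypothetical and is not part of Fluent Bit.

```python
from __future__ import annotations

from pathlib import Path


def containers_missing_cpuacct(cpu_root: Path) -> list[str]:
    """Return the container directories under cpu_root/docker that have no cpuacct.usage file.

    cpu_root is assumed to be a cgroup v1 mount point such as /sys/fs/cgroup/cpu.
    """
    docker_dir = cpu_root / "docker"
    if not docker_dir.is_dir():
        # No docker cgroup at this mount point, so there is nothing to report.
        return []
    return sorted(
        entry.name
        for entry in docker_dir.iterdir()
        if entry.is_dir() and not (entry / "cpuacct.usage").is_file()
    )


# Example call against the path from the error log:
#   containers_missing_cpuacct(Path("/sys/fs/cgroup/cpu"))
```

A non-empty result means the in_docker CPU read would fail for those containers in the same way as in the log.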

@nitrogene
Author

Hello,

Here are the requested logs - please remember that I am using Ubuntu via WSL2/Windows 10:

~$ cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  1       1       1
cpu     2       1       1
cpuacct 3       1       1
blkio   4       1       1
memory  5       1       1
devices 6       1       1
freezer 7       1       1
net_cls 8       1       1
perf_event      9       1       1
net_prio        10      1       1
hugetlb 11      1       1
pids    12      1       1
rdma    13      1       1
~$ grep CGROUP /boot/config-*
grep: /boot/config-*: No such file or directory
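
The hierarchy column in the /proc/cgroups output is the ID of the cgroup v1 hierarchy each controller is attached to; controllers that share an ID are mounted together. Here cpu is on hierarchy 2 and cpuacct on hierarchy 3, so they are separate hierarchies, which would be consistent with cpuacct.usage being absent from /sys/fs/cgroup/cpu. A minimal Python sketch of that comparison follows; the helper names are hypothetical, and the sample is an excerpt of the output above.

```python
from __future__ import annotations

from typing import NamedTuple


class Controller(NamedTuple):
    hierarchy: int
    num_cgroups: int
    enabled: bool


def parse_proc_cgroups(text: str) -> dict[str, Controller]:
    """Parse the four-column table printed by `cat /proc/cgroups`."""
    controllers: dict[str, Controller] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers[name] = Controller(int(hierarchy), int(num_cgroups), enabled == "1")
    return controllers


def co_mounted(controllers: dict[str, Controller], a: str, b: str) -> bool:
    """True when both controllers are attached to the same cgroup v1 hierarchy.

    Hierarchy ID 0 means the controller is not attached to any v1 hierarchy.
    """
    if a not in controllers or b not in controllers:
        return False
    return controllers[a].hierarchy == controllers[b].hierarchy != 0


# Excerpt of the /proc/cgroups output from this thread.
SAMPLE = """\
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  1       1       1
cpu     2       1       1
cpuacct 3       1       1
"""

if __name__ == "__main__":
    table = parse_proc_cgroups(SAMPLE)
    print("cpu and cpuacct co-mounted:", co_mounted(table, "cpu", "cpuacct"))
```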

In the meantime, I found a workaround - probably an ugly one. Before launching the fluent bit agent in a wsl2 shell, I do the following:

# hack-start
sudo umount /sys/fs/cgroup/cpu
sudo mount -t cgroup -ocpuacct none /sys/fs/cgroup/cpu
# hack-end

With this hack, the agent runs fine, but to be honest, I don't know what side effects it has.
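
One way to see what the remount changes is to list which mount point carries which controller, before and after the hack. The sketch below parses text in the /proc/mounts format and is only an illustration: the function names are hypothetical, and the sample lines are an assumed layout, not output captured from this machine.

```python
from __future__ import annotations


def cgroup_v1_mounts(mounts_text: str) -> dict[str, set[str]]:
    """Map each cgroup v1 mount point to its set of mount options."""
    result: dict[str, set[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        result[fields[1]] = set(fields[3].split(","))
    return result


def mountpoints_with(mounts_text: str, controller: str) -> list[str]:
    """Return the mount points whose options name the given controller."""
    return sorted(
        mount_point
        for mount_point, options in cgroup_v1_mounts(mounts_text).items()
        if controller in options
    )


# Assumed example of separately mounted cpu and cpuacct hierarchies.
SAMPLE_MOUNTS = """\
cgroup /sys/fs/cgroup/cpu cgroup rw,nosuid,nodev,noexec,relatime,cpu 0 0
cgroup /sys/fs/cgroup/cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct 0 0
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
"""

if __name__ == "__main__":
    print(mountpoints_with(SAMPLE_MOUNTS, "cpuacct"))
```

On a real system the input would be the contents of /proc/mounts.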

Best regards,

Jean-Pierre

@nokute78
Collaborator

@nitrogene Thank you for the logs.

I also think it is a mount issue.
Docker provides a script for checking the kernel configuration:
https://github.com/moby/moby/blob/master/contrib/check-config.sh

It links to https://github.com/tianon/cgroupfs-mount
https://github.com/moby/moby/blob/master/contrib/check-config.sh#L194

Could you try running check-config.sh?

Note: I sent a patch, #5189.
It only prevents the SIGSEGV; it does not address the mount issue.
Even after the patch is merged, in_docker still will not be able to gather metrics on this setup, so it will not fix this issue.
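
The general idea of surviving a missing metrics file, rather than crashing on it, can be illustrated as follows. This is a Python sketch of the defensive pattern only; it is not the content of patch #5189 and not the plugin's C code, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def read_cpu_usage(path: Path) -> Optional[int]:
    """Return the counter stored in a cpuacct.usage file, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, permission problem, or unexpected content.
        return None


def collect_snapshots(cgroup_cpu_root: Path, container_ids: list[str]) -> dict[str, int]:
    """Collect CPU usage per container, skipping containers whose data is unavailable."""
    snapshots: dict[str, int] = {}
    for container_id in container_ids:
        usage = read_cpu_usage(cgroup_cpu_root / "docker" / container_id / "cpuacct.usage")
        if usage is None:
            continue  # skip this container instead of using data that was never read
        snapshots[container_id] = usage
    return snapshots
```

With this pattern a container without cpuacct.usage simply produces no record, which matches the point above: the crash goes away, but no CPU metrics are gathered.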

@nitrogene
Author

Hello,

Here's the output of check_config.sh:

$ ./check_config.sh
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled
- CONFIG_NETFILTER_XT_MARK: enabled
- CONFIG_IP_NF_NAT: enabled
- CONFIG_NF_NAT: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_CGROUP_BPF: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_LEGACY_VSYSCALL_NONE: enabled
    (containers using eglibc <= 2.13 will not work. Switch to
     "CONFIG_VSYSCALL_[NATIVE|EMULATE]" or use "vsyscall=[native|emulate]"
     on kernel command line. Note that this will disable ASLR for the,
     VDSO which may assist in exploiting security vulnerabilities.)
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled
- CONFIG_IP_VS: enabled
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled
- CONFIG_SECURITY_SELINUX: missing
- CONFIG_SECURITY_APPARMOR: missing
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: enabled
  - "macvlan":
    - CONFIG_MACVLAN: enabled
    - CONFIG_DUMMY: enabled
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled
    - CONFIG_NF_CONNTRACK_FTP: enabled
    - CONFIG_NF_NAT_TFTP: enabled
    - CONFIG_NF_CONNTRACK_TFTP: enabled
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
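
A report this long is easier to read when only the entries marked missing are listed per section. A minimal Python sketch follows, assuming the plain-text layout pasted above (no color codes); the function name is hypothetical and the sample is a short excerpt of the report.

```python
from __future__ import annotations

import re

# Matches lines such as "- CONFIG_VETH: enabled" or "    - /dev/zfs: missing".
_ITEM = re.compile(r"^\s*-\s+(?P<name>.+?):\s+(?P<status>enabled|missing)\b")


def missing_by_section(report: str) -> dict[str, list[str]]:
    """Group the items reported as missing under their top-level section header."""
    section = ""
    missing: dict[str, list[str]] = {}
    for line in report.splitlines():
        is_header = (
            bool(line)
            and not line[0].isspace()
            and not line.startswith("-")
            and line.endswith(":")
        )
        if is_header:
            section = line[:-1]
            missing.setdefault(section, [])
            continue
        match = _ITEM.match(line)
        if match and match.group("status") == "missing":
            missing.setdefault(section, []).append(match.group("name"))
    return missing


# Short excerpt of the report above.
SAMPLE_REPORT = """\
Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_CGROUP_CPUACCT: enabled

Optional Features:
- CONFIG_BLK_DEV_THROTTLING: missing
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
"""

if __name__ == "__main__":
    for name, items in missing_by_section(SAMPLE_REPORT).items():
        print(name, items)
```

In the report above, every entry reported as missing sits under Optional Features; nothing under Generally Necessary is missing, and CONFIG_CGROUP_CPUACCT is enabled.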

Best regards,

Jean-Pierre

@nokute78
Collaborator

nokute78 commented Apr 2, 2022

Did you run check-config.sh before applying the workaround below?
#5149 (comment)

@nitrogene
Author

nitrogene commented Apr 4, 2022

Hello,

I wasn't sure, so I ran check-config.sh again in a fresh WSL2 shell, right after restarting my computer:

$ ./check-config.sh
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled
- CONFIG_NETFILTER_XT_MARK: enabled
- CONFIG_IP_NF_NAT: enabled
- CONFIG_NF_NAT: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_CGROUP_BPF: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_LEGACY_VSYSCALL_NONE: enabled
    (containers using eglibc <= 2.13 will not work. Switch to
     "CONFIG_VSYSCALL_[NATIVE|EMULATE]" or use "vsyscall=[native|emulate]"
     on kernel command line. Note that this will disable ASLR for the,
     VDSO which may assist in exploiting security vulnerabilities.)
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled
- CONFIG_IP_VS: enabled
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled
- CONFIG_SECURITY_SELINUX: missing
- CONFIG_SECURITY_APPARMOR: missing
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: enabled
  - "macvlan":
    - CONFIG_MACVLAN: enabled
    - CONFIG_DUMMY: enabled
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled
    - CONFIG_NF_CONNTRACK_FTP: enabled
    - CONFIG_NF_NAT_TFTP: enabled
    - CONFIG_NF_CONNTRACK_TFTP: enabled
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Regards,

Jean-Pierre

@github-actions
Contributor

github-actions bot commented Jul 4, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Jul 4, 2022
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.
