Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fluentd logs is full of backslash and kibana doesn't show k8s pods logs #2545

Open
avarf opened this issue Aug 6, 2019 · 17 comments
Open

Fluentd logs is full of backslash and kibana doesn't show k8s pods logs #2545

avarf opened this issue Aug 6, 2019 · 17 comments
Labels
bug Something isn't working

Comments

@avarf
Copy link

avarf commented Aug 6, 2019

Describe the bug
I set up an EFK stack for gathering my different k8s pods logs based on this tutorial: https://mherman.org/blog/logging-in-kubernetes-with-elasticsearch-Kibana-fluentd/ on a Microk8s single node cluster. Everything is up and working and I can connect kibanna to elasticsearch and see the indexes but in the discovery section of kibana there is no log related to my pods and there are kubelete logs.

When I checked the logs of fluentd I saw that it is full of backslashes:

2019-08-05 15:23:17 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: "2019-08-05T17:23:10.167379794+02:00 stdout P 2019-08-05 15:23:10 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: \"2019-08-05T17:23:07.09726655+02:00 stdout P 2019-08-05 15:23:07 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: \\\"2019-08-05T17:23:04.433817307+02:00 stdout P 2019-08-05 15:23:04 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: \\\\\\\"2019-08-05T17:22:52.546188522+02:00 stdout P 2019-08-05 15:22:52 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: \\\\\\\\\\\\\\\"2019-08-05T17:22:46.694679863+02:00 stdout F \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

There are much more backslashes but I just copied this amount to show the log.

Your Environment

  • Fluentd or td-agent version: I tested this with two images: fluent/fluentd-kubernetes-daemonset:v1.4-debian-elasticsearch and also v1.3 but the results were the same
  • Operating system: I am using Ubuntu 18.04 but the fluentd is running in a container and in a single node kubernetes cluster running on Microk8s

Your Configuration
Based on the tutorial that I mentioned earlier I am using two config files for setting up fluentd:

  1. fluentd-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluentd
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-system
  1. fluentd-daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  # namespace: default
  labels:
    k8s-app: fluentd-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4-debian-elasticsearch
        env:
          - name:  FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch.logging"
          - name:  FLUENT_ELASTICSEARCH_PORT
            value: "9200"
          - name: FLUENT_ELASTICSEARCH_SCHEME
            value: "http"
          - name: FLUENT_UID
            value: "0"
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
@avarf avarf added the bug Something isn't working label Aug 6, 2019
@repeatedly
Copy link
Member

When I checked the logs of fluentd I saw that it is full of backslashes:

Is this log single line, right? If so, it seems several logs are merged into one.
Do you show the configuration/application example to reproduce the problem?

@avarf
Copy link
Author

avarf commented Aug 8, 2019

No, the log is full of backslashes and there are single lines of actual log and then pages of backslashes but I didn't want to copy all the meaningless backslashes and when I searched for the "error" there wasn't any.
Regarding the configuration, you have all the configuration, I followed that tutorial, used an image and the environment variables that you can see in the yaml files and I ran it on Microk8s and Ubuntu 18.04.

@codesqueak
Copy link

Any progress on this issue ? I seem to have just hit exactly the same problem

I use a slightly different setup using

image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch

but otherwise substantially the same.

Looking at the logs, it appears to be repeatedly reprocessing the same information, objecting to the format, which generates a new, longer log entry which is then reprocessed .... and around we go.

@ebastos
Copy link

ebastos commented Dec 9, 2019

I have the same problem after following this tutorial, but using k3s as my kubernetes deployment.

If I strip the backslashs I can see something like:

# kubectl logs --tail=5 fluentd-48jkv -n kube-logging |tr -s "\\"
tr: warning: an unescaped backslash at end of string is not portable
\"
2019-12-09 20:23:29 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: "2019-12-09T20:23:24.66350503Z stdout F \"\"\""
2019-12-09 20:23:29 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: "2019-12-09T20:23:24.664147887Z stdout P 2019-12-09 20:23:24 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: \"2019-12-09T20:23:21.243596958Z stdout P 2019-12-09 20:23:21 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: \"2019-12-09T20:23:07.807619666Z stdout P 2019-12-09 20:23:07 +0000 [warn]: #0 [in_tail_container_logs] pattern not match: \"2019-12-09T20:23:01.152628152Z stdout F \"

But otherwise it's not even possible to see what is going on:

# kubectl logs --tail=5 fluentd-48jkv -n kube-logging |egrep -o '\\'|wc -l
32650

My fluend.yaml is as follows:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  labels:
    app: fluentd
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-logging
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1
        env:
          - name:  FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch.kube-logging.svc.cluster.local"
          - name:  FLUENT_ELASTICSEARCH_PORT
            value: "9200"
          - name: FLUENT_ELASTICSEARCH_SCHEME
            value: "http"
          - name: FLUENTD_SYSTEMD_CONF
            value: disable
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

@etegan
Copy link

etegan commented Jan 13, 2020

Same issue. Does anyone have a solution for this?

@neoxack
Copy link

neoxack commented Jan 30, 2020

Same issue \\\\

@johnfedoruk
Copy link

If your fluentd logs are growing in backslashes, then your fluentd container is parsing its own logs and recursively generating new logs.

Consider creating a fluentd-config.yaml file that is setup to ignore /var/log/containers/fluentd* logs. My example here will help you parse Apache logs... RTFM for more information on configuring sources.

Here is my fluentd-config.yaml file:

kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-config
  namespace: kube-logging
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  containers.input.conf: |-
    <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      exclude_path ["/var/log/containers/fluentd*"]
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      format /^.* (?<source>(stderr|stdout))\ F\ (?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
      time_format %d/%b/%Y:%H:%M:%S %z
    </source>
  output.conf: |-
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      type kubernetes_metadata
    </filter>
    <match **>
       type elasticsearch
       log_level info
       include_tag_key true
       host elasticsearch.kube-logging.svc.cluster.local
       port 9200
       logstash_format true
       # Set the chunk limits.
       buffer_chunk_limit 2M
       buffer_queue_limit 8
       flush_interval 5s
       # Never wait longer than 5 minutes between retries.
       max_retry_wait 30
       # Disable the limit on the number of retries (retry forever).
       disable_retry_limit
       # Use multiple threads for processing.
       num_threads 2
    </match>

Then you will want to update your fluentd DaemonSet. I have had success with the gcr.io/google-containers/fluentd-elasticsearch:v2.0.1 image. Attach your fluentd-config to your fluentd DaemonSet.

Here's what that looks like:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-logging
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: gcr.io/google-containers/fluentd-elasticsearch:v2.0.1
        env:
          - name: FLUENTD_SYSTEMD_CONF
            value: "disable"
          - name: FLUENTD_ARGS
            value: "--no-supervisor -q"
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlogcontainers
          mountPath: /var/log/containers
          readOnly: true
        - name: config
          mountPath: /etc/fluent/config.d
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlogcontainers
        hostPath:
          path: /var/log/containers/
      - name: config
        configMap:
          name: fluentd-config

Best of luck!

@repeatedly
Copy link
Member

repeatedly commented Mar 12, 2020

@lecaros
Copy link

lecaros commented Oct 20, 2020

I see 2 possible concurrent causes:

  1. You're not excluding fluentd logs (hence the numerous '\' and the circular log messages)
  2. k3s will prefix the log lines with datetime, stream (stdout, stderr) and a log tag. So, if your message is "hello Dolly", k3s will save it to the file as:
    2020-10-20T18:05:39.163671864-05:00 stdout F "hello Dolly"

The pattern not match explains why kibana doesn't see any error message. They're not being sent to your elastic service.

Having a proper filter/parser would help on this.
Can you post your fluentd conf?

@mariusgrigoriu
Copy link

Is there a good way for fluentd's own logs to be shipped if possible?

@mickdewald
Copy link

I got this issue as well, because I was using containerd instead of docker. I solved it by putting in the following configuration:

- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
  value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/

@108356037
Copy link

@micktg
Your solution fixed my problem! Much appreciation!

@repeatedly
Copy link
Member

For lastest images, use cri parser is better than regexp: https://github.com/fluent/fluentd-kubernetes-daemonset#use-cri-parser-for-containerdcri-o-logs

@varungupta19
Copy link

I followed a digital ocean tutorial https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes to setup my EFK for kubernetes and faced the same issue. The above answer by @micktg resolved the issue. I added the below in environment variables of my fluentd yaml file, so now my environment variables look like this

    env:
      - name:  FLUENT_ELASTICSEARCH_HOST
        value: "elasticsearch.kube-logging.svc.cluster.local"
      - name:  FLUENT_ELASTICSEARCH_PORT
        value: "9200"
      - name: FLUENT_ELASTICSEARCH_SCHEME
        value: "http"
      - name: FLUENTD_SYSTEMD_CONF
        value: disable
      - name: FLUENT_CONTAINER_TAIL_EXCLUDE_PATH
        value: /var/log/containers/fluent*
      - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
        value: /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/ 

@rawinng
Copy link

rawinng commented Aug 13, 2021

I found @micktg and @varungupta19 answer solve the problem.

@jsvasquez
Copy link

Thanks, @micktg and @varungupta19. Problem solved.

@Athuliva
Copy link

Athuliva commented Apr 6, 2024

I followed a digital ocean tutorial https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes to setup my EFK for kubernetes and faced the same issue. The above answer by @micktg resolved the issue. I added the below in environment variables of my fluentd yaml file, so now my environment variables look like this

    env:
      - name:  FLUENT_ELASTICSEARCH_HOST
        value: "elasticsearch.kube-logging.svc.cluster.local"
      - name:  FLUENT_ELASTICSEARCH_PORT
        value: "9200"
      - name: FLUENT_ELASTICSEARCH_SCHEME
        value: "http"
      - name: FLUENTD_SYSTEMD_CONF
        value: disable
      - name: FLUENT_CONTAINER_TAIL_EXCLUDE_PATH
        value: /var/log/containers/fluent*
      - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
        value: /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/ 

adding value: /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/ this didn't help me. I am trying to implement it in microk8s

openstack-mirroring pushed a commit to openstack/openstack-helm-infra that referenced this issue Jul 9, 2024
+ prevent Fluentd from parsing its own logs and fix an issue with
  endless backslashes (fluent/fluentd#2545)
+ increase chunk limit size
+ add storage for systemd plugin configuration
+ add pos_file parameter for the tail sources

Change-Id: I7d6e54d2324e437c92e5e8197636bd6c54419167
openstack-mirroring pushed a commit to openstack/openstack that referenced this issue Jul 9, 2024
* Update openstack-helm-infra from branch 'master'
  to 01e66933b3c2b93c6677c04a00361ceeb70a9634
  - [fluentd] Adjust configuration for v1.15
    
    + prevent Fluentd from parsing its own logs and fix an issue with
      endless backslashes (fluent/fluentd#2545)
    + increase chunk limit size
    + add storage for systemd plugin configuration
    + add pos_file parameter for the tail sources
    
    Change-Id: I7d6e54d2324e437c92e5e8197636bd6c54419167
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests