
Fluentd is skipping logs in kubernetes #366

Closed
mohitanchlia opened this issue Nov 5, 2019 · 12 comments

@mohitanchlia

Describe the bug
We run multiple kubernetes pods on the same node. Under low traffic there is generally no problem, but under high load, when we generate a 128 MB log file every 2 minutes, fluentd starts to skip logs and gaps appear in ingestion. I tracked the last and first ingested events: they correspond to the last line of one log file and the first line of the next log file fluentd picks up; all log files in between are skipped. We have plenty of CPU available; the Ruby process is only at 50%.

We've tried almost all the settings described in other related threads, such as the watcher and rotate_wait settings, but nothing seems to help.
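The in_tail parameters usually meant by "watcher settings" and "rotate wait settings" are enable_stat_watcher / enable_watch_timer and rotate_wait; a minimal sketch with example values, not our exact settings:

<source>
  @type tail
  tag tail.containers.*
  path /var/log/containers/*.log
  pos_file /var/log/splunk-fluentd-containers.log.pos
  # disable the inotify-based watcher and rely on the timer-based one
  enable_stat_watcher false
  enable_watch_timer true
  # keep reading a rotated file for this long before closing it
  rotate_wait 30s
  refresh_interval 5s
  <parse>
    @type none
  </parse>
</source>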

To Reproduce

<source>
  @id containers.log
  @type tail
  @label @SPLUNK
  tag tail.containers.*
  path /var/log/containers/*.log
  refresh_interval 5s
  pos_file /var/log/splunk-fluentd-containers.log.pos
  path_key source
  read_from_head true
  rotate_wait 10s
  <parse>
    @type json
    time_key time
    time_type string
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    localtime false
  </parse>
</source>

Run 2 similar pods on an 8-core node and drive enough load to generate a 128 MB log file every 2 minutes. You will see that POD A's logs are ingested but none from POD B; then after a few minutes POD B's logs are ingested but none from POD A, and this behavior continues.

***** The only thing that worked was hard-coding individual containers in multiple tail input sections using multiple workers (sketched below). This is not a viable solution because container ids are not known ahead of time.
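For reference, a rough sketch of what that per-container, per-worker workaround looks like; the container-id patterns (aaaa..., bbbb...) are placeholders that would have to be known in advance, and the @SPLUNK label/match sections of our config are omitted:

<system>
  workers 2
</system>

<worker 0>
  <source>
    @type tail
    tag tail.containers.*
    # placeholder pattern for the first container's log file
    path /var/log/containers/*aaaaaaaaaaaa*.log
    pos_file /var/log/splunk-fluentd-containers-0.log.pos
    path_key source
    read_from_head true
    <parse>
      @type json
      time_key time
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </parse>
  </source>
</worker>

<worker 1>
  <source>
    @type tail
    tag tail.containers.*
    # placeholder pattern for the second container's log file
    path /var/log/containers/*bbbbbbbbbbbb*.log
    pos_file /var/log/splunk-fluentd-containers-1.log.pos
    path_key source
    read_from_head true
    <parse>
      @type json
      time_key time
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </parse>
  </source>
</worker>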

Expected behavior
All logs are forwarded, with none skipped.
Your Environment
fluentd 1.4
Amazon Linux 2


Your Configuration
See the in_tail source configuration under "To Reproduce" above.

Your Error Log
No errors or warnings are logged.

@ganmacs
Member

ganmacs commented Nov 5, 2019

this one? fluent/fluentd#2236 (comment)

@repeatedly
Member

repeatedly commented Nov 5, 2019

I want to know whether this problem is k8s specific or not.
If you can reproduce this problem without k8s, could you write up reproducible steps / an actual example?

@mohitanchlia
Author

mohitanchlia commented Nov 5, 2019

@repeatedly I can try to reproduce it, but the fact that multiple tail sources with multiple workers work just fine leads me to believe fluentd is not able to keep up. However, Ruby CPU is only at 50%. It looks like with multiple pods it is time-slicing the reads instead of using multiple threads to read each log file?

@mohitanchlia
Author

Can I get some guidance on how to reproduce this in a standalone environment? We use Splunk to visualize, so it's easy to see when logs are skipped or not forwarded. We are pumping > 12K log lines per node per fluentd instance.

@repeatedly
Member

To test fluentd locally, the simplest way is to install fluentd or td-agent and simulate your traffic.

https://docs.fluentd.org/installation

For in_tail, we sometimes use dummer or write a specific script for complicated cases, e.g. rotation. I'm not sure how your application generates logs, so if dummer doesn't fit, writing a specific script is better.
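For example, a minimal standalone config (paths are placeholders) that tails a test directory and prints everything to stdout; generate and rotate files under that directory with dummer or your own script, then compare what fluentd emits with what was written:

<source>
  @type tail
  tag test.app
  path /tmp/fluentd-test/*.log
  pos_file /tmp/fluentd-test/test.log.pos
  read_from_head true
  <parse>
    @type none
  </parse>
</source>

<match test.**>
  @type stdout
</match>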

@mohitanchlia
Author

And how do I verify the rate of ingestion into the output plugin? Send it to a file? Even then, how do I verify it? Are there any tools I can use, or do I have to do the verification manually?

@repeatedly
Member

There are several ways: the flowcounter/flowcounter_simple plugins, the prometheus plugin, a file output plus your own counting script, and more.
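For the file approach, one sketch is to wrap your existing match in @type copy and add a second <store> that writes an audit copy (the first store below is only a placeholder so the sketch runs as-is), then compare line counts in the audit file with what the application wrote:

<match tail.containers.**>
  @type copy
  <store>
    # your existing Splunk output <store> goes here unchanged;
    # @type null is only a stand-in for this sketch
    @type null
  </store>
  <store>
    # audit copy: count lines here vs. lines the app wrote
    @type file
    path /var/log/fluentd-audit/containers
    append true
  </store>
</match>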

@penghuiy

I encountered the same situation on GKE, driving enough load to generate a 50 MB log file every minute, with a Docker rotation size of 30 MB. We found that under normal circumstances in_tail detects log rotation and emits two log lines at the same time:

2019-12-19 16:46:07 +0000 [info]: [fluentd-containers.log] detected rotation of /var/log/containers/notification-549bf6d97f-z6lt2_production_notification-1d2adf6306e58363b61ae8b9e1a3de45fb7c1ecd8f377f39cdbd88ed25cd0e7d.log; waiting 5 seconds
2019-12-19 16:46:07 +0000 [info]: [fluentd-containers.log] following tail of /var/log/containers/notification-549bf6d97f-z6lt2_production_notification-1d2adf6306e58363b61ae8b9e1a3de45fb7c1ecd8f377f39cdbd88ed25cd0e7d.log

But when logs are skipped, only the first line appears, and then a few seconds later both lines appear again:

2019-12-19 16:47:27 +0000 [info]: [fluentd-containers.log] detected rotation of /var/log/containers/notification-549bf6d97f-z6lt2_production_notification-1d2adf6306e58363b61ae8b9e1a3de45fb7c1ecd8f377f39cdbd88ed25cd0e7d.log; waiting 5 seconds

2019-12-19 16:47:39 +0000 [info]: [fluentd-containers.log] detected rotation of /var/log/containers/notification-549bf6d97f-z6lt2_production_notification-1d2adf6306e58363b61ae8b9e1a3de45fb7c1ecd8f377f39cdbd88ed25cd0e7d.log; waiting 5 seconds
2019-12-19 16:47:39 +0000 [info]: [fluentd-containers.log] following tail of /var/log/containers/notification-549bf6d97f-z6lt2_production_notification-1d2adf6306e58363b61ae8b9e1a3de45fb7c1ecd8f377f39cdbd88ed25cd0e7d.log

It does not seem to detect rotations at the right time.

version:
fluentd: 1.2.4

@dengalebr

Is there any solution/workaround for this issue? We are stuck with the same issue, and multiple workers are not helping for us either.

We use SCK (Splunk Connect for Kubernetes), but I don't think that changes anything.

@takryo00

takryo00 commented Nov 1, 2020

I have the same problem, and I think it is caused by the k8s logging mechanism.

  1. Containers write their logs to "/var/lib/docker/containers/*".
  2. kubelet creates symbolic links from the log sources under "/var/lib/docker/containers/*" to "/var/log/pods/~~/" and "/var/log/containers/~~.log".
  3. The fluentd configuration (which we set up) points the in_tail plugin at "/var/log/containers/~~.log".

The symbolic links are created one per pod/container.
This means the in_tail plugin cannot detect the rotation of the underlying source logs.

I can think of 2 ways to resolve this issue (not tested):

  1. Set up the in_tail plugin to read the logs directly (set the path to "/var/lib/docker/containers/~~"; see the sketch after this list).
    If we take this approach, we need some way to identify the logs we want to collect, because that directory holds the logs of every container running on the node.

  2. Write logs to other files directly from the application (do not use stdout/stderr).
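A rough sketch of option 1, assuming the default Docker json-file log driver, which writes <container-id>/<container-id>-json.log under /var/lib/docker/containers. Note that the pod and namespace names encoded in the /var/log/containers filenames are lost this way, so some additional filtering or metadata enrichment would still be needed:

<source>
  @type tail
  tag docker.raw.*
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/fluentd-docker-containers.log.pos
  path_key source
  read_from_head true
  <parse>
    @type json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>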


@github-actions

github-actions bot commented Mar 1, 2021

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

@github-actions github-actions bot added the stale label Mar 1, 2021
@github-actions

github-actions bot commented Apr 1, 2021

This issue was automatically closed because of stale in 30 days

@github-actions github-actions bot closed this as completed Apr 1, 2021