Fluentd is skipping logs in kubernetes #366
Comments
This one? fluent/fluentd#2236 (comment)
I want to know whether this problem is k8s-specific or not.
@repeatedly I can try to reproduce; however, the fact that multiple tail sources with multiple workers work just fine leads me to believe fluentd is not able to keep up. Even so, Ruby CPU usage is only 50%. It looks like, with multiple pods, it is time-slicing the reads instead of using multiple threads to read every log file?
Can I get some guidance on how to reproduce this in a standalone environment? We use Splunk to visualize, so it's easy to see when logs are skipped or not forwarded. We are pumping > 12K log lines per node per fluentd instance.
To test fluentd locally, the simple way is to install fluentd or td-agent and simulate your traffic: https://docs.fluentd.org/installation. For in_tail, we sometimes use dummer or write a specific script for complicated cases, e.g. rotation. I'm not sure how your application generates logs, so if dummer doesn't fit, writing a specific script is better.
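As a starting point, a "specific script" of the kind mentioned above could look like this minimal sketch. The path, line shape, burst size, and rotation threshold are placeholders chosen for illustration (the report mentions 30 MB Docker rotation and >12K lines per node); it emulates rename-based rotation similar to Docker's json-file driver so in_tail has something realistic to follow.

```ruby
#!/usr/bin/env ruby
# generate_rotating_logs.rb -- minimal sketch of a load generator for in_tail
# testing. All paths, rates, and sizes below are placeholders; adjust them to
# match your real traffic.
require "time"
require "fileutils"

LOG_PATH  = "/tmp/in_tail_test/app.log"
MAX_BYTES = 30 * 1024 * 1024   # rotate at ~30 MB, like the json-file driver
KEEP      = 5                  # number of rotated files to keep

FileUtils.mkdir_p(File.dirname(LOG_PATH))

line_no = 0
loop do
  # Append a burst of JSON lines, similar in shape to Docker container logs.
  File.open(LOG_PATH, "a") do |f|
    1_000.times do
      line_no += 1
      f.puts(%({"log":"line #{line_no} #{"x" * 200}\\n","stream":"stdout","time":"#{Time.now.utc.iso8601(9)}"}))
    end
  end

  # Docker's json-file driver rotates by renaming, so emulate rename rotation.
  if File.size(LOG_PATH) > MAX_BYTES
    (KEEP - 1).downto(1) do |i|
      src = "#{LOG_PATH}.#{i}"
      File.rename(src, "#{LOG_PATH}.#{i + 1}") if File.exist?(src)
    end
    File.rename(LOG_PATH, "#{LOG_PATH}.1")
  end

  sleep 0.1
end
```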
And how do I verify the rate of ingestion to the output plugin? Send it to a file? Even then, how do I verify? Are there any tools I can use, or do I have to do the verification manually?
There are several ways: the flowcounter/flowcounter_simple plugins, the prometheus plugin, writing to a file and counting with your own script, and more.
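For reference, a minimal standalone pipeline of this kind might look like the sketch below. The path, pos_file location, and tag are placeholders matching the generator script above, and fluent-plugin-flowcounter-simple has to be installed separately (check the plugin README for its exact options); it periodically writes the observed event count/rate to fluentd's own log.

```
<source>
  @type tail
  path /tmp/in_tail_test/app.log
  pos_file /tmp/in_tail_test/app.log.pos
  tag test.app
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

# flowcounter_simple reports the ingestion rate to fluentd's own log output
<filter test.**>
  @type flowcounter_simple
</filter>

<match test.**>
  @type stdout
</match>
```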
I encountered the same situation on GKE: driving load to generate a 50 MB log file every 1 minute, while the Docker rotation size is 30 MB. We found that under normal circumstances in_tail detects log rotation and outputs two log messages at the same time:
But at the time of a log skip, only the first message appears, and the two messages only show up again a few seconds later.
It does not seem to detect rotations at the right time. version:
Is there any solution/workaround for this issue? We are stuck with the same issue, and multiple workers are not helping for us either. We use SCK (Splunk Connect for Kubernetes), but I don't think that changes anything.
I have the same problem, and I think it is caused by the k8s logging mechanism.
The symbolic links are created one by one for each pod. I think I have two ways to resolve this issue (not tested).
This issue has been automatically marked as stale because it has been open for 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
This issue was automatically closed after being stale for 30 days.
Describe the bug
We run multiple Kubernetes pods on the same node. Under low traffic there is generally no problem, but under high load, when we generate a 128 MB log file every 2 minutes, fluentd starts to skip logs. There are gaps in log ingestion. I tracked the last event before a gap and the first event after it: they correspond to the last line of one log file and the first line of the next log file that it picks up; all log files in between get skipped. We have plenty of CPU power available, and the Ruby process is only at 50%.
We've tried almost all the settings described in other related threads, like the watcher settings and rotate wait settings, but nothing seems to help.
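For context, the settings referred to are in_tail parameters of the following kind. This is only a sketch with illustrative values, not the exact configuration used; rotate_wait, refresh_interval, enable_stat_watcher, and read_lines_limit are documented in_tail parameters, and the path/pos_file are typical Kubernetes locations.

```
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  # illustrative values only -- none of this kind of tuning resolved the skipping
  rotate_wait 30
  refresh_interval 5
  enable_stat_watcher false
  read_lines_limit 5000
  <parse>
    @type json
  </parse>
</source>
```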
To Reproduce
Have 2 similar pods on an 8-core node and drive load to generate a 128 MB log file every 2 minutes. You'll see that POD A's logs are ingested but no logs from POD B, then after a few minutes POD B's logs are ingested but none from POD A, and this behavior continues.
***** The only thing that worked was hard-coding individual containers in multiple tail input sections using multiple workers (sketched below). This is not a viable solution because container IDs are not known ahead of time.
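For illustration, that workaround amounts to something like the following. The pod names and container IDs are placeholders, and pinning sources to workers assumes a fluentd version that supports multi-process workers with the `<worker N>` directive; since container IDs change on every restart, this cannot be templated ahead of time.

```
<system>
  workers 2
</system>

# One hard-coded in_tail source per container log, each pinned to its own worker.
<worker 0>
  <source>
    @type tail
    path /var/log/containers/pod-a_default_app-<container-id-a>.log
    pos_file /var/log/fluentd-pod-a.pos
    tag kubernetes.pod-a
    read_from_head true
    <parse>
      @type json
    </parse>
  </source>
</worker>

<worker 1>
  <source>
    @type tail
    path /var/log/containers/pod-b_default_app-<container-id-b>.log
    pos_file /var/log/fluentd-pod-b.pos
    tag kubernetes.pod-b
    read_from_head true
    <parse>
      @type json
    </parse>
  </source>
</worker>
```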
Expected behavior
All logs are forwarded without any being skipped.
Your Environment
Fluentd or td-agent version: fluentd 1.4
Operating system: Amazon Linux 2
Kernel version: uname -r
Your Configuration
Your Error Log
No errors are noted, and no warnings either.