
Fluentd is not able to keep track of files in kubernetes #2683

Closed
mohitanchlia opened this issue Nov 4, 2019 · 1 comment
Describe the bug
We run multiple Kubernetes pods on the same node. Under low traffic there is generally no problem, but under high load, when we generate a 128 MB log file every 2 minutes, fluentd starts to skip logs and there are gaps in ingestion. I tracked the events around each gap: the last event ingested corresponds to the last line of one log file, and the next event ingested corresponds to the first line of a later log file, so every log file in between is skipped entirely. We have plenty of CPU available; the ruby process is only at 50%.

We've tried almost all of the settings described in other related threads, such as the watcher settings and rotate_wait, but nothing seems to help.
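
For illustration, the variations looked roughly like this (a sketch with example values, not an exact record of any one run; rotate_wait, refresh_interval, enable_stat_watcher, and read_lines_limit are standard in_tail parameters):

    <source>
      @type tail
      tag tail.containers.*
      path /var/log/containers/*.log
      pos_file /var/log/splunk-fluentd-containers.log.pos
      read_from_head true
      rotate_wait 10s            # keep following a rotated file this long before closing it
      refresh_interval 5s        # re-expand the path glob this often
      enable_stat_watcher false  # disable the inotify-based watcher, timer-based watching only
      read_lines_limit 1000      # lines read per I/O loop iteration
      <parse>
        @type json
      </parse>
    </source>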

To Reproduce

    <source>
      @id containers.log
      @type tail
      @label @SPLUNK
      tag tail.containers.*
      path /var/log/containers/*.log
      refresh_interval 5s
      pos_file /var/log/splunk-fluentd-containers.log.pos
      path_key source
      read_from_head true
      rotate_wait 10s
      <parse>
        @type json
        time_key time
        time_type string
        time_format %Y-%m-%dT%H:%M:%S.%NZ
        localtime false
      </parse>
    </source>

Run 2 similar pods on an 8-core node and drive enough load to generate a 128 MB log file every 2 minutes. You'll see POD A logs being ingested but no logs from POD B; then, after a few minutes, POD B logs are ingested but none from POD A, and this behavior keeps alternating.

The only thing that worked was hard-coding individual containers in multiple tail input sections, each pinned to its own worker (sketched below). This is not a viable solution because container ids are not known ahead of time.
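
For reference, a minimal sketch of that workaround, assuming two pods whose container log file names are known up front (the paths and ids below are placeholders):

    <system>
      workers 2
    </system>

    <worker 0>
      <source>
        @type tail
        tag tail.containers.pod-a
        # placeholder path; the container id suffix changes on every restart
        path /var/log/containers/pod-a_default_app-0123456789ab.log
        pos_file /var/log/splunk-fluentd-pod-a.log.pos
        read_from_head true
        <parse>
          @type json
        </parse>
      </source>
    </worker>

    <worker 1>
      <source>
        @type tail
        tag tail.containers.pod-b
        # placeholder path
        path /var/log/containers/pod-b_default_app-ba9876543210.log
        pos_file /var/log/splunk-fluentd-pod-b.log.pos
        read_from_head true
        <parse>
          @type json
        </parse>
      </source>
    </worker>

Pinning each pod's file to its own worker process is what avoided the skips, but it breaks as soon as a container restarts and its id changes.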

Expected behavior
All logs are forwarded, with none skipped.
Your Environment

  • Fluentd version: 1.4
  • Operating system: Amazon Linux 2


Your Configuration

See the <source> block under "To Reproduce" above.

Your Error Log

No errors or warnings are logged.

Additional context

mohitanchlia added the bug label Nov 4, 2019
ganmacs (Member) commented Nov 5, 2019

ganmacs closed this as completed Nov 5, 2019
ganmacs removed the bug label Nov 5, 2019