Help debugging a td-agent problem #1384
Comments
I want to make the problem clear:
We want to see all of the error logs. This error doesn't include its cause; I assume your logs have more error messages / stacktraces.
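For anyone following along, a quick way to pull every warning/error line out of the agent's own log, assuming the stock td-agent log location (`/var/log/td-agent/td-agent.log` is an assumption; adjust for your install):

```sh
# Collect all warn/error/fatal lines from the td-agent log.
# Stack traces are usually indented continuation lines, so add -A N for context.
grep -E '\[(warn|error|fatal)\]' /var/log/td-agent/td-agent.log
```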
@repeatedly - What's above is all we get from …
Hmm... could you upload your …
I noticed that I had missed a line in my copy/paste of the error; I fixed it above by adding this line to the top:
The full log file is included, but I removed IP address information by running …
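The exact command is cut off above; a hypothetical equivalent for scrubbing IPv4 addresses from a log before uploading it might look like this:

```sh
# Replace anything shaped like an IPv4 address with a placeholder.
# (Illustrative only; not necessarily the command that was actually run.)
sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/x.x.x.x/g' td-agent.log > td-agent.scrubbed.log
```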
I missed that you're configuring … Of course, it uses memory, up to … (and it's not any bug of fluentd.)
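For context, with fluentd v0.12's memory buffer the worst case is roughly `buffer_chunk_limit × buffer_queue_limit` of data held in RAM before new chunks are rejected. A sketch of what such an out_s3 section might look like (values are illustrative, not taken from the reporter's actual config):

```
<match app.**>
  @type s3
  # Chunks are held in RAM until they are flushed to S3.
  buffer_type memory
  # Worst case, the buffer holds about buffer_chunk_limit * buffer_queue_limit.
  buffer_chunk_limit 8m
  buffer_queue_limit 256
  # (s3_bucket, s3_region, path, credentials, etc. omitted for brevity)
</match>
```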
@tagomoris - Thanks for the tip. I dug into it. Charts below are in EDT. It looks like the output throughput dropped while memory was still at a healthy level, at about 14:25 UTC. This led to it holding onto memory until it errored out at 15:27 UTC in the logs, shown below:

Could the out-of-memory error be a red herring? Perhaps the real issue is whatever caused output to suddenly stop being written.
@tagomoris - Can we reopen this ticket? My post above explains how running out of memory was a symptom, not a cause.
I want to know the chronology of the problem.
I don't know why the output dropped, although I have been able to identify another node that is showing exactly the same behavior! This time I've kept it alive to poke around a bit. Is there any way to inspect the s3 plugin to determine why it's not outputting?

Edit: I've attached the sigdump logs, but they don't seem very useful.
@repeatedly @tagomoris - Can we reopen this ticket until a resolution is found?
@donato Yes. I checked your sigdump result, but does it contain the actual child process result?
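If the attached dump only covers the supervisor, the worker can be dumped separately. A sketch, assuming a stock td-agent install (fluentd loads the sigdump gem, which listens for SIGCONT and writes to `/tmp/sigdump-<pid>.log`); the pid-file path and child lookup are assumptions:

```sh
# Find the td-agent worker (child of the supervisor) and ask sigdump for a
# thread/object dump. Adjust the pid-file path if your install differs.
PARENT_PID=$(cat /var/run/td-agent/td-agent.pid)
CHILD_PID=$(pgrep -P "$PARENT_PID")
kill -CONT "$CHILD_PID"
# sigdump writes to /tmp/sigdump-<pid>.log by default.
cat "/tmp/sigdump-${CHILD_PID}.log"
```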
@repeatedly - I followed the directions here. Is there an alternative way to get better logs? I've started thinking this could be an issue with logrotate compressing files, so I'm going to tinker with the "delaycompress" option. Have you seen this cause problems before?
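For anyone unfamiliar with the option: `delaycompress` makes logrotate skip compressing the most recently rotated file for one cycle, so a process still holding the old file handle keeps seeing plain text. A hypothetical stanza (the path and schedule are assumptions, not the reporter's actual config):

```
/var/log/td-agent/td-agent.log {
  daily
  rotate 7
  compress
  # Keep the newest rotated file uncompressed for one extra cycle.
  delaycompress
  missingok
  notifempty
}
```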
@repeatedly @tagomoris - Can we reopen this ticket until a resolution is found?
Reopening is okay. I replied "yes" to you in the previous comment :)
Hey all,
fluentd: 0.12.29
fluent-plugin-s3: 0.7.1
td: 0.10.29
OS: Ubuntu 14.04.5 LTS
The problem is that td-agent stopped outputting its logs. When this happened, its memory usage began growing, and over 3 days it was never able to flush the buffered logs out. We tried to force a flush manually by sending SIGUSR1 signals to both the child and parent processes, but this did not work. Ultimately we had to kill the process and accept data loss in order to prevent harm to our other production systems.
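For reference, the manual flush attempt looked roughly like this; fluentd flushes its buffers on SIGUSR1, though the exact command below is an assumption based on a stock td-agent install:

```sh
# Send SIGUSR1 (flush buffers) to every td-agent process: supervisor and worker.
# Note: -f matches whole command lines, so anything else mentioning "td-agent"
# in its arguments would also receive the signal.
pkill -USR1 -f 'td-agent'
```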
Logs