[0.11.1] Telegraf Not Flushing After flush_interval - Drops Metrics #914
Comments
This seems very odd. Can you post your full config and OS? |
@s1m0 Something is not quite adding up here, it's not possible for this message:
to print when you have |
I noticed that in the code too. I increased the metric_buffer_limit to 2000 and got 1320 metrics written every 10 minutes.
Here's the config file with the original 1000 setting, which I'm running with right now. We're back to 1000 metrics written every 10 minutes, but I can't find the overwrite message in the logfile any more.
So there's about 2.5 minutes of data (320 metrics) missing. The exec input that's generating that output is the last one in the config below; it writes to our 3-node InfluxDB cluster (database = "FlexCache", user_agent = "OSS-FC").
/etc/default/telegraf:-
|
So you're saying that the 10m exec script is generating 1320 metrics, but only 1000 metrics are getting written? Is there no follow-up write for the remaining 320 metrics? Or is the problem that your outputs are not flushing every 5s? (Or both?) |
Both. Telegraf writes whatever is in the buffer after 10 minutes, so if the buffer is limited to 1000 metrics, I lose 2.5 minutes of data. |
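The numbers reported in this thread can be checked with a small simulation. This is only a sketch of the behaviour described above (a fixed-size buffer that overwrites its oldest entries and is written once when the script exits), not Telegraf's actual code; the function name `simulate_unflushed_buffer` and its parameters are made up for illustration.

```python
from collections import deque


def simulate_unflushed_buffer(rate_per_min, minutes, buffer_limit):
    """Accumulate metrics with no intermediate flush (assumed model).

    When the buffer is full, the oldest metric is overwritten, so only
    the most recent `buffer_limit` metrics survive until the final write.
    """
    # deque(maxlen=...) silently discards the oldest item once it is full.
    buffer = deque(maxlen=buffer_limit)
    produced = 0
    for _ in range(minutes):
        for _ in range(rate_per_min):
            buffer.append(produced)
            produced += 1
    written = len(buffer)          # what the single write at exit contains
    dropped = produced - written   # metrics overwritten before the write
    return produced, written, dropped


produced, written, dropped = simulate_unflushed_buffer(132, 10, 1000)
print(produced, written, dropped)   # 1320 1000 320
print(round(dropped / 132, 1))      # 2.4 (minutes of data lost)
```

With a limit of 2000 the same model returns 1320 written and 0 dropped, which matches the result reported after raising metric_buffer_limit.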
@s1m0 do you happen to have smaller configs where you see a similar problem? I haven't been able to reproduce your issue. Below I am running with a 2s collection interval, 5s flush interval, and a 20-metric buffer. I am seeing that the agent flushes every 5 seconds and whenever the buffer is full:
|
@s1m0 I believe I have reproduced your issue: I am getting some dropped metrics when processing more than a few thousand per second. From what I can tell, the issue has been fixed by @PierreF in PR #1087. I'll keep this issue open for now and close it with the buffer performance refactor I'm working on in #1096. |
I have a long-running exec input that generates 132 metrics a minute. After about 7.5 minutes it fills up the buffer but keeps on accumulating metrics until the script exits after 10 minutes, at which point it issues a warning and then writes the last 1000 metrics:-
I have the following settings in telegraf.conf:-
I was expecting the accumulated metrics to be flushed every flush_interval (5 seconds in this case), but even the flush triggered by flush_buffer_when_full isn't happening.
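For comparison, the behaviour expected here (a flush on every flush_interval tick, plus a flush whenever the buffer fills) can be sketched as follows. This is an assumed model for illustration, not how Telegraf is implemented; `simulate_flushing_buffer` and its parameters are hypothetical names.

```python
def simulate_flushing_buffer(rate_per_min, minutes, buffer_limit, flush_interval_s):
    """Model a buffer that flushes on a timer and when it becomes full."""
    buffer = []
    written = 0
    peak = 0                        # largest number of metrics held at once
    next_flush = flush_interval_s   # time of the next timer-driven flush
    total = rate_per_min * minutes
    for i in range(total):
        t = i * 60 / rate_per_min   # arrival time of metric i, in seconds
        # Run every timer-driven flush that came due before this arrival.
        while t >= next_flush:
            written += len(buffer)
            buffer.clear()
            next_flush += flush_interval_s
        buffer.append(i)
        peak = max(peak, len(buffer))
        # Flush immediately if the buffer has reached its limit.
        if len(buffer) >= buffer_limit:
            written += len(buffer)
            buffer.clear()
    written += len(buffer)          # final flush when the input exits
    return total, written, peak


total, written, peak = simulate_flushing_buffer(132, 10, 1000, 5)
print(total, written, peak)
```

Under this model all 1320 metrics are written and the buffer never comes close to the 1000-metric limit, which is why the single write of 1000 metrics after 10 minutes was unexpected.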