Refactor running_output buffering #1087
Closed
This PR refactors running_output buffering and tries to make it more user-friendly.
It changes running_output to:
It also fixes the following bugs:
Only oldest batch overwritten
With the default configuration of flush_buffer_when_full = true, when the output is down, metrics get buffered in ro.tmpmetrics, see here.
When ro.tmpmetrics is full, it starts overwriting values... but it only ever overwrites the first slot: code
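To illustrate, here is a minimal, self-contained sketch of the pattern described above. It is an assumption about the pre-refactor logic, not the actual Telegraf code: the runningOutput type, the fullBufferLimit constant, and the representation of a batch as a plain sequence number are all simplified stand-ins.

```go
package main

import "fmt"

// Simplified reconstruction of the pre-refactor buffering pattern described
// above (an assumption, not the exact Telegraf code). Each batch is
// represented by its sequence number instead of a slice of metrics.
const fullBufferLimit = 100 // hypothetical cap on the number of buffered batches

type runningOutput struct {
	tmpmetrics map[int]int // slot index -> batch number
	mapI       int         // next slot to write to
}

func (ro *runningOutput) bufferBatch(batch int) {
	if len(ro.tmpmetrics) == fullBufferLimit {
		// Meant to start overwriting old batches, but len() never shrinks
		// when an existing key is overwritten, so this branch fires on
		// every call and mapI is reset every time.
		ro.mapI = 0
	}
	ro.tmpmetrics[ro.mapI] = batch
	ro.mapI++
}

func main() {
	ro := &runningOutput{tmpmetrics: map[int]int{}}
	for batch := 1; batch <= 150; batch++ {
		ro.bufferBatch(batch)
	}
	// Only slot 0 was ever overwritten: it holds the newest buffered batch,
	// while slots 1..99 still hold the old batches 2..100.
	fmt.Println(ro.tmpmetrics[0], ro.tmpmetrics[1], ro.tmpmetrics[99]) // 150 2 100
}
```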
Order of metrics sent
If we hit the previous bug, running_output ends up in the following state:
ro.metrics : contains batch N of metrics
ro.tmpmetrics contains: N-1, 1, 2, 3, ..., 99
When the output becomes available again, ro.Write takes the batches in this order:
ro.metrics, then ro.tmpmetrics, which results in:
N, N-1, 1, 2, ...
Users would probably expect metrics to be sent in the following order: 1, 2, 3, ..., 99, N-1, N
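The resulting order can be reproduced with the small sketch below. It is only an illustration of the behavior described above; the drain loop and the choice of N = 101 are assumptions, not the actual ro.Write implementation.

```go
package main

import "fmt"

// Illustration of the drain order described above (an assumption about the
// old behavior, not the actual ro.Write code). Batches are represented by
// their sequence numbers; N is the batch currently held in ro.metrics.
func main() {
	const n = 101                       // hypothetical batch N
	metrics := n                        // ro.metrics holds batch N
	tmpmetrics := map[int]int{0: n - 1} // slot 0 was overwritten again and again
	for i := 1; i < 100; i++ {
		tmpmetrics[i] = i // slots 1..99 still hold the old batches 1..99
	}

	// Write drains ro.metrics first, then ro.tmpmetrics slot by slot.
	order := []int{metrics}
	for i := 0; i < 100; i++ {
		order = append(order, tmpmetrics[i])
	}
	fmt.Println(order[:5]) // [101 100 1 2 3], i.e. N, N-1, 1, 2, ...
}
```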
Note that the code in this PR may still deliver metrics out of order in one case: the output becomes available again just before newly added metrics cause ro.metrics to become full. This should be pretty rare, and accepting it allowed simpler code in AddMetric.
Buffer size
When flush_buffer_when_full was false, running_output actually kept up to metric_buffer_limit metrics.
But with flush_buffer_when_full set to true (the default config), only ro.metrics was limited to metric_buffer_limit: code
When it was full, it got moved to ro.tmpmetrics, which could hold up to 100 buffers.
So in the end, running_output could keep up to 100 * metric_buffer_limit metrics.
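As a purely illustrative example: with a hypothetical metric_buffer_limit of 10000, the worst case is 100 buffers * 10000 metrics = 1,000,000 metrics held in memory, a hundred times more than the setting name suggests.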
Unlimited buffer
Suppose the output goes offline for some time and, say, 50 buffers end up waiting in ro.tmpmetrics.
E.g. running_output has the following state:
ro.metrics : [metrics]
ro.tmpmetrics : {0: [metrics], 1: [metrics], ..., 49: [metrics]}
ro.mapI = 50
Now the output comes back online and the flusher writes all values. running_output then has the following state:
ro.metrics : empty or a few metrics
ro.tmpmetrics : empty
ro.mapI = 50
If the output goes offline once more for long enough, ro.AddMetric will keep adding to ro.tmpmetrics at
slots 50, 51, 52, etc. (mapI keeps increasing).
running_output will end up in the following state:
ro.tmpmetrics = {50: [metrics], 51: [metrics], ..., 149: [metrics]}
len(ro.tmpmetrics) = 100
ro.mapI = 150
At this point, when it tries to overwrite a value, it will "overwrite" slot 0, which is unused. So instead of overwriting, it adds one more entry to ro.tmpmetrics, which now has a len() of 101.
If the output stays offline, ro.tmpmetrics will only ever grow.
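The growth can be reproduced with the same simplified reconstruction used earlier (again an assumption about the old logic, not the actual code); the flush step here empties the map but leaves mapI untouched, which is the scenario described above.

```go
package main

import "fmt"

// Self-contained demonstration of the unbounded growth described above,
// based on the same simplified reconstruction of the old buffering pattern
// (an assumption, not the actual Telegraf code).
const fullBufferLimit = 100 // hypothetical cap on the number of buffered batches

type runningOutput struct {
	tmpmetrics map[int]int // slot index -> batch number
	mapI       int         // next slot to write to
}

func (ro *runningOutput) bufferBatch(batch int) {
	if len(ro.tmpmetrics) == fullBufferLimit {
		ro.mapI = 0 // "overwrite" slot 0, which may no longer exist after a flush
	}
	ro.tmpmetrics[ro.mapI] = batch
	ro.mapI++
}

func (ro *runningOutput) flush() {
	ro.tmpmetrics = map[int]int{} // the flusher empties the map but never resets mapI
}

func main() {
	ro := &runningOutput{tmpmetrics: map[int]int{}}

	// First outage: 50 batches buffered in slots 0..49, mapI ends at 50.
	for batch := 1; batch <= 50; batch++ {
		ro.bufferBatch(batch)
	}
	ro.flush() // output comes back, buffered batches are written out

	// Second, longer outage: new batches land in slots 50..149. When the
	// len == limit check finally fires, it resets mapI to the long-deleted
	// slot 0, so a new key is added instead of an old one being overwritten,
	// and over time the map keeps growing past the intended cap.
	for batch := 51; batch <= 300; batch++ {
		ro.bufferBatch(batch)
	}
	fmt.Println(len(ro.tmpmetrics)) // 150: already past the 100-buffer cap and still climbing
}
```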