Refactor running_output buffering #1087
Closed
This PR refactors running_output buffering and tries to make it more user-friendly.
It changes running_output to:
It also fixes the following bugs:
Only oldest batch overwritten
With the default configuration of flush_buffer_when_full = true, when the output is down, metrics get buffered in ro.tmpmetrics, see here.
When ro.tmpmetrics is full, it starts overwriting values... but it only ever overwrites the first slot: code
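To illustrate, here is a minimal, self-contained sketch of the pattern described above. It is an assumption about the pre-refactor logic, not the actual Telegraf code: the runningOutput type, the fullBufferLimit constant, and the representation of a batch as a plain sequence number are all simplified stand-ins.

```go
package main

import "fmt"

// Simplified reconstruction of the pre-refactor buffering pattern described
// above (an assumption, not the exact Telegraf code). Each batch is
// represented by its sequence number instead of a slice of metrics.
const fullBufferLimit = 100 // hypothetical cap on the number of buffered batches

type runningOutput struct {
	tmpmetrics map[int]int // slot index -> batch number
	mapI       int         // next slot to write to
}

func (ro *runningOutput) bufferBatch(batch int) {
	if len(ro.tmpmetrics) == fullBufferLimit {
		// Meant to start overwriting old batches, but len() never shrinks
		// when an existing key is overwritten, so this branch fires on
		// every call and mapI is reset every time.
		ro.mapI = 0
	}
	ro.tmpmetrics[ro.mapI] = batch
	ro.mapI++
}

func main() {
	ro := &runningOutput{tmpmetrics: map[int]int{}}
	for batch := 1; batch <= 150; batch++ {
		ro.bufferBatch(batch)
	}
	// Only slot 0 was ever overwritten: it holds the newest buffered batch,
	// while slots 1..99 still hold the old batches 2..100.
	fmt.Println(ro.tmpmetrics[0], ro.tmpmetrics[1], ro.tmpmetrics[99]) // 150 2 100
}
```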
Order of metrics sent
If we hit the previous bug, running_output ends up in the following state:
ro.metrics : contains batch N of metrics
ro.tmpmetrics contains: N-1, 1, 2, 3, ..., 99
When the output becomes available again, ro.Write takes the batches in this order:
ro.metrics, then ro.tmpmetrics, which results in:
N, N-1, 1, 2, ...
Users would probably expect metrics to be sent in the following order: 1, 2, 3, ..., 99, N-1, N
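The resulting order can be reproduced with the small sketch below. It is only an illustration of the behavior described above; the drain loop and the choice of N = 101 are assumptions, not the actual ro.Write implementation.

```go
package main

import "fmt"

// Illustration of the drain order described above (an assumption about the
// old behavior, not the actual ro.Write code). Batches are represented by
// their sequence numbers; N is the batch currently held in ro.metrics.
func main() {
	const n = 101                       // hypothetical batch N
	metrics := n                        // ro.metrics holds batch N
	tmpmetrics := map[int]int{0: n - 1} // slot 0 was overwritten again and again
	for i := 1; i < 100; i++ {
		tmpmetrics[i] = i // slots 1..99 still hold the old batches 1..99
	}

	// Write drains ro.metrics first, then ro.tmpmetrics slot by slot.
	order := []int{metrics}
	for i := 0; i < 100; i++ {
		order = append(order, tmpmetrics[i])
	}
	fmt.Println(order[:5]) // [101 100 1 2 3], i.e. N, N-1, 1, 2, ...
}
```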
Note that the code in this PR may still deliver metrics out of order in one case: the output becomes available again just before newly added metrics cause ro.metrics to become full. This should be pretty rare, and accepting it allowed simpler code in AddMetric.
Buffer size
When flush_buffer_when_full was false, running_output actually kept up to metric_buffer_limit metrics.
But with flush_buffer_when_full set to true (the default config), only ro.metrics was limited to metric_buffer_limit: code
When it was full, it got moved to ro.tmpmetrics, which could hold up to 100 buffers.
So in the end, running_output could keep up to 100 * metric_buffer_limit metrics.
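As a purely illustrative example: with a hypothetical metric_buffer_limit of 10000, the worst case is 100 buffers * 10000 metrics = 1,000,000 metrics held in memory, a hundred times more than the setting name suggests.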
Unlimited buffer
Suppose the output goes offline for some time and, say, 50 buffers end up waiting in ro.tmpmetrics.
E.g. running_output has the following state:
ro.metrics : [metrics]
ro.tmpmetrics : {0: [metrics], 1: [metrics], ..., 49: [metrics]}
ro.mapI = 50
Now the output comes back online and the flusher writes all values. running_output then has the following state:
ro.metrics : empty or a few metrics
ro.tmpmetrics : empty
ro.mapI = 50
If the output goes offline once more for long enough, ro.AddMetric will keep adding to ro.tmpmetrics at
slots 50, 51, 52, etc. (mapI keeps increasing).
running_output will end up in the following state:
ro.tmpmetrics = {50: [metrics], 51: [metrics], ..., 149: [metrics]}
len(ro.tmpmetrics) = 100
ro.mapI = 150
At this point, when it tries to overwrite a value, it will "overwrite" slot 0, which is unused. So instead of overwriting, it adds one more entry to ro.tmpmetrics, which now has a len() of 101.
If the output stays offline, ro.tmpmetrics will only ever grow.
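The growth can be reproduced with the same simplified reconstruction used earlier (again an assumption about the old logic, not the actual code); the flush step here empties the map but leaves mapI untouched, which is the scenario described above.

```go
package main

import "fmt"

// Self-contained demonstration of the unbounded growth described above,
// based on the same simplified reconstruction of the old buffering pattern
// (an assumption, not the actual Telegraf code).
const fullBufferLimit = 100 // hypothetical cap on the number of buffered batches

type runningOutput struct {
	tmpmetrics map[int]int // slot index -> batch number
	mapI       int         // next slot to write to
}

func (ro *runningOutput) bufferBatch(batch int) {
	if len(ro.tmpmetrics) == fullBufferLimit {
		ro.mapI = 0 // "overwrite" slot 0, which may no longer exist after a flush
	}
	ro.tmpmetrics[ro.mapI] = batch
	ro.mapI++
}

func (ro *runningOutput) flush() {
	ro.tmpmetrics = map[int]int{} // the flusher empties the map but never resets mapI
}

func main() {
	ro := &runningOutput{tmpmetrics: map[int]int{}}

	// First outage: 50 batches buffered in slots 0..49, mapI ends at 50.
	for batch := 1; batch <= 50; batch++ {
		ro.bufferBatch(batch)
	}
	ro.flush() // output comes back, buffered batches are written out

	// Second, longer outage: new batches land in slots 50..149. When the
	// len == limit check finally fires, it resets mapI to the long-deleted
	// slot 0, so a new key is added instead of an old one being overwritten,
	// and over time the map keeps growing past the intended cap.
	for batch := 51; batch <= 300; batch++ {
		ro.bufferBatch(batch)
	}
	fmt.Println(len(ro.tmpmetrics)) // 150: already past the 100-buffer cap and still climbing
}
```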