
Refactor running_output buffering #1087

Closed
wants to merge 1 commit into from

Conversation

PierreF
Contributor

@PierreF PierreF commented Apr 24, 2016

This PR refactors running_output buffering and tries to make it more user-friendly.

It changes running_output to (a rough sketch of this behavior follows the lists below):

  • send buffered metrics in order
  • have a buffer size of metric_buffer_limit (and not 100 times metric_buffer_limit, where 100 is a hardcoded value)
  • have consistent behaviour regarding the size of the buffer whether flush_buffer_when_full is true or not

It also fixes bugs where:

  • only the very oldest batch gets overwritten
  • an unlimited number of metrics could be buffered
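
A minimal sketch of the kind of bounded, in-order buffering described above (not the code in this PR, just an illustration; it assumes the oldest metrics are dropped when the limit is reached):

```go
// Illustration only, not the code in this PR: a FIFO buffer capped at a
// metric_buffer_limit-like size that drops the oldest entries when full,
// so metrics stay bounded and are always drained in the order they arrived.
package main

import "fmt"

type metric string // stand-in for telegraf.Metric

type buffer struct {
	limit   int
	metrics []metric
}

// add appends m; if the buffer is full, the oldest metric is dropped
// (oldest-first eviction is an assumption made for this sketch).
func (b *buffer) add(m metric) {
	if len(b.metrics) == b.limit {
		b.metrics = b.metrics[1:]
	}
	b.metrics = append(b.metrics, m)
}

// drain returns the buffered metrics oldest-first and empties the buffer,
// roughly what a successful write would do.
func (b *buffer) drain() []metric {
	out := b.metrics
	b.metrics = nil
	return out
}

func main() {
	b := &buffer{limit: 3}
	for i := 1; i <= 5; i++ {
		b.add(metric(fmt.Sprintf("m%d", i)))
	}
	fmt.Println(b.drain()) // [m3 m4 m5]: bounded, and still in order
}
```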

Only oldest batch overwritten

With the default configuration of flush_buffer_when_full set to true, when the output is down, metrics get buffered in ro.tmpmetrics (see here).

When ro.tmpmetrics is full, it starts overwriting values... but it only ever overwrites the first slot: code
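
A minimal illustration of that behaviour (this is not the actual running_output code, just a reproduction of the described bug): once the map of buffered batches is full, every new batch lands in slot 0, so only the very oldest batch is ever reclaimed.

```go
// Not the actual running_output code: a minimal reproduction of the bug where
// only the first slot is ever overwritten once the batch buffer is full.
package main

import "fmt"

const maxBuffers = 100 // analogous to the hardcoded limit of 100 buffers

func main() {
	tmpmetrics := make(map[int][]string) // stand-in for ro.tmpmetrics
	mapI := 0                            // stand-in for ro.mapI

	add := func(batch []string) {
		if len(tmpmetrics) == maxBuffers {
			// Bug: the overwrite target never advances, so slots 1..99
			// keep their old batches forever.
			tmpmetrics[0] = batch
			return
		}
		tmpmetrics[mapI] = batch
		mapI++
	}

	for i := 0; i < 150; i++ {
		add([]string{fmt.Sprintf("batch %d", i)})
	}
	fmt.Println(tmpmetrics[0][0])  // "batch 149": slot 0 keeps being overwritten
	fmt.Println(tmpmetrics[50][0]) // "batch 50": never reclaimed
}
```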

Order of metrics sent

If we hit the previous bug, running_output will end up in the following state:

ro.metrics contains: batch N of metrics
ro.tmpmetrics contains: N-1, 1, 2, 3, ..., 99

When the output becomes available, ro.Write will take batches in this order:

ro.metrics, then ro.tmpmetrics, which results in:
N, N-1, 1, 2, ...

Users may expect that metrics are sent in the following order: 1, 2, 3, ..., 99, N-1, N
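
A rough sketch of the write order described above (the writeOrder helper below is hypothetical, a simplified stand-in rather than the actual ro.Write): the current batch is written first, then the buffered slots, which yields N, N-1, 1, 2, ... rather than chronological order.

```go
// Simplified, hypothetical stand-in for ro.Write, for illustration only:
// the current batch (ro.metrics) is written before the buffered slots
// (ro.tmpmetrics), so after the overwrite bug the observed order is
// N, N-1, 1, 2, ..., 99.
func writeOrder(current string, tmpmetrics map[int]string, slots int) []string {
	order := []string{current} // batch N goes out first
	for i := 0; i < slots; i++ {
		if batch, ok := tmpmetrics[i]; ok {
			order = append(order, batch) // slot 0 holds N-1, slots 1..99 hold 1..99
		}
	}
	return order
}
```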

Note that the code in this PR may still send metrics out of order in the following case:

  • the output is not available, and some metrics are waiting in ro.tmpmetrics
  • the output becomes available, but the flusher has not yet called ro.Write
  • new metrics are added, which causes ro.metrics to become full. Due to flush_buffer_when_full, ro.metrics is written (so the latest metrics are sent)
  • the flusher now calls ro.Write, which dequeues metrics from ro.tmpmetrics (so older metrics are sent)

This should be pretty rare (the output needs to become available just before newly added metrics cause ro.metrics to become full), and it allows simpler code in AddMetric.

buffer size

When flush_buffer_when_full is false, running_output keeps up to metric_buffer_limit metrics.

But with flush_buffer_when_full set to true (the default config), it is ro.metrics that is limited to metric_buffer_limit code
When it is full, it gets moved to ro.tmpmetrics, which can hold up to 100 batches.

So in the end, running_output could keep up to 100 * metric_buffer_limit metrics.
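
For example, with a hypothetical metric_buffer_limit of 1000, up to 100 * 1000 = 100,000 metrics could end up held in memory when flush_buffer_when_full is true, versus only 1,000 when it is false.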

unlimited buffer

Say the output goes offline for some time and 50 batches end up waiting in ro.tmpmetrics,
i.e. running_output has the following state:

ro.metrics: [metrics]
ro.tmpmetrics: {0: [metrics], 1: [metrics], ..., 49: [metrics]}
ro.mapI = 50

Now the output is back online and the flusher writes all values. running_output has the following state:

ro.metrics: empty or a few metrics
ro.tmpmetrics: empty
ro.mapI = 50

If the output is once more offline for long enough, ro.AddMetric will keep adding to ro.tmpmetrics in slots 50, 51, 52, etc. (mapI keeps increasing).
running_output will end up in the following state:

ro.tmpmetrics = {50: [metrics], 51: [metrics], ..., 149:[metrics]}
len(ro.tmpmetrics) = 100
ro.mapI = 150

At this point, when overwriting values it will "overwrite" slot 0, which is unused. So instead of overwriting, it adds one more value to ro.tmpmetrics, which now has a len() of 101.
If the output stays offline, ro.tmpmetrics will only ever grow.
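
A minimal illustration of this growth (again, not the real Telegraf code): because mapI is never reset after a successful flush, the "full" check only ever fires once, against the unused slot 0, and the map grows by one on every subsequent add.

```go
// Illustration only, not the real running_output code: after the flusher
// drains the map without resetting mapI, the "full" check (len == 100) fires
// once against the unused slot 0, and from then on the map grows without bound.
package main

import "fmt"

const maxBuffers = 100 // analogous to the hardcoded limit of 100 buffers

func main() {
	tmpmetrics := make(map[int][]string) // stand-in for ro.tmpmetrics
	mapI := 0                            // stand-in for ro.mapI

	add := func(batch []string) {
		if len(tmpmetrics) == maxBuffers {
			tmpmetrics[0] = batch // "overwrite" a slot that no longer exists
			return
		}
		tmpmetrics[mapI] = batch
		mapI++
	}

	// First outage: slots 0..49 fill up.
	for i := 0; i < 50; i++ {
		add([]string{"batch"})
	}
	// Output comes back: the flusher empties the map, but mapI stays at 50.
	for k := range tmpmetrics {
		delete(tmpmetrics, k)
	}
	// Second, longer outage: slots 50..149 fill up, then the buffer overflows.
	for i := 0; i < 110; i++ {
		add([]string{"batch"})
	}
	fmt.Println(len(tmpmetrics), mapI) // 110 159: past the 100-slot cap and still growing
}
```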

@sparrc
Contributor

sparrc commented Apr 25, 2016

thanks @PierreF, I've been pondering some refactors to running_output as well and I think this will fit in well. I would also like to eliminate the need to lock a mutex on every call to AddMetric but I can work on that after I've gotten this merged
