Client side aggregation #134

Merged
merged 13 commits into from
Oct 15, 2020

ogaca-dd
Contributor

Add client-side aggregation to improve performance:

  • For Count: sum the values for the same metric name and tags
  • For Gauge: keep the last value for the same metric name and tags
  • For Set: keep the unique values for the same metric name and tags
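
The three rules above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the class `AggregationSketch`, its methods, and the string key format are all hypothetical, and the key assumes tags always arrive in the same order.

```csharp
using System.Collections.Generic;

// Hypothetical illustration; none of these names are taken from the PR.
public sealed class AggregationSketch
{
    // Each dictionary is keyed by metric name plus tags, e.g. "requests|env:prod".
    private readonly Dictionary<string, double> _counts = new Dictionary<string, double>();
    private readonly Dictionary<string, double> _gauges = new Dictionary<string, double>();
    private readonly Dictionary<string, HashSet<string>> _sets = new Dictionary<string, HashSet<string>>();

    // Assumption: tags are passed in a stable order, otherwise keys would differ.
    public static string Key(string name, params string[] tags) => name + "|" + string.Join(",", tags);

    // Count: sum the values that share a key.
    public void Count(string key, double value)
    {
        _counts.TryGetValue(key, out var current); // current is 0 when the key is absent
        _counts[key] = current + value;
    }

    // Gauge: the last value written for a key wins.
    public void Gauge(string key, double value) => _gauges[key] = value;

    // Set: keep each distinct value once per key.
    public void Set(string key, string value)
    {
        if (!_sets.TryGetValue(key, out var values))
        {
            values = new HashSet<string>();
            _sets[key] = values;
        }

        values.Add(value);
    }

    public double CountValue(string key) => _counts[key];

    public double GaugeValue(string key) => _gauges[key];

    public int SetSize(string key) => _sets[key].Count;
}
```

With this shape, many calls for the same key collapse into a single entry per flush, which is where the performance gain would come from.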

Member

@truthbk truthbk left a comment

Looks great. I have a question, though: I believe this approach requires incoming metrics in order to flush (i.e., we evaluate whether to flush when we aggregate, by checking the elapsed time). If the inflow of metrics stops, we would not flush what had been aggregated up to that point. I believe both the Go and Java clients have a timer task that flushes after a certain period of time. Otherwise, just some minor questions. Nice! 👌

{
    if (force
        || _stopWatch.ElapsedMilliseconds > _flushIntervalMilliseconds
        || _values.Count >= _maxUniqueStatsBeforeFlush)
Member

Is this necessary? I presume the goal here is to curb memory usage. We didn't implement anything like this in the java client, but maybe we should.

Contributor Author

It is very easy to implement and it limits memory usage, so I think it is worth implementing.

// This code was auto generated
public override int GetHashCode()
{
    int hashCode = -335110880;
Member

I'm not sure if it makes sense here, I have to review the rest of the PR (😊), but I'm wondering if it makes sense to cache the hashCode the way we've done in the java dogstatsd client in case this is called multiple times, to avoid multiple hashing operations.

Contributor Author

This function should be called once per object. I ran a profiler and this function appeared in the hot path, so I am keeping an eye on it!
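
For reference, the caching idea raised in the question could look something like the sketch below. It is not the key type from this PR: `MetricKeySketch` and its fields are hypothetical, and the approach only works if the fields that feed the hash never change after construction.

```csharp
using System;

// Hypothetical key type; the PR's real key class is not shown in this conversation.
public sealed class MetricKeySketch : IEquatable<MetricKeySketch>
{
    private readonly int _hashCode;

    public MetricKeySketch(string name, string tags)
    {
        Name = name;
        Tags = tags;

        // Hash once at construction. This is safe only because Name and Tags are read-only.
        unchecked
        {
            int hash = 17;
            hash = (hash * 31) + (name == null ? 0 : name.GetHashCode());
            hash = (hash * 31) + (tags == null ? 0 : tags.GetHashCode());
            _hashCode = hash;
        }
    }

    public string Name { get; }

    public string Tags { get; }

    // Returns the cached value, so repeated dictionary lookups do not rehash the strings.
    public override int GetHashCode() => _hashCode;

    public bool Equals(MetricKeySketch other) =>
        !ReferenceEquals(other, null) && Name == other.Name && Tags == other.Tags;

    public override bool Equals(object obj) => Equals(obj as MetricKeySketch);
}
```

If `GetHashCode` really is called only once per object, as stated above, the cache saves nothing and just costs four bytes per key, so measuring with the profiler first seems like the right call.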

public SetAggregator(MetricAggregatorParameters parameters, Telemetry optionalTelemetry)
{
    _aggregator = new AggregatorFlusher<StatsMetricSet>(parameters, MetricType.Set);
    _pool = new Pool<StatsMetricSet>(pool => new StatsMetricSet(pool), 2 * parameters.MaxUniqueStatsBeforeFlush);
Member

Why pre-allocate sets and not the other metric types (this is a rarer metric when compared to counters and gauges)? That said, it's a good performance decision, but I'd like to understand a bit more.

Contributor Author

CountAggregator does not allocate memory as it sums integers.
SetAggregator allocates a HashSet<string> per metric, so I tried to avoid those memory allocations. As I already have a Pool, it does not introduce too much complexity. What do you think?
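
To make the trade-off concrete, here is a minimal sketch of pooling per-metric sets. It is not the `Pool<T>` used in this PR (whose API is only partly visible in the snippet above): `SimplePool<T>`, `Rent`, and `Return` are hypothetical names, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool, for illustration only. Not thread-safe.
public sealed class SimplePool<T>
    where T : class
{
    private readonly Stack<T> _items = new Stack<T>();
    private readonly Func<T> _factory;
    private readonly int _maxRetained;

    public SimplePool(Func<T> factory, int maxRetained)
    {
        _factory = factory;
        _maxRetained = maxRetained;
    }

    // Reuse a retained instance when one is available, otherwise allocate a new one.
    public T Rent() => _items.Count > 0 ? _items.Pop() : _factory();

    // Keep the instance for reuse unless the pool is already full.
    public void Return(T item)
    {
        if (_items.Count < _maxRetained)
        {
            _items.Push(item);
        }
    }
}
```

A caller would rent a `HashSet<string>` when a new set metric key appears, then call `Clear()` on it and return it after the flush, so steady-state traffic allocates no new sets.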

Contributor Author

ogaca-dd commented Oct 6, 2020

Looks great. I have a question, though: I believe this approach requires incoming metrics in order to flush (i.e., we evaluate whether to flush when we aggregate, by checking the elapsed time). If the inflow of metrics stops, we would not flush what had been aggregated up to that point. I believe both the Go and Java clients have a timer task that flushes after a certain period of time. Otherwise, just some minor questions. Nice! 👌

AggregatorFlusher<T>.TryFlush is called by StatsRouter.OnIdle, which is called by StatsBufferize.OnIdle. AsynchronousWorker.Dequeue calls StatsBufferize.OnIdle when there is no metric to handle, so the aggregated metrics should be flushed at regular intervals.
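
The interaction described here can be sketched as below. This is a simplified stand-in rather than the PR's `AggregatorFlusher<T>`: `FlusherSketch`, the injected clock, and `FlushCount` are hypothetical, and only the three flush conditions mirror the snippet reviewed earlier in this conversation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the flusher; the clock is injected so the timing is testable.
public sealed class FlusherSketch
{
    private readonly Dictionary<string, double> _values = new Dictionary<string, double>();
    private readonly Func<long> _nowMilliseconds;
    private readonly long _flushIntervalMilliseconds;
    private readonly int _maxUniqueStatsBeforeFlush;
    private long _lastFlushMilliseconds;

    public FlusherSketch(Func<long> nowMilliseconds, long flushIntervalMilliseconds, int maxUniqueStatsBeforeFlush)
    {
        _nowMilliseconds = nowMilliseconds;
        _flushIntervalMilliseconds = flushIntervalMilliseconds;
        _maxUniqueStatsBeforeFlush = maxUniqueStatsBeforeFlush;
        _lastFlushMilliseconds = nowMilliseconds();
    }

    // Number of non-empty flushes so far; a real client would serialize and send instead.
    public int FlushCount { get; private set; }

    // Aggregate a count, then check whether a flush is due.
    public void Add(string key, double value)
    {
        _values.TryGetValue(key, out var current);
        _values[key] = current + value;
        TryFlush(force: false);
    }

    // Called by the worker when its queue is empty, so a quiet period still flushes.
    public void OnIdle() => TryFlush(force: false);

    public bool TryFlush(bool force)
    {
        long now = _nowMilliseconds();
        bool due = force
            || now - _lastFlushMilliseconds > _flushIntervalMilliseconds
            || _values.Count >= _maxUniqueStatsBeforeFlush;
        if (!due)
        {
            return false;
        }

        if (_values.Count > 0)
        {
            FlushCount++;
        }

        _values.Clear();
        _lastFlushMilliseconds = now;
        return true;
    }
}
```

The point of the idle hook is that the elapsed-time check runs even when no metric arrives, which is the case raised in the quoted question.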

Base automatically changed from olivierg/prepare-client-side-aggregation to master October 8, 2020 16:24
@ogaca-dd ogaca-dd merged commit acf5f4e into master Oct 15, 2020