Rate Limiting sampler for .NET #1996

Merged: 11 commits, Aug 16, 2024

samsp-msft
Contributor

Port of https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/jaeger-remote-sampler/src/main/java/io/opentelemetry/sdk/extension/trace/jaeger/sampler/RateLimitingSampler.java to .NET

The rate limiting sampler caps the number of sampled traces at the specified
rate per second. It is typically wrapped in a ParentBasedSampler so that the
rate limit applies only to root spans. If a request arrives without a sampling
decision, the rate limiting sampler makes one based on the rate limit. The
decision is stored in the activity context, via the Recorded property, and is
propagated on calls made to other services with HttpClient, so the same
decision is used for the entire trace.

Example of RateLimitingSampler usage:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing.AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            // Add the rate limiting sampler with a limit of 3 traces per second
            .SetSampler(new ParentBasedSampler(new RateLimitingSampler(3)));
    });

Changes

Please provide a brief description of the changes here.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

@samsp-msft samsp-msft requested a review from a team August 6, 2024 16:39
@github-actions github-actions bot added the comp:extensions Things related to OpenTelemetry.Extensions label Aug 6, 2024
@samsp-msft
Contributor Author

redo of #1967

codecov bot commented Aug 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.57%. Comparing base (71655ce) to head (fccfe25).
Report is 386 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1996       +/-   ##
===========================================
+ Coverage   73.91%   88.57%   +14.65%     
===========================================
  Files         267       10      -257     
  Lines        9615      175     -9440     
===========================================
- Hits         7107      155     -6952     
+ Misses       2508       20     -2488     
Flag Coverage Δ
unittests-Extensions 88.57% <100.00%> (?)

Files Coverage Δ
...c/OpenTelemetry.Extensions/Internal/RateLimiter.cs 100.00% <100.00%> (ø)
...nTelemetry.Extensions/Trace/RateLimitingSampler.cs 100.00% <100.00%> (ø)

... and 268 files with indirect coverage changes

/// </returns>
public override SamplingResult ShouldSample(in SamplingParameters samplingParameters)
{
    return this.rateLimiter.TrySpend(1.0) ? this.onSamplingResult : this.offSamplingResult;
Member

Not sure I follow the algorithm.. Why pass 1.0 always here?

Contributor Author

The underlying algorithm in the RateLimiter takes an item cost, the relative cost of each item. We treat all traces as having the same cost, hence 1.0 (the rate limiter takes doubles).

The algorithm essentially converts the cost into ticks (cost/max_samples_sec) and then checks whether there is enough balance available to spend. If so, the cost is deducted and the decision is to keep that trace; if not, the trace is rejected and the balance continues to grow.
The max_balance accounts for bursty traffic: if you have 3s of no traffic, the balance builds up, but it is capped at the allowance for 1s worth of items.
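
To make the description above concrete, here is a minimal sketch of a credit-based limiter. It is not the RateLimiter.cs from this PR: the class name `SketchRateLimiter` and the injectable clock are assumptions made for illustration, and it keeps the balance in credits (doubles) under a lock instead of converting costs to ticks.

```csharp
using System;

// Illustrative only; NOT the RateLimiter.cs added by this PR.
public sealed class SketchRateLimiter
{
    private readonly double creditsPerSecond;
    private readonly double maxBalance;
    private readonly Func<double> nowSeconds; // injectable clock (assumption, eases testing)
    private readonly object sync = new object();
    private double balance;
    private double lastUpdate;

    public SketchRateLimiter(double creditsPerSecond, double maxBalance, Func<double> nowSeconds)
    {
        this.creditsPerSecond = creditsPerSecond;
        this.maxBalance = maxBalance;
        this.nowSeconds = nowSeconds;
        this.balance = maxBalance; // start with a full balance
        this.lastUpdate = nowSeconds();
    }

    // Returns true and deducts itemCost if enough balance is available.
    public bool TrySpend(double itemCost)
    {
        lock (this.sync)
        {
            double now = this.nowSeconds();
            double elapsed = now - this.lastUpdate;
            this.lastUpdate = now;

            // Credits accrue with elapsed time but never exceed maxBalance,
            // so a quiet period cannot fund an unbounded burst.
            this.balance = Math.Min(this.maxBalance, this.balance + (elapsed * this.creditsPerSecond));

            if (this.balance < itemCost)
            {
                return false; // rejected; the balance keeps growing over time
            }

            this.balance -= itemCost;
            return true;
        }
    }
}
```

With `creditsPerSecond` and `maxBalance` both set to 3, three consecutive `TrySpend(1.0)` calls succeed and a fourth fails until time passes. After 3 s of silence the balance is capped at 3 rather than 9, so again only three calls succeed.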

/// <param name="maxTracesPerSecond">The maximum number of traces that will be emitted each second.</param>
public RateLimitingSampler(int maxTracesPerSecond)
{
    double maxBalance = maxTracesPerSecond < 1.0 ? 1.0 : maxTracesPerSecond;
Member

maxTracesPerSecond < 1.0 -- when would this happen ever?

Member

As the input arg is an int, it's possible someone could pass in a value < 1. It's not gonna work though, so defaulting to 1 makes sense here.

In this scenario, I think we should log a warning to indicate that the passed-in value has been overridden.
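
One way the suggested warning could look is sketched below. The helper name `ResolveMaxBalance` and the `Action<string>` callback are hypothetical stand-ins for whatever logging mechanism the project uses; nothing here is taken from the merged code.

```csharp
using System;

// Hypothetical helper; the name and the warning callback are assumptions.
public static class SamplerOptionsSketch
{
    public static double ResolveMaxBalance(int maxTracesPerSecond, Action<string> warn)
    {
        if (maxTracesPerSecond < 1)
        {
            // Tell the caller that the configured value was not honored.
            warn($"maxTracesPerSecond was {maxTracesPerSecond}; using 1 instead.");
            return 1.0;
        }

        return maxTracesPerSecond;
    }
}
```

Passing the callback in keeps the sketch independent of any particular logging API; a call such as `SamplerOptionsSketch.ResolveMaxBalance(0, Console.Error.WriteLine)` would print the warning and return 1.0.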

Member

@MikeGoldsmith MikeGoldsmith left a comment

I can see this is a direct copy of what's in Java and the rudimentary tests show it to work (on windows), but I don't understand the logic to know how this is meant to work.

Also, we'll need to figure out why the tests are failing on ubuntu.

first N will always be sampled in. Accounting for that makes the
results exactly match what was expected, but keeping a small fudge
factor just in case, because tests that rely on timing will always
fail at the most inopportune moments.
@github-actions github-actions bot requested a review from MikeGoldsmith August 7, 2024 21:03
@samsp-msft
Contributor Author

> I can see this is a direct copy of what's in Java and the rudimentary tests show it to work (on windows), but I don't understand the logic to know how this is meant to work.
>
> Also, we'll need to figure out why the tests are failing on ubuntu.

I did some more experimentation, including running in WSL (Ubuntu). I had not accounted for the initial balance in the test's math: the first N requests are always sampled in as they chew through that balance. With this accounted for, the expected and actual values match exactly in my own experiments, but I want to keep a small fudge factor, as tests that rely on timing don't always work out the way you expect them to.
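
The arithmetic behind that correction can be sketched as follows. This is not the test code from the PR: the names `ExpectedSampled` and `WithinTolerance`, and the tolerance value, are hypothetical. It assumes `maxTracesPerSecond` is at least 1, the limiter starts with a full balance, and requests arrive faster than the limit for the whole duration.

```csharp
using System;

// Hypothetical helpers illustrating the expected-count math; not from the PR.
public static class SamplingTestMath
{
    // Expected sampled count when demand continuously exceeds the limit.
    public static int ExpectedSampled(int maxTracesPerSecond, double durationSeconds)
    {
        int initialBalance = maxTracesPerSecond; // the first N requests are always sampled in
        int accrued = (int)Math.Floor(maxTracesPerSecond * durationSeconds);
        return initialBalance + accrued;
    }

    // The "fudge factor": accept results within a relative tolerance of the expectation.
    public static bool WithinTolerance(int actual, int expected, double tolerance)
    {
        return Math.Abs(actual - expected) <= expected * tolerance;
    }
}
```

For a limit of 3 traces per second over 2 seconds, this gives 3 + 6 = 9 sampled traces, where a calculation that ignores the initial balance would expect only 6.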

@samsp-msft
Contributor Author

@cijothomas the ubuntu run should pass tests now.

Member

@MikeGoldsmith MikeGoldsmith left a comment

Looks good to me. Thanks for fixing linux tests 👍🏻

@samsp-msft
Contributor Author

@cijothomas - poke 😸

@github-actions github-actions bot requested a review from MikeGoldsmith August 14, 2024 17:42
Member

@vishweshbankwar vishweshbankwar left a comment

LGTM - left some non-blocking comments.

@vishweshbankwar vishweshbankwar merged commit 7317b2a into open-telemetry:main Aug 16, 2024
61 checks passed
ezhang6811 pushed a commit to ezhang6811/opentelemetry-dotnet-contrib that referenced this pull request Aug 21, 2024