Skip to content
This repository has been archived by the owner on Aug 30, 2019. It is now read-only.

[dual sampling] using both score and priority sampler for priority 0 traces #345

Merged
merged 6 commits into from
Jan 4, 2018

Conversation

ufoot
Copy link

@ufoot ufoot commented Dec 27, 2017

In the case of priority 0 traces, send traces to both samplers. This gives
a second chance to priority 0 traces to be picked up by the score sampler.
Also adds stats for -1 sampling priority traces.

…traces

In the case of priority 0 traces, send traces to both samplers. This gives
a second chance to priority 0 traces to be picked up by the score sampler.
Also adds stats for -1 sampling priority traces.
@ufoot ufoot force-pushed the christian/dualsampling branch from 26bb8b1 to c07dd27 Compare December 27, 2017 10:45
@ufoot ufoot force-pushed the christian/dualsampling branch from 4f87c0f to 48a5831 Compare December 27, 2017 13:13
@ufoot ufoot changed the title [dual sampling] (WIP) using both score and priority sampler for priority 0 traces [dual sampling] using both score and priority sampler for priority 0 traces Dec 27, 2017
@@ -324,8 +326,11 @@ APM_CONF:
if v, e := conf.GetFloat("trace.sampler", "max_traces_per_second"); e == nil {
c.MaxTPS = v
}
if v := strings.ToLower(conf.GetDefault("trace.sampler", "priority_sampling", "")); v == "yes" || v == "true" {
c.PrioritySampling = true
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably never worked, the default was true, and we were overriding it to true...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless after all ;)

@ufoot ufoot requested review from talwai and AlexJF December 27, 2017 14:10
agent/agent.go Outdated
} else {
samplers = []*Sampler{a.ScoreEngine}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following looks cleaner IMO (everything related to priority grouped together and ok -> hasPriority):

samplers := make([]*Sampler, 0, 2)
priority, hasPriority := root.Metrics[samplingPriorityKey]
if hasPriority && a.PriorityEngine != nil {
  samplers = append(samplers, a.PriorityEngine)

  if a.conf.ScorePriority0Traces && priority == 0 {
    samplers = append(samplers, a.ScoreEngine)
  }
} else {
  samplers = append(samplers, a.ScoreEngine)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that to be a 100% match, the a.PriorityEngine should go to a separate if

samplers := make([]*Sampler, 0, 2)
priority, hasPriority := root.Metrics[samplingPriorityKey]
if hasPriority {
  if a.PriorityEngine != nil {
    samplers = append(samplers, a.PriorityEngine)
  }

  if a.conf.ScorePriority0Traces && priority == 0 {
    samplers = append(samplers, a.ScoreEngine)
  }
} else {
  samplers = append(samplers, a.ScoreEngine)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm actually none of them are equivalent to yours now that I look closer :/ Maybe there's no way around it :P

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also spent some time on it initially, the "problem" is that in one case we do an if ... if ... and in the other one an if ... else .... Thanks for the thorough review.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested in the config.go comment, let's just have both mechanisms always on, it should greatly simplify the code here.

@@ -324,8 +326,11 @@ APM_CONF:
if v, e := conf.GetFloat("trace.sampler", "max_traces_per_second"); e == nil {
c.MaxTPS = v
}
if v := strings.ToLower(conf.GetDefault("trace.sampler", "priority_sampling", "")); v == "yes" || v == "true" {
c.PrioritySampling = true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

agent/agent.go Outdated
if a.conf.ScorePriority0Traces {
// Send traces to possibly several score engines.
if ok && a.PriorityEngine != nil {
// If Priority is defined, send to priority sampling, regardless of priority value.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the count of "priority < 0" traces affect the applied sample rate in the client? Some apps will routinely generate traces which the user will want to ignore via the priority flag e.g. healthchecks, debug endpoints, whatever. Should the frequency of such traces still influence the sample rate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Indeed, since the client lib can only influence the 0/-1 choice, it totally makes sense it takes decisions only on those ones. But then, I'm thinking we could also exclude the +2 traces. The problem is quite symmetrical here. In any case, I think this is really sampling logic, so should probably go to sampler package, easier to unit-test and bullet-proof rather than in this "high-level" code.

Also, I'm worried about how we can explain exactly what that -1 does. Because, it can sound like it's "exclude this from tracing". When, it's really not. The stats are still computed for instance. That's the whole point of this -1 thing I think, make sure there's no trace sampled, but still trace it. I'm saying this because we had similar issues whit the "disabled tracer" mode, which did not really disabled the tracer, but only the sending of traces to the agent.

I'm going to exclude -1 and +2 flagged traces from the count anyway.

sampled := root.Metrics[samplingPriorityKey] > 0
sampled := samplingPriority > 0

if samplingPriority < 0 || samplingPriority > 1 {
Copy link
Author

@ufoot ufoot Dec 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talwai this is where I'd exclude the traces which should have no impact on sampling ratio. I had a second thought about it and really do think the priority 2 traces should be excluded too. Not doing so would a) break the ratio for other traces (keeping less of them, when this is really about "decide to have some more") and b) give a false illusion that having 1 and 2 in the same bag does provide some fail-safe rate limiting. It does not. On a service sending 100 traces per second, with a max TPS of 10, the default target ratio would be 0.1. If we select 50% of them with priority set to 2, then we have 50 traces per sec no matter what max TPS is set to. So if we actually include priority 2 traces in the count, it will result in having only the one we selected for that service, but also possibly impact other services, and in any case, there's no capping to 10 TPS. The big picture is: priority 2 traces are something on top of the regular sampling performed by the library, it's not included in it.

config/agent.go Outdated
PreSampleRate: 1.0,
MaxTPS: 10,
PrioritySampling: true,
ScorePriority0Traces: true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the goal is to simplify and have both samplers, I'd suggest to just drop both PrioritySampling and ScorePriority0Traces while always having both logics working. I see no case where these configuration should be used nor exposed to the user.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

priorityPtr := &ts.TracesPriorityNone
if a.PriorityEngine != nil {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check for it now, it's always defined since we removed the PrioritySampling option

Copy link

@LotharSee LotharSee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, the removal of the extra options is great and makes the code much simpler.

Combined with the clear explanation of each priority values, it was easy to review. Let's be sure now to update https://github.com/DataDog/datadog-trace-agent/wiki/Distributed-Priority-Sampling

Looking good to me.

@ufoot ufoot added this to the 5.20.1 milestone Jan 4, 2018
@ufoot ufoot merged commit 1c07282 into master Jan 4, 2018
@dtilghman dtilghman deleted the christian/dualsampling branch January 5, 2018 05:17
@palazzem palazzem modified the milestone: 5.21.1 Jan 27, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants