
Sampling unseen / rarely-seen traces based on message digests and HyperLogLog #20268

Closed
jiekun opened this issue Mar 22, 2023 · 6 comments
Labels
enhancement New feature or request processor/tailsampling Tail sampling processor

Comments

@jiekun
Member

jiekun commented Mar 22, 2023

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

There are a lot of policies in the tail-based sampling processor.

policies list
  • always_sample: Sample all traces
  • latency: Sample based on the duration of the trace. The duration is determined by looking at the earliest start time and latest end time, without taking into consideration what happened in between.
  • numeric_attribute: Sample based on number attributes (resource and record)
  • probabilistic: Sample a percentage of traces. Read a comparison with the Probabilistic Sampling Processor.
  • status_code: Sample based upon the status code (OK, ERROR or UNSET)
  • string_attribute: Sample based on string attributes (resource and record) value matches, both exact and regex value matches are supported
  • trace_state: Sample based on TraceState value matches
  • rate_limiting: Sample based on rate
  • span_count: Sample based on the minimum and/or maximum number of spans, inclusive. If the sum of all spans in the trace is outside the range threshold, the trace will not be sampled.
  • boolean_attribute: Sample based on boolean attribute (resource and record).
  • and: Sample based on multiple policies, creates an AND policy
  • composite: Sample based on a combination of above samplers, with ordering and rate allocation per sampler. Rate allocation allocates certain percentages of spans per policy order. For example if we have set max_total_spans_per_second as 100 then we can set rate_allocation as follows
    • test-composite-policy-1 = 50 % of max_total_spans_per_second = 50 spans_per_second
    • test-composite-policy-2 = 25 % of max_total_spans_per_second = 25 spans_per_second
    • To ensure remaining capacity is filled use always_sample as one of the policies
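
For concreteness, a composite policy with that rate allocation could be configured along these lines (the structure follows the tail sampling processor's README; the policy names, attribute keys, and service names here are illustrative, not a recommendation):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: composite-policy-1
        type: composite
        composite:
          max_total_spans_per_second: 100
          policy_order: [test-composite-policy-1, test-composite-policy-2, always-on]
          composite_sub_policy:
            - name: test-composite-policy-1
              type: string_attribute
              string_attribute: {key: service.name, values: [service-a]}
            - name: test-composite-policy-2
              type: string_attribute
              string_attribute: {key: service.name, values: [service-b]}
            # always_sample fills the remaining capacity
            - name: always-on
              type: always_sample
          rate_allocation:
            - policy: test-composite-policy-1
              percent: 50
            - policy: test-composite-policy-2
              percent: 25
```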

I am looking for a policy that analyzes whether a request path (I'd actually rather call it a trace path) has been seen before, and samples the trace if it is unseen or has appeared only a few times.

I should clarify that I am new to the community and may not fully understand the available policies, so maybe this is already implemented and I missed it. I would appreciate it if you could point that out; in that case you won't need to read the idea below.

Describe the solution you'd like

My thoughts can be split into three parts:

  1. How do we define a unique trace / trace path?
  2. How could we know if a trace path has been seen before?
  3. How could we calculate the appearance count of a trace path?

How to define a unique trace path?

Since every span has its name (a string), we could join the names with "-". So for the following trace composed of 4 spans:

  • span1: Receive Request /api/v1/hello
  • span2: Send RPC
  • span3: Receive RPC
  • span4: DB Query

we could describe it as: Receive Request /api/v1/hello-Send RPC-Receive RPC-DB Query

However, for applications in a real production environment, traces can be complicated and contain many more spans. It is inefficient to describe them with plaintext, and besides, we don't need to know the actual path. So we could use a message-digest algorithm to make the identifier shorter.

Take the trace with 4 spans above as example:

  1. For span1, we calculate the MD5 value of its span name: MD5('Receive Request /api/v1/hello') = 348a16bc22982ee37bb95e291cd13e7b.
  2. For span2, we take the MD5 value of the previous step and append span2's name: MD5(previous + 'Send RPC') = MD5('348a16bc22982ee37bb95e291cd13e7bSend RPC') = 00e667b5d1954a2fda369de1247c96dc.

Keep folding in each new span name, and we get the final result: 279e7032fd9797d96572b989b0f2ed31.
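
A minimal sketch of this chained digest in Go (assuming, as in the example above, that each step hex-encodes the previous digest before appending the next span name):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// tracePathDigest chains a digest over span names in order:
// digest_n = MD5(hex(digest_{n-1}) + name_n), with digest_0 = "".
func tracePathDigest(spanNames []string) string {
	prev := ""
	for _, name := range spanNames {
		sum := md5.Sum([]byte(prev + name))
		prev = hex.EncodeToString(sum[:])
	}
	return prev
}

func main() {
	spans := []string{
		"Receive Request /api/v1/hello",
		"Send RPC",
		"Receive RPC",
		"DB Query",
	}
	// Prints a stable 32-character hex identifier for this trace path.
	fmt.Println(tracePathDigest(spans))
}
```

The same span names in a different order yield a different digest, which is what lets the identifier distinguish trace paths rather than just span sets.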

How could we know if a trace path is seen before?

We could simply test each trace-path identifier against an in-memory set, and add it to the set after making the sampling decision.

You may worry that this takes a lot of memory. There is another data structure, the Bloom filter, which always answers correctly for an item that was added, but may return a false positive for an item that was never added. Crucially, a negative answer guarantees the trace path is unseen.
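
To make the trade-off concrete, here is a minimal Bloom filter sketch in Go; the bit-array size, hash count, and double-hashing scheme are illustrative, not a proposal for the processor's actual parameters:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter: k derived hash positions over m bits.
// It has no false negatives; false positives are possible.
type bloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of hash positions per item
}

func newBloom(m, k uint64) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// indexes derives k bit positions via double hashing: h1 + i*h2 (mod m).
func (b *bloom) indexes(s string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	h1 := h.Sum64()
	h.Write([]byte("salt")) // derive a second hash from the same state
	h2 := h.Sum64() | 1     // odd step so it cycles a power-of-two table
	idx := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		idx[i] = (h1 + i*h2) % b.m
	}
	return idx
}

func (b *bloom) Add(s string) {
	for _, i := range b.indexes(s) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// MaybeContains returns false only if s was definitely never added.
func (b *bloom) MaybeContains(s string) bool {
	for _, i := range b.indexes(s) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1<<16, 4)
	f.Add("279e7032fd9797d96572b989b0f2ed31")
	fmt.Println(f.MaybeContains("279e7032fd9797d96572b989b0f2ed31")) // true: no false negatives
	fmt.Println(f.MaybeContains("some-other-path-digest"))           // very likely false
}
```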

How could we calculate the appearance count of a trace path?

The Bloom filter helps identify unseen trace paths, but it's not enough: we also want to keep sampling rarely-seen paths, which requires a counter for each item. So a map[string]int{} is still needed.

What if we don't need a 100% accurate counter? We could use a probabilistic data structure that returns an approximate count for each unique item (trace path). Note that HyperLogLog estimates the number of distinct items rather than per-item counts; a frequency sketch such as Count-Min is the closer fit for this part.
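
A per-item frequency sketch such as Count-Min (which the follow-up design mentioned at the end of this thread settles on) fits this role; here is a minimal sketch in Go, with illustrative width and depth:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// countMin is a minimal Count-Min Sketch: d rows of w counters.
// Estimate never under-counts; it may over-count due to hash collisions.
type countMin struct {
	rows [][]uint32
	w    uint64
}

func newCountMin(w, d int) *countMin {
	rows := make([][]uint32, d)
	for i := range rows {
		rows[i] = make([]uint32, w)
	}
	return &countMin{rows: rows, w: uint64(w)}
}

// hash salts the key with the row index to simulate independent hashes.
func (c *countMin) hash(s string, row int) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s", row, s)
	return h.Sum64() % c.w
}

func (c *countMin) Add(s string) {
	for i := range c.rows {
		c.rows[i][c.hash(s, i)]++
	}
}

// Estimate returns the minimum counter across rows for s.
func (c *countMin) Estimate(s string) uint32 {
	min := ^uint32(0)
	for i := range c.rows {
		if v := c.rows[i][c.hash(s, i)]; v < min {
			min = v
		}
	}
	return min
}

func main() {
	cm := newCountMin(1<<12, 4)
	for i := 0; i < 3; i++ {
		cm.Add("279e7032fd9797d96572b989b0f2ed31")
	}
	fmt.Println(cm.Estimate("279e7032fd9797d96572b989b0f2ed31")) // 3
}
```

Memory here is fixed (w × d counters) regardless of how many distinct trace paths appear, which is the property the map[string]int{} lacks.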

Describe alternatives you've considered

No response

Additional context

Again I may not fully understand all available policies. If it's already implemented feel free to let me know. Many thanks in advance.

And if it's not available yet, I would like to hear more discussion to see if it's useful in production, and append/fix more detail of the simple idea.

@jiekun jiekun added enhancement New feature or request needs triage New item requiring triage labels Mar 22, 2023
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Mar 22, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme atoulme removed the needs triage New item requiring triage label Mar 23, 2023
@jpkrohling
Member

Something very similar was proposed before. I think it was even discussed during the SIG Collector meeting. Are you able to find that so that we can continue from what we picked up?

@jiekun
Member Author

jiekun commented Mar 28, 2023

Something very similar was proposed before. I think it was even discussed during the SIG Collector meeting. Are you able to find that so that we can continue from what we picked up?

@jpkrohling sure let me check the previous issue first. Sorry for posting duplicate things.

@jiekun
Member Author

jiekun commented Mar 28, 2023

@jpkrohling Had a quick glance over the collector meeting notes but still not sure about which proposal we are talking about. Would you mind mentioning some keywords or links? I'd like to follow it up and see what I can do to optimize the policy.

@jpkrohling
Member

I think this here would touch the same realm: #17874

@jiekun
Member Author

jiekun commented May 5, 2023

I am going to close this one and open another issue describing my new design based on Count-Min Sketch, with some examples.

@jiekun jiekun closed this as completed May 5, 2023

3 participants