
Sampling unseen / rarely-seen traces based on message digests and HyperLogLog #20268

Closed
jiekun opened this issue Mar 22, 2023 · 6 comments
Labels
enhancement New feature or request processor/tailsampling Tail sampling processor

Comments

@jiekun
Member

jiekun commented Mar 22, 2023

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

There are a lot of policies in the tail-based sampling processor.

policies list
  • always_sample: Sample all traces
  • latency: Sample based on the duration of the trace. The duration is determined by looking at the earliest start time and latest end time, without taking into consideration what happened in between.
  • numeric_attribute: Sample based on number attributes (resource and record)
  • probabilistic: Sample a percentage of traces. Read a comparison with the Probabilistic Sampling Processor.
  • status_code: Sample based upon the status code (OK, ERROR or UNSET)
  • string_attribute: Sample based on string attributes (resource and record) value matches, both exact and regex value matches are supported
  • trace_state: Sample based on TraceState value matches
  • rate_limiting: Sample based on rate
  • span_count: Sample based on the minimum and/or maximum number of spans, inclusive. If the sum of all spans in the trace is outside the range threshold, the trace will not be sampled.
  • boolean_attribute: Sample based on boolean attribute (resource and record).
  • and: Sample based on multiple policies, creates an AND policy
  • composite: Sample based on a combination of above samplers, with ordering and rate allocation per sampler. Rate allocation allocates certain percentages of spans per policy order. For example if we have set max_total_spans_per_second as 100 then we can set rate_allocation as follows
    • test-composite-policy-1 = 50 % of max_total_spans_per_second = 50 spans_per_second
    • test-composite-policy-2 = 25 % of max_total_spans_per_second = 25 spans_per_second
    • To ensure remaining capacity is filled use always_sample as one of the policies
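
For concreteness, a composite policy with that rate allocation could be configured along these lines (the structure follows the tail sampling processor's README; the policy names, attribute keys, and service names here are illustrative, not a recommendation):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: composite-policy-1
        type: composite
        composite:
          max_total_spans_per_second: 100
          policy_order: [test-composite-policy-1, test-composite-policy-2, always-on]
          composite_sub_policy:
            - name: test-composite-policy-1
              type: string_attribute
              string_attribute: {key: service.name, values: [service-a]}
            - name: test-composite-policy-2
              type: string_attribute
              string_attribute: {key: service.name, values: [service-b]}
            # always_sample fills the remaining capacity
            - name: always-on
              type: always_sample
          rate_allocation:
            - policy: test-composite-policy-1
              percent: 50
            - policy: test-composite-policy-2
              percent: 25
```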

I am looking for a policy that analyzes whether a request path (I'd actually rather call it a trace path) has been seen before, and samples the trace if it is unseen or has appeared only a few times.

I should clarify that I am new to the community and may not fully understand the available policies, so maybe this is already implemented and I missed it. I would appreciate it if you could point that out; in that case you won't need to read the idea below.

Describe the solution you'd like

My thoughts can be split into three parts:

  1. How do we define a unique trace / trace path?
  2. How could we know if a trace path has been seen before?
  3. How could we calculate the appearance count of a trace path?

How to define a unique trace path?

Since every span has its name (a string), we could join the names with "-". So for the following trace composed of 4 spans:

  • span1: Receive Request /api/v1/hello
  • span2: Send RPC
  • span3: Receive RPC
  • span4: DB Query

we could describe it as: Receive Request /api/v1/hello-Send RPC-Receive RPC-DB Query

However, for applications in a real production environment, traces can be complicated and contain many more spans. It is inefficient to describe them with plaintext, and besides, we don't need to know the actual path. So we could use a message-digest algorithm to make the identifier shorter.

Take the trace with 4 spans above as example:

  1. For span1, we calculate the MD5 value of its span name: MD5('Receive Request /api/v1/hello') = 348a16bc22982ee37bb95e291cd13e7b.
  2. For span2, we take the MD5 value of the previous step and append span2's name: MD5(previous + 'Send RPC') = MD5('348a16bc22982ee37bb95e291cd13e7bSend RPC') = 00e667b5d1954a2fda369de1247c96dc.

Keep folding in each new span name, and we get the final result: 279e7032fd9797d96572b989b0f2ed31.
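
A minimal sketch of this chained digest in Go (assuming, as in the example above, that each step hex-encodes the previous digest before appending the next span name):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// tracePathDigest chains a digest over span names in order:
// digest_n = MD5(hex(digest_{n-1}) + name_n), with digest_0 = "".
func tracePathDigest(spanNames []string) string {
	prev := ""
	for _, name := range spanNames {
		sum := md5.Sum([]byte(prev + name))
		prev = hex.EncodeToString(sum[:])
	}
	return prev
}

func main() {
	spans := []string{
		"Receive Request /api/v1/hello",
		"Send RPC",
		"Receive RPC",
		"DB Query",
	}
	// Prints a stable 32-character hex identifier for this trace path.
	fmt.Println(tracePathDigest(spans))
}
```

The same span names in a different order yield a different digest, which is what lets the identifier distinguish trace paths rather than just span sets.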

How could we know if a trace path is seen before?

We could simply test each trace-path identifier against an in-memory set, and add it to the set after making the sampling decision.

You may worry that this takes a lot of memory. There is another data structure, the Bloom filter, which always answers correctly for an item that was added, but may return a false positive for an item that was never added. Crucially, a negative answer guarantees the trace path is unseen.
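
To make the trade-off concrete, here is a minimal Bloom filter sketch in Go; the bit-array size, hash count, and double-hashing scheme are illustrative, not a proposal for the processor's actual parameters:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter: k derived hash positions over m bits.
// It has no false negatives; false positives are possible.
type bloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of hash positions per item
}

func newBloom(m, k uint64) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// indexes derives k bit positions via double hashing: h1 + i*h2 (mod m).
func (b *bloom) indexes(s string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	h1 := h.Sum64()
	h.Write([]byte("salt")) // derive a second hash from the same state
	h2 := h.Sum64() | 1     // odd step so it cycles a power-of-two table
	idx := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		idx[i] = (h1 + i*h2) % b.m
	}
	return idx
}

func (b *bloom) Add(s string) {
	for _, i := range b.indexes(s) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// MaybeContains returns false only if s was definitely never added.
func (b *bloom) MaybeContains(s string) bool {
	for _, i := range b.indexes(s) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1<<16, 4)
	f.Add("279e7032fd9797d96572b989b0f2ed31")
	fmt.Println(f.MaybeContains("279e7032fd9797d96572b989b0f2ed31")) // true: no false negatives
	fmt.Println(f.MaybeContains("some-other-path-digest"))           // very likely false
}
```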

How could we calculate the appearance count of a trace path?

The Bloom filter helps identify unseen trace paths, but it's not enough: we also want to keep sampling rarely-seen paths, which requires a counter for each item. So a map[string]int{} is still needed.

What if we don't need a 100% accurate counter? We could use a probabilistic data structure that returns an approximate count for each unique item (trace path). Note that HyperLogLog estimates the number of distinct items rather than per-item counts; a frequency sketch such as Count-Min is the closer fit for this part.
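
A per-item frequency sketch such as Count-Min (which the follow-up design mentioned at the end of this thread settles on) fits this role; here is a minimal sketch in Go, with illustrative width and depth:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// countMin is a minimal Count-Min Sketch: d rows of w counters.
// Estimate never under-counts; it may over-count due to hash collisions.
type countMin struct {
	rows [][]uint32
	w    uint64
}

func newCountMin(w, d int) *countMin {
	rows := make([][]uint32, d)
	for i := range rows {
		rows[i] = make([]uint32, w)
	}
	return &countMin{rows: rows, w: uint64(w)}
}

// hash salts the key with the row index to simulate independent hashes.
func (c *countMin) hash(s string, row int) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s", row, s)
	return h.Sum64() % c.w
}

func (c *countMin) Add(s string) {
	for i := range c.rows {
		c.rows[i][c.hash(s, i)]++
	}
}

// Estimate returns the minimum counter across rows for s.
func (c *countMin) Estimate(s string) uint32 {
	min := ^uint32(0)
	for i := range c.rows {
		if v := c.rows[i][c.hash(s, i)]; v < min {
			min = v
		}
	}
	return min
}

func main() {
	cm := newCountMin(1<<12, 4)
	for i := 0; i < 3; i++ {
		cm.Add("279e7032fd9797d96572b989b0f2ed31")
	}
	fmt.Println(cm.Estimate("279e7032fd9797d96572b989b0f2ed31")) // 3
}
```

Memory here is fixed (w × d counters) regardless of how many distinct trace paths appear, which is the property the map[string]int{} lacks.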

Describe alternatives you've considered

No response

Additional context

Again I may not fully understand all available policies. If it's already implemented feel free to let me know. Many thanks in advance.

And if it's not available yet, I would like to hear more discussion to see if it's useful in production, and append/fix more detail of the simple idea.

@jiekun jiekun added enhancement New feature or request needs triage New item requiring triage labels Mar 22, 2023
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Mar 22, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme atoulme removed the needs triage New item requiring triage label Mar 23, 2023
@jpkrohling
Member

Something very similar was proposed before. I think it was even discussed during the SIG Collector meeting. Are you able to find that so that we can continue from what we picked up?

@jiekun
Member Author

jiekun commented Mar 28, 2023

Something very similar was proposed before. I think it was even discussed during the SIG Collector meeting. Are you able to find that so that we can continue from what we picked up?

@jpkrohling sure let me check the previous issue first. Sorry for posting duplicate things.

@jiekun
Member Author

jiekun commented Mar 28, 2023

@jpkrohling Had a quick glance over the collector meeting notes but still not sure about which proposal we are talking about. Would you mind mentioning some keywords or links? I'd like to follow it up and see what I can do to optimize the policy.

@jpkrohling
Member

I think this here would touch the same realm: #17874

@jiekun
Member Author

jiekun commented May 5, 2023

I am going to close this one and open another issue describing my new design based on Count-Min Sketch, with some examples.

@jiekun jiekun closed this as completed May 5, 2023

3 participants