Sampling unseen / rarely seen traces based on message digest and HyperLogLog #20268
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Something very similar was proposed before. I think it was even discussed during the SIG Collector meeting. Are you able to find that, so that we can continue from where we left off?
@jpkrohling Sure, let me check the previous issue first. Sorry for posting a duplicate.
@jpkrohling I had a quick glance over the Collector meeting notes, but I'm still not sure which proposal we are talking about. Would you mind mentioning some keywords or links? I'd like to follow up and see what I can do to optimize the policy.
I think this one touches the same realm: #17874
I am going to close this one and raise another issue to present my new design based on Count-Min Sketch, with some examples.
Component(s)
processor/tailsampling
Is your feature request related to a problem? Please describe.
There are a lot of policies in the tail-based sampling processor.
policies list
I am looking for a policy that analyzes whether a request path (I actually prefer to call it a trace path) has been seen before, and decides to sample the trace if its path has not been seen or has appeared only a few times.
It's important to clarify that I am new to the community and may not fully understand the available policies. So maybe this is already implemented and I missed it. I would appreciate it very much if you could point it out, in which case you won't need to read the idea I am going to describe.
Describe the solution you'd like
My thoughts could be split into 3 parts:
How to define a unique trace path?
Since every span has its name (a string), we could use a "-" to join them. So for the following trace composed of 4 spans:
we could describe it as:
Receive Request /api/v1/hello-Send RPC-Receive RPC-DB Query
However, for applications in a real production environment, a trace might be complicated and have many more spans. Describing it in plaintext is very inefficient. Besides, we don't need to know the actual path, only whether two paths are the same. So we could use a message-digest algorithm to make the identifier shorter.
Take the trace with 4 spans above as an example:
MD5('Receive Request /api/v1/hello') = 348a16bc22982ee37bb95e291cd13e7b
MD5(previous + 'Send RPC') = MD5('348a16bc22982ee37bb95e291cd13e7bSend RPC') = 00e667b5d1954a2fda369de1247c96dc
Keep appending each new span name in the same way, and then we get the final result:
279e7032fd9797d96572b989b0f2ed31
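The chained digest described above can be sketched in Go roughly as follows. This is only an illustration of the idea, not an existing policy in the processor: the function name `PathDigest` is hypothetical, and it assumes the span names are already available in a stable order (how to order the spans of a trace is not settled in this proposal). MD5 is used here purely as a compact fingerprint, not for security.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// PathDigest (hypothetical name) folds the span names of a trace into one
// fixed-length identifier: digest = MD5(previousDigestHex + spanName),
// starting from an empty digest, so the first step is simply MD5(firstName).
func PathDigest(spanNames []string) string {
	digest := ""
	for _, name := range spanNames {
		sum := md5.Sum([]byte(digest + name))
		digest = hex.EncodeToString(sum[:])
	}
	return digest
}

func main() {
	path := []string{"Receive Request /api/v1/hello", "Send RPC", "Receive RPC", "DB Query"}
	// The result is always 32 hex characters, no matter how many spans there are.
	fmt.Println(PathDigest(path))
}
```

Because each step hashes the previous digest together with the next span name, the same names in a different order produce a different identifier, and the memory needed per trace path stays constant.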
How could we know if a trace path is seen before?
We could simply test each trace path identifier against an in-memory set, and put it into the set after making the sampling decision.
You may worry that this takes a lot of memory. There is another data structure, the Bloom filter, that uses far less: when it answers "not seen", the answer is always correct, while a "seen" answer may occasionally be a false positive.
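A minimal Bloom filter sketch in Go, to make the "unseen is always correct, seen may be a false positive" behaviour concrete. The type and method names (`Bloom`, `Add`, `MaybeSeen`) and the sizes used in `main` are assumptions made for illustration. A real policy would also need to size the filter for the expected number of trace paths and decide when to reset it.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a minimal Bloom filter over string keys (illustrative only).
type Bloom struct {
	bits []uint64
	m    uint64 // number of bits, must be > 0
	k    uint64 // number of probes per key, must be > 0
}

func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes splits one 64-bit FNV-1a hash into two values for double hashing.
func hashes(key string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	return sum & 0xffffffff, (sum >> 32) | 1 // force the step to be odd so it is never zero
}

// Add marks key as seen by setting k bits.
func (b *Bloom) Add(key string) {
	h1, h2 := hashes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= uint64(1) << (pos % 64)
	}
}

// MaybeSeen returns false only if key was never added.
// A true result may be a false positive caused by other keys.
func (b *Bloom) MaybeSeen(key string) bool {
	h1, h2 := hashes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(uint64(1)<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	seen := NewBloom(1<<16, 4)
	id := "279e7032fd9797d96572b989b0f2ed31"
	fmt.Println(seen.MaybeSeen(id)) // false: the filter is still empty
	seen.Add(id)
	fmt.Println(seen.MaybeSeen(id)) // true
}
```

For sampling, a false positive means an unseen trace path is occasionally treated as already seen and is not sampled for that reason. A previously seen path is never reported as unseen.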
How could we calculate appearance count of a trace path?
The Bloom filter helps identify unseen trace paths, but it's not enough. We also want to sample rarely seen items more often, so we need a counter for each item. So a map[string]int{} is still needed. But what if we don't need a 100% accurate counter and instead use a data structure like HyperLogLog, which returns an approximate result for each unique item (trace path)?
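To illustrate the approximate per-path counter, here is a small Go sketch. Note that it swaps in a Count-Min Sketch rather than HyperLogLog: as far as I understand, HyperLogLog estimates how many distinct items exist, not how often each one appears, while a Count-Min Sketch (the structure mentioned in the comments of this thread) estimates per-item frequencies in fixed memory. All names and sizes below are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMin is a minimal Count-Min Sketch: depth rows of width counters.
type CountMin struct {
	rows  [][]uint32
	width uint64
}

// NewCountMin expects depth >= 1 and width >= 1.
func NewCountMin(depth int, width uint64) *CountMin {
	rows := make([][]uint32, depth)
	for i := range rows {
		rows[i] = make([]uint32, width)
	}
	return &CountMin{rows: rows, width: width}
}

// index hashes key with the row number as a prefix, so each row
// behaves like a different hash function.
func (c *CountMin) index(row int, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(key))
	return h.Sum64() % c.width
}

// Add records one occurrence of key in every row.
func (c *CountMin) Add(key string) {
	for i := range c.rows {
		c.rows[i][c.index(i, key)]++
	}
}

// Estimate returns the smallest counter for key across all rows.
// Collisions can only inflate counters, so this never undercounts.
func (c *CountMin) Estimate(key string) uint32 {
	lowest := c.rows[0][c.index(0, key)]
	for i := 1; i < len(c.rows); i++ {
		if v := c.rows[i][c.index(i, key)]; v < lowest {
			lowest = v
		}
	}
	return lowest
}

func main() {
	counts := NewCountMin(4, 1024)
	id := "279e7032fd9797d96572b989b0f2ed31"
	for i := 0; i < 3; i++ {
		counts.Add(id)
	}
	fmt.Println(counts.Estimate(id)) // 3: no other keys were added, so there are no collisions
}
```

A policy could then sample a trace whenever the estimate for its path is below some threshold. Since the estimate can only be too high, the error direction is to occasionally under-sample a rare path, never to treat a frequent path as rare.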
Describe alternatives you've considered
No response
Additional context
Again, I may not fully understand all the available policies. If this is already implemented, feel free to let me know. Many thanks in advance.
And if it's not available yet, I would like to hear more discussion on whether it would be useful in production, and to refine the details of this simple idea.