query: metric type and scrape interval aware deduplication #5094
Labels
component: query
dont-go-stale
feature request/improvement
Is your proposal related to a problem?
The current deduplication algorithm is not perfect, and there is still room for improvement.
For example, we have issues such as #981, as well as other deduplication issues related to counter metrics.
Also, the initial penalty value defaults to 5000 (https://github.com/thanos-io/thanos/blob/main/pkg/dedup/iter.go#L278). This value is reasonable, but we might be able to do better.
This is just a rough idea:
Prometheus has a targets API and a metric metadata API, which should be enough for us to get the type of each metric and the scrape interval of each scrape job.
The Querier can maintain a cache and query Prometheus periodically to refresh this info. Ideally, the info could also be provided through files (assuming metric types and scrape intervals rarely change, users can supply a list of metrics). The Querier can then perform better deduplication based on each metric's scrape interval and type.
Describe the solution you'd like
(Describe your proposed solution here.)
Describe alternatives you've considered
(Write your answer here.)