[TraceQL Metrics] Add local disk caching for generator completed blocks #3799

Merged
merged 6 commits into from
Jun 21, 2024

mdisibio
Contributor

@mdisibio mdisibio commented Jun 20, 2024

What this PR does:
Generators serve metrics queries against recent data. These queries are frequent and their latency is user-facing. This PR adds a cache of query responses for local completed blocks, which typically cover the last 5-20 minutes.

The generators already have a similar cache for the metrics summary api, but this takes a different approach. Whereas the summary api cache is in-memory, this one is disk-based and writes new cache files to the block folder. These files are unique to each request (query + params) and exist for the lifetime of the block. They aren't flushed to object storage, and they get cleaned up automatically when the block is deleted. (Note - this has precedent in the flushed file that is written to block folders to track their flushed status.) The reason to avoid an in-memory cache is that generators are usually memory-intensive already, and the existing in-memory cache is often inadequate as-is. Example block contents:

pwd = /tempo/generator/traces/single-tenant/blocks/single-tenant/
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/bloom-0
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/cache_query_range_14197044629227963207.buf
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/cache_query_range_18089096181261108941.buf
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/cache_query_range_433499706678913880.buf
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/cache_query_range_4989462974766826143.buf
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/data.parquet
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/flushed
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/index
  65beeff3-d9c5-49c1-b69e-8c7b022024f3/meta.json

There are additional changes in this PR which proved necessary:

  • Generator RF1 block meta times weren't right. They always flushed traces with the timestamp of "now" so the metas were off by roughly trace_idle_period + trace_flush_period. This fixes it to flush proper times.
  • As part of that, the generator traces WAL now needs a real ingestion_slack time. This changes the default to 2 minutes, matching ingesters. NOTE - This is different than the slack time for metrics.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mdisibio mdisibio enabled auto-merge (squash) June 21, 2024 14:57
@mdisibio mdisibio merged commit b29d56c into grafana:main Jun 21, 2024
14 checks passed
mapno pushed a commit that referenced this pull request Jun 24, 2024
…ks (#3799)

* working version

* Fix start/end meta of generator-flushed blocks, and config default. Cleanup/dedupe timerange logic.

* Add tests

* lint

* changelog

* review feedback