Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(approx_topk): Map approx_topk operation in all cases #16131

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

benclive
Copy link
Contributor

@benclive benclive commented Feb 6, 2025

What this PR does / why we need it:
Calling approx_topk on some types of queries was causing a panic in the querier.

  • This is because the approx_topk operation is not supported on the querier and can only be executed on the front-end and MUST be rewritten.
  • In some operations, the approx_topk was not considered shardable and we did not rewrite this query. Specifically, comparison operators are not considered shardable and so skip the rewriting & cause the queriers to panic.

This PR does 2 things:
1. Adds comparison operators to "shardableOps" map. This means they will be considered shardable and trigger the sharding logic. This is the same as the existing arithmetic operators - we don't have any special logic for them, though it doesn't really give us any benefit either. Removed because it not clear this is still correct.
2. Adds a catch-all for the approx_topk operation. If the inner query is NOT shardable, we still rewrite the logic as if there are 0 shards and send the inner query to the querier. This won't panic but it shouldn't be happening a lot so I added an error log.

I added tests for these cases & did some local testing to check they behave as intended.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@benclive benclive requested a review from a team as a code owner February 6, 2025 12:22
pkg/logql/shardmapper.go Outdated Show resolved Hide resolved
Comment on lines 2295 to 2299
// binops - comparison
OpTypeGT: true,
OpTypeGTE: true,
OpTypeLT: true,
OpTypeLTE: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure these are shardable? Is this an orthogonal change to the above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, we should treat them the same way as the arithmetic operations above. There's no special logic for those operations, they just get mapped to queriers and combined as usual.
I don't think sharding a approx_topk query containing a comparison will change the result, as far as I can see, though I don't have any proof beyond the existing tests that it is valid to do so.
If you prefer I can remove this change and those types of queries will fall into the "not shardable" logic instead. I've tested it locally and it does seem to give correct results - WDYT?

Copy link
Contributor

@jeschkies jeschkies Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, so I'd push the operator don to each shard because it's filtering out the metrics. Does that makes sense? As in:

count_over_timer({...}[...]) > 0

should be

downstream<count_over_timer({..., __shard__=0}[...]) > 0>
++ ...
++ downstream<count_over_timer({..., __shard__=n}[...]) > 0>

and not

(downstream<count_over_timer({..., __shard__=0}[...])> 
 ++ ...
 ++ downstream<count_over_timer({..., __shard__=n}[...])>
) > 0

I think sharding these is a little more evolved and should be a separarte PR.

@benclive benclive requested a review from jeschkies February 7, 2025 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants