-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allows by/without to be empty and available for max/min_over_time #3030
Conversation
…rations. This essentially allows to aggregate over all dimensions when using by () while without() is a noop. This also makes it possible for max and min range vector aggregation to use grouping, which is simpler than doing max by (foo) max_over_time(...) Examples: - `max_over_time(...) by ()` gives your the max over time accross of all series. - `min_over_time(...) by (namespace)` gives you the min over time per namespace. - `max_over_time(...) without (namespace)` gives you the max over time removing the namespace dimension. PS: I've also refactored a bit how we optimized grouping to make it more clear. Fixes grafana#2884 Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question then LGTM
@@ -431,7 +431,7 @@ Supported function for operating over unwrapped ranges are: | |||
- `stddev_over_time(unwrapped-range)`: the population standard deviation of the values in the specified interval. | |||
- `quantile_over_time(scalar,unwrapped-range)`: the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval. | |||
|
|||
Except for `sum_over_time`, `min_over_time` and `max_over_time` unwrapped range aggregations support grouping. | |||
Except for `sum_over_time`, unwrapped range aggregations support grouping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we don't allow this on sum_over_time
? I think sum_over_time(x) by (y)
could be more effectively done than sum by (y) (sum_over_time(x))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I have the same question, I'm trying to visualize Netflow statistics using Loki, I have the data coming in like this:
My query looks like this to calculate the rate on a line chart: sum_over_time({job="netflow", SrcAddr=~"$srcaddr", DstAddr=~"$dstaddr"} | json | unwrap Bytes[5m])
. This works, but I'd like to be able to group by SrcAddr,DstAddr
, or maybe some other combo like DstAddr,DstPort
. I don't see a way to do this while calculating the rate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add, because it appears there is a limit to 500 series per query, and Netflow can generate a very large number of unique series, the ability to use group by
with sum_over_time
is needed to get the series number down to a reasonable level when calculating the rate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I discussed with @slim-bean , the reasoning behind is that for sum, count and rate you can sum by the whole range vector which would produce the same result. This is not true for other operation since they are not associate when you reduce down labels before.
Since this way of doing ..._over_time() by/without() is specific to Loki and doesn't exist in Prometheus we wanted to have it only where there's no escape.
@loganmc10 You can do this:
sum by (SrcAddr,DstAddr) (
sum_over_time({job="netflow", SrcAddr=~"$srcaddr", DstAddr=~"$dstaddr"} | json | unwrap Bytes[5m])
)
This is the same as if
sum_over_time({job="netflow", SrcAddr=~"$srcaddr", DstAddr=~"$dstaddr"} | json | unwrap Bytes[5m]) by (SrcAddr,DstAddr)
Was existing, except that this is more Prometheus like, and if we can keep it Prometheus like, I think we prefer to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@loganmc10 Also check out the max_query_series
limit, which is configurable per deployment and per tenant:
# Limit the maximum of unique series that is returned by a metric query.
# When the limit is reached an error is returned.
# CLI flag: -querier.max-query-series
[max_query_series: <int> | default = 500]
This essentially allows to aggregate over all dimensions when using by () while without() is a noop.
This also makes it possible for max and min range vector aggregation to use grouping, which is simpler than doing max by (foo) max_over_time(...)
Examples:
max_over_time(...) by ()
gives your the max over time accross of all series.min_over_time(...) by (namespace)
gives you the min over time per namespace.max_over_time(...) without (namespace)
gives you the max over time removing the namespace dimension.PS: I've also refactored a bit how we optimized grouping to make it more clear.
Fixes #2884
Signed-off-by: Cyril Tovena [email protected]