-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query-frontend: enforce limitsMiddleware for remote read requests #8374
Query-frontend: enforce limitsMiddleware for remote read requests #8374
Conversation
dd69102
to
edeb04e
Compare
clonedReq := *r | ||
clonedReq.query = clonedQuery | ||
return &clonedReq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels off, couldn't we just update the r.query and convert that into the result with remoteReadToMetricsQueryRequest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your comment and I also don't like much this function, but the WithXXX()
functions are expected to return a deep copy without manipulating any of the original request data (that's by contract of these functions). If we do manipulate r.query
then we're going to break the contract.
E.g. we take a similar approach in PrometheusRangeQueryRequest
too, but there we don't have a complex nested struct like prompb.Query
:
mimir/pkg/frontend/querymiddleware/model_extra.go
Lines 158 to 168 in 5571e24
func (r *PrometheusRangeQueryRequest) WithStartEnd(start int64, end int64) MetricsQueryRequest { | |
newRequest := *r | |
newRequest.start = start | |
newRequest.end = end | |
if newRequest.queryExpr != nil { | |
newRequest.minT, newRequest.maxT = decodeQueryMinMaxTime( | |
newRequest.queryExpr, newRequest.GetStart(), newRequest.GetEnd(), newRequest.GetStep(), newRequest.lookbackDelta, | |
) | |
} | |
return &newRequest | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. I've changed the code to use remoteReadToMetricsQueryRequest()
.
@@ -133,11 +133,14 @@ func (l limitsMiddleware) Do(ctx context.Context, r MetricsQueryRequest) (Respon | |||
// Clamp the time range based on the max query lookback and block retention period. | |||
blocksRetentionPeriod := validation.SmallestPositiveNonZeroDurationPerTenant(tenantIDs, l.CompactorBlocksRetentionPeriod) | |||
maxQueryLookback := validation.SmallestPositiveNonZeroDurationPerTenant(tenantIDs, l.MaxQueryLookback) | |||
maxLookback := util_math.Min(blocksRetentionPeriod, maxQueryLookback) | |||
maxLookback := util_math.Min(blocksRetentionPeriod, maxQueryLookback) // TODO bug! we should not compare zero values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: see #8388
ad9d997
to
af7fabe
Compare
5ef55f1
to
6b2300e
Compare
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
ed63e1f
to
33191d9
Compare
|
||
// Parse the request body consuming it! From now on we can't call the next http.RoundTrigger without | ||
// replacing the Body. | ||
remoteReadReq, err := unmarshalRemoteReadRequest(req.Context(), req.Body, int(req.ContentLength)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we have to call req.Body.Close()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. The http.Request.Body
comment says:
// For server requests, the Request Body is always non-nil
// but will return EOF immediately when no body is present.
// The Server will close the request body. The ServeHTTP
// Handler does not need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion in prometheus/prometheus#4226 it seems like query [start, end]
always includes the [hint start, hint-end]
as the query start is for initializing a Querier and the hint is for selecting actual samples.
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
// We only clamp the hints time range (and not extend it). If, for any reason, the hints start/end | ||
// time range is shorter than the query start/end range, then we manipulate only to clamp it to keep | ||
// it within the requested range. | ||
if clonedQuery.Hints != nil && clonedQuery.Hints.StartMs < start { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: now I assume that if someone provided hints, they set a sensible Hints.StartMs
. Since we don't have validation midddleware, maybe we do need a zero check
if clonedQuery.Hints != nil && clonedQuery.Hints.StartMs < start { | |
if clonedQuery.Hints != nil && clonedQuery.Hints.StartMs != 0 && clonedQuery.Hints.StartMs < start { |
Just to avoid setting StartMs > EndMs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super sure about this.
Let's forget for a moment the fact that today we're ignoring Hints
in Mimir, and let's assume we'll fix it (as I think we should do).
The reason why I think we should not consider 0
an "invalid" number is because, if hints are populated (non nil), we should just honor values are they're set. Prometheus does the same in its remote read implementation, where ReadHints
are converted into SelectHints
and then start/end from SelectHints
are just used as is (and we do the same in Mimir for instant/range queries):
https://github.com/prometheus/prometheus/blob/5efc8dd27b6e68d5102b77bc708e52c9821c5101/tsdb/querier.go#L135-L136
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved, with one nit (which I caused by taking away test for 0)
…quest Signed-off-by: Marco Pracucci <[email protected]>
What this PR does
In this PR I propose to enforce
limitsMiddleware
for remote read requests too.Before this PR,
-query-frontend.max-query-expression-size-bytes
,-query-frontend.max-total-query-length
and-querier.max-query-lookback
were not enforced for remote read requests in the query-frontend. After this PR, they're enforced in the query-frontend. If a remote read request contains only 1 out of N queries failing because of the limit, the whole remote read request will be rejected (as of today it's quite difficult to only fail 1 out of N queries).The main change in this PR is about introducing support to manipulate the remote read queries before they're passed to downstream (querier). This is needed by the
-querier.max-query-lookback
enforcement done bylimitsMiddleware
.Which issue(s) this PR fixes or relates to
N/A
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.