Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query-frontend: enforce limitsMiddleware for remote read requests #8374

Merged

Conversation

pracucci
Copy link
Collaborator

@pracucci pracucci commented Jun 14, 2024

What this PR does

In this PR I propose to enforce limitsMiddleware for remote read requests too.

Before this PR, -query-frontend.max-query-expression-size-bytes, -query-frontend.max-total-query-length and -querier.max-query-lookback were not enforced for remote read requests in the query-frontend. After this PR, they're enforced in the query-frontend. If a remote read request contains only 1 out of N queries failing because of the limit, the whole remote read request will be rejected (as of today it's quite difficult to only fail 1 out of N queries).

The main change in this PR is about introducing support to manipulate the remote read queries before they're passed to downstream (querier). This is needed by the -querier.max-query-lookback enforcement done by limitsMiddleware.

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@pracucci pracucci requested a review from krajorama June 14, 2024 11:04
@pracucci pracucci changed the title Draft idea of how to integrate limitsMiddleware for the remote read Query-frontend: enforce limitsMiddleware for remote read requests Jun 14, 2024
@pracucci pracucci force-pushed the enforce-max-query-expression-size-bytes-for-remote-read branch 4 times, most recently from dd69102 to edeb04e Compare June 17, 2024 07:04
Comment on lines 313 to 315
clonedReq := *r
clonedReq.query = clonedQuery
return &clonedReq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels off, couldn't we just update the r.query and convert that into the result with remoteReadToMetricsQueryRequest

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your comment and I also don't like much this function, but the WithXXX() functions are expected to return a deep copy without manipulating any of the original request data (that's by contract of these functions). If we do manipulate r.query then we're going to break the contract.

E.g. we take a similar approach in PrometheusRangeQueryRequest too, but there we don't have a complex nested struct like prompb.Query:

func (r *PrometheusRangeQueryRequest) WithStartEnd(start int64, end int64) MetricsQueryRequest {
newRequest := *r
newRequest.start = start
newRequest.end = end
if newRequest.queryExpr != nil {
newRequest.minT, newRequest.maxT = decodeQueryMinMaxTime(
newRequest.queryExpr, newRequest.GetStart(), newRequest.GetEnd(), newRequest.GetStep(), newRequest.lookbackDelta,
)
}
return &newRequest
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. I've changed the code to use remoteReadToMetricsQueryRequest().

@@ -133,11 +133,14 @@ func (l limitsMiddleware) Do(ctx context.Context, r MetricsQueryRequest) (Respon
// Clamp the time range based on the max query lookback and block retention period.
blocksRetentionPeriod := validation.SmallestPositiveNonZeroDurationPerTenant(tenantIDs, l.CompactorBlocksRetentionPeriod)
maxQueryLookback := validation.SmallestPositiveNonZeroDurationPerTenant(tenantIDs, l.MaxQueryLookback)
maxLookback := util_math.Min(blocksRetentionPeriod, maxQueryLookback)
maxLookback := util_math.Min(blocksRetentionPeriod, maxQueryLookback) // TODO bug! we should not compare zero values
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: see #8388

@pracucci pracucci force-pushed the enforce-max-query-expression-size-bytes-for-remote-read branch 2 times, most recently from ad9d997 to af7fabe Compare June 17, 2024 08:38
@pracucci pracucci force-pushed the enforce-max-query-expression-size-bytes-for-remote-read branch from 5ef55f1 to 6b2300e Compare June 17, 2024 10:33
@pracucci pracucci marked this pull request as ready for review June 17, 2024 11:13
@pracucci pracucci requested a review from a team as a code owner June 17, 2024 11:13
@pracucci pracucci force-pushed the enforce-max-query-expression-size-bytes-for-remote-read branch from ed63e1f to 33191d9 Compare June 17, 2024 11:32

// Parse the request body consuming it! From now on we can't call the next http.RoundTrigger without
// replacing the Body.
remoteReadReq, err := unmarshalRemoteReadRequest(req.Context(), req.Body, int(req.ContentLength))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have to call req.Body.Close() ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The http.Request.Body comment says:

	// For server requests, the Request Body is always non-nil
	// but will return EOF immediately when no body is present.
	// The Server will close the request body. The ServeHTTP
	// Handler does not need to.

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion in prometheus/prometheus#4226 it seems like query [start, end] always includes the [hint start, hint-end] as the query start is for initializing a Querier and the hint is for selecting actual samples.

// We only clamp the hints time range (and not extend it). If, for any reason, the hints start/end
// time range is shorter than the query start/end range, then we manipulate only to clamp it to keep
// it within the requested range.
if clonedQuery.Hints != nil && clonedQuery.Hints.StartMs < start {
Copy link
Contributor

@krajorama krajorama Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: now I assume that if someone provided hints, they set a sensible Hints.StartMs. Since we don't have validation midddleware, maybe we do need a zero check

Suggested change
if clonedQuery.Hints != nil && clonedQuery.Hints.StartMs < start {
if clonedQuery.Hints != nil && clonedQuery.Hints.StartMs != 0 && clonedQuery.Hints.StartMs < start {

Just to avoid setting StartMs > EndMs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super sure about this.

Let's forget for a moment the fact that today we're ignoring Hints in Mimir, and let's assume we'll fix it (as I think we should do).

The reason why I think we should not consider 0 an "invalid" number is because, if hints are populated (non nil), we should just honor values are they're set. Prometheus does the same in its remote read implementation, where ReadHints are converted into SelectHints  and then start/end from SelectHints are just used as is (and we do the same in Mimir for instant/range queries):
https://github.com/prometheus/prometheus/blob/5efc8dd27b6e68d5102b77bc708e52c9821c5101/tsdb/querier.go#L135-L136

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved, with one nit (which I caused by taking away test for 0)

@pracucci pracucci enabled auto-merge (squash) June 17, 2024 15:17
@pracucci pracucci merged commit 5893053 into main Jun 17, 2024
29 checks passed
@pracucci pracucci deleted the enforce-max-query-expression-size-bytes-for-remote-read branch June 17, 2024 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants