Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Frontend: dynamic horizontal query sharding #5658

Merged
merged 13 commits into from
Sep 14, 2022

Conversation

pedro-stanaka
Copy link
Contributor

@pedro-stanaka pedro-stanaka commented Aug 30, 2022

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Summary

This is a possible solution for the issue #5617.
Closes #5617.

Changes

  • Adds new CLI parameter for QFE to allow a lower boundary for query horizontal split
  • When queries have a duration above this threshold, they will be split into either a minimum number of horizontal shards or an upper bound interval (for longer queries).

Verification

The modification was only on the Query Range middleware, I introduced more test cases to test the new behavior.
I also added more validation for new parameters and tests to cover it.

@pedro-stanaka pedro-stanaka force-pushed the feat/dynamic-horizontal-sharding branch 2 times, most recently from afabd95 to bc57bc6 Compare August 30, 2022 09:41
@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 30, 2022
@pedro-stanaka pedro-stanaka changed the title Adding new parameter for more dynamic horizontal query sharding Query Frontend: dynamic horizontal query sharding Aug 30, 2022
@pedro-stanaka pedro-stanaka force-pushed the feat/dynamic-horizontal-sharding branch from 941ae32 to 9eddb3d Compare August 30, 2022 19:33
CHANGELOG.md Outdated Show resolved Hide resolved
@pedro-stanaka pedro-stanaka marked this pull request as ready for review August 31, 2022 07:03
Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, I think we just need to polish validation a bit better

cmd/thanos/query_frontend.go Outdated Show resolved Hide resolved
cmd/thanos/query_frontend.go Outdated Show resolved Hide resolved
pkg/queryfrontend/config.go Outdated Show resolved Hide resolved
@pedro-stanaka pedro-stanaka force-pushed the feat/dynamic-horizontal-sharding branch 3 times, most recently from 41171f2 to f4c6fb8 Compare September 1, 2022 07:38
Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more documentation nits, everything else looks good 👍

cmd/thanos/query_frontend.go Outdated Show resolved Hide resolved
cmd/thanos/query_frontend.go Outdated Show resolved Hide resolved
cmd/thanos/query_frontend.go Outdated Show resolved Hide resolved
@@ -242,14 +245,25 @@ type LabelsConfig struct {
// Validate a fully initialized config.
func (cfg *Config) Validate() error {
if cfg.QueryRangeConfig.ResultsCacheConfig != nil {
if cfg.QueryRangeConfig.SplitQueriesByInterval <= 0 {
return errors.New("split queries interval should be greater than 0 when caching is enabled")
if cfg.QueryRangeConfig.SplitQueriesByInterval <= 0 && !cfg.isDynamicSplitSet() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change might be out of scope for this PR, but I wonder why we even have this constraint. @yeya24 any ideas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was inherited from Cortex (only by looking at git history). 17343c16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we don't allow negative split intervals.

@pedro-stanaka pedro-stanaka force-pushed the feat/dynamic-horizontal-sharding branch from bd6417e to 9484d0b Compare September 8, 2022 15:17
fpetkovski
fpetkovski previously approved these changes Sep 9, 2022
Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome feature, lgtm 👍

@yeya24 any other comments?

CHANGELOG.md Outdated
@@ -17,6 +17,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
### Added
* [#5654](https://github.com/thanos-io/thanos/pull/5654) Query: add `--grpc-compression` flag that controls the compression used in gRPC client. With the flag it is now possible to compress the traffic between Query and StoreAPI nodes - you get lower network usage in exchange for a bit higher CPU/RAM usage.
- [#5650](https://github.com/thanos-io/thanos/pull/5650) Query Frontend: Add sharded queries metrics.
- [#5658](https://github.com/thanos-io/thanos/pull/5658) Query Frontend: Introduce new parameters to implement more dynamic horizontal query splitting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mention the new flags name here?

"One should also set query-range.split-min-horizontal-shards to a value greater than 1 to enable splitting.").
Default("0").DurationVar(&cfg.QueryRangeConfig.MinQuerySplitInterval)

cmd.Flag("query-range.max-split-interval", "Split query range bellow this interval in query-range.horizontal-shards. Queries with a range longer than this value will be split in multiple requests of this length.").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type below

if cfg.QueryRangeConfig.SplitQueriesByInterval <= 0 {
return errors.New("split queries interval should be greater than 0 when caching is enabled")
if cfg.QueryRangeConfig.SplitQueriesByInterval <= 0 && !cfg.isDynamicSplitSet() {
return errors.New("split queries or split threshold interval should be greater than 0 when caching is enabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove 1 space?


if cfg.isDynamicSplitSet() {
err := cfg.validateDynamicSplitParams()
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use if err := xxx; err != nil {} pattern

handlerFunc: promqlResults,
codec: queryRangeCodec,
splitInterval: 1 * time.Hour,
expected: 2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is the same as the same at line 279?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you!

}

queryInterval := time.Duration(r.GetEnd()-r.GetStart()) * time.Millisecond
// if the query is multiple of max interval, we use the max interval to split.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's capitalize the comment.

return config.MaxQuerySplitInterval
}

// if the query duration is less than max interval, we split it equally in HorizontalShards.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}

// if the query duration is less than max interval, we split it equally in HorizontalShards.
return time.Duration(queryInterval.Milliseconds()/config.HorizontalShards) * time.Millisecond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where MinQuerySplitInterval is used here. We need to also ensure the query interval is larger than MaxQuerySplitInterval IIUC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MinQuerySplitInterval is a drop-in replacement for SplitQueriesByInterval, saying just when we start splitting queries. We use it in newQueryRangeTripperware (line 182) to decide whether to send split requests or to send the original request.

Regarding the MaxQuerySplitInterval, we are checking that on line 247 to see if the query interval is at least twice the size of the MaxQuerySplitInterval, if so we use that as our split interval.

Copy link
Contributor

@yeya24 yeya24 Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MinQuerySplitInterval is a drop-in replacement for SplitQueriesByInterval, saying just when we start splitting queries. We use it in newQueryRangeTripperware (line 182) to decide whether to send split requests or to send the original request.

Where do we ensure that the query range is larger than the MinQuerySplitInterval to be split horizontally? For this code, I can see that even a small range query will be sharded by HorizontalShards times.
Maybe I misunderstand the initial design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right. I added new test case and an if statement.

@pedro-stanaka pedro-stanaka force-pushed the feat/dynamic-horizontal-sharding branch from 3ebf35b to 72f13d7 Compare September 13, 2022 08:03
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yeya24 yeya24 merged commit 0392dd7 into thanos-io:main Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query frontend: dynamic horizontal sharding
3 participants