-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement stateless mode for Thanos Ruler #4250
Conversation
} | ||
|
||
//nolint:staticcheck | ||
a.w.bufPool.Put(buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SA6002: argument should be pointer-like to avoid allocations
(at-me in a reply with help
or ignore
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid work!
This is actually super close to be done 💪🏽 🚀 Thanks!
cmd/thanos/config.go
Outdated
} | ||
|
||
func (rc *ruleRWConfig) registerFlag(cmd extflag.FlagClause) *ruleRWConfig { | ||
cmd.Flag("remote-write", "If true, directs ruler to remote-write evaluated samples to the server configured by 'remote-write.config'."). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not check if config is provided and only then enable remote write? ;p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important! User will expect remote write can be added to the TSDB ruler also.. We need to make sure that is not the case, and if you provide remote write it will make ruler stateless (except the WAL buffer)
pkg/rules/remotewrite/remotewrite.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf("failed applying config to remote storage: %w", err) | ||
} | ||
fanoutStorage := storage.NewFanout(logger, walStore, remoteStore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
series []record.RefSeries | ||
} | ||
|
||
func (c *walDataCollector) Append(samples []record.RefSample) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
U1000: func (*walDataCollector).Append is unused
(at-me in a reply with help
or ignore
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just minor nits, tests, fixing build errors and metrics (:
cmd/thanos/rule.go
Outdated
@@ -75,6 +78,9 @@ type ruleConfig struct { | |||
alertQueryURL *url.URL | |||
alertRelabelConfigYAML []byte | |||
|
|||
rwConfig ruleRWConfig | |||
rwConfigYAML []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this?
cmd/thanos/rule.go
Outdated
@@ -75,6 +78,9 @@ type ruleConfig struct { | |||
alertQueryURL *url.URL | |||
alertRelabelConfigYAML []byte | |||
|
|||
rwConfig ruleRWConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rwConfig ruleRWConfig | |
rwConfig *extflag.PathOrContent |
cmd/thanos/rule.go
Outdated
@@ -165,6 +172,14 @@ func registerRule(app *extkingpin.App) { | |||
return errors.New("--query/--query.sd-files and --query.config* parameters cannot be defined at the same time") | |||
} | |||
|
|||
// Parse and check remote-write config and enable stateless mode for ruler. | |||
if conf.rwConfig.configPath != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the user passes empty configuration?
cmd/thanos/rule.go
Outdated
@@ -165,6 +172,14 @@ func registerRule(app *extkingpin.App) { | |||
return errors.New("--query/--query.sd-files and --query.config* parameters cannot be defined at the same time") | |||
} | |||
|
|||
// Parse and check remote-write config and enable stateless mode for ruler. | |||
if conf.rwConfig.configPath != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this
cmd/thanos/rule.go
Outdated
} | ||
var rwCfg remotewrite.Config | ||
if len(conf.rwConfigYAML) == 0 { | ||
return errors.New("no --remote-write.config was given") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return errors.New("no --remote-write.config was given") | |
return errors.New("empty --remote-write.config or empty file via --remote-write.config-file was given") |
pkg/rules/remotewrite/remotewrite.go
Outdated
GlobalConfig: config.DefaultGlobalConfig, | ||
RemoteWriteConfigs: []*config.RemoteWriteConfig{rwConfig.RemoteStore}, | ||
}); err != nil { | ||
return nil, fmt.Errorf("failed applying config to remote storage: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("failed applying config to remote storage: %w", err) | |
return nil, errros.Wrapf(err, "applying config to remote storage") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Thanos still prefering errors.Wrapf
over fmt.Errorf
with %w
?
pkg/rules/remotewrite/wal.go
Outdated
func newStorageMetrics(r prometheus.Registerer) *storageMetrics { | ||
m := storageMetrics{r: r} | ||
m.numActiveSeries = prometheus.NewGauge(prometheus.GaugeOpts{ | ||
Name: "agent_wal_storage_active_series", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name: "agent_wal_storage_active_series", | |
Name: "wal_storage_active_series", |
Let's start with wal...
? Let's remove agent
if err := removeLockfileIfAny(logger, conf.dataDir); err != nil { | ||
return errors.Wrap(err, "remove storage lock files") | ||
} | ||
if len(rwCfgYAML) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case though, an empty --remote-write.config
(or --remote-write.config-file
)will fall back to using the TSDB-backed ruler. This is because len(rwCfgYAML) == 0
either means:
- The flag was provided but it's an empty file or empty string OR
- The flag was not provided at all.
I couldn't think of a reliable way to distinguish between them. I've also documented this behavior in the flag help, but I don't know if it is something we want. An alternative would be to provide a separate flag for enabling stateless mode (e.g --remote.write
or --remote-write.enabled
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fix the conflict and linting issue.
} | ||
|
||
//nolint:staticcheck | ||
samplesPool.Put(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SA6002: argument should be pointer-like to avoid allocations
(at-me in a reply with help
or ignore
)
@idoqo feels like we are very close to being ready to rock & roll 🚀 just a couple of linters & conflicts to fix |
@idoqo any updates? |
255f824
to
bcf445f
Compare
8002078
to
81a2b2e
Compare
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
This removes the need to pass a separate `remote-write` flag to ruler to enable stateless mode. Instead, we now check if a remote-write config is provided and automatically enables stateless mode based off that. Ruler test is also cleaned up to remove unnecessary tests (i.e those that have been performed by other e2e suites). Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
Signed-off-by: Michael Okoko <[email protected]>
81a2b2e
to
79b06d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw I got panic at line 559. In the case of using remote write storage, db
will be nil and cause panic later.
tsdbStore := store.NewTSDBStore(logger, db, component.Rule, conf.lset)
We need to implement a TSDBReader
based on the local Wal store and the remote storage as well. However, I don't think the two stores have real Queryable
implementation so currently it is still not working.
ecbde08
to
86522b1
Compare
Signed-off-by: Michael Okoko <[email protected]>
86522b1
to
6b0612c
Compare
@idoqo The failling E2E test is related to your changes. Can you take a look? Also please resolve the conflict. |
@@ -522,7 +556,7 @@ func runRule( | |||
) | |||
|
|||
// Start gRPC server. | |||
{ | |||
if db != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I missed it last time. Even if it is stateless mode, we still need to start the gRPC server because we need the prober and the gRPC
rules feature, we can just remove the StoreServer in this case.
Example code:
// Start gRPC server.
tlsCfg, err := tls.NewServerConfig(log.With(logger, "protocol", "gRPC"), conf.grpc.tlsSrvCert, conf.grpc.tlsSrvKey, conf.grpc.tlsSrvClientCA)
if err != nil {
return errors.Wrap(err, "setup gRPC server")
}
options := []grpcserver.Option{
grpcserver.WithServer(thanosrules.RegisterRulesServer(ruleMgr)),
grpcserver.WithListen(conf.grpc.bindAddress),
grpcserver.WithGracePeriod(time.Duration(conf.grpc.gracePeriod)),
grpcserver.WithTLSConfig(tlsCfg),
}
if db != nil {
tsdbStore := store.NewTSDBStore(logger, db, component.Rule, conf.lset)
options = append(options, grpcserver.WithServer(store.RegisterStoreServer(tsdbStore)))
}
// TODO: Add rules API implementation when ready.
s := grpcserver.New(logger, reg, tracer, grpcLogOpts, tagOpts, comp, grpcProbe, options...)
g.Add(func() error {
statusProber.Ready()
return s.ListenAndServe()
}, func(err error) {
statusProber.NotReady(err)
s.Shutdown(err)
})
Hi @idoqo, are you still willing to finish this pr? Should be close to merge I think. If you are busy with other work then I am happy to take over. |
* import wal package from grafana/agent (with string interning disabled) Signed-off-by: Michael Okoko <[email protected]> * Set up remote-write config and test skeleton Signed-off-by: Michael Okoko <[email protected]> * Setup fanout and related storages for stateless ruler Signed-off-by: Michael Okoko <[email protected]> * Optionally run ruler in stateless mode Signed-off-by: Michael Okoko <[email protected]> * Set up tests and implementations for configuring remote-write for ruler Signed-off-by: Michael Okoko <[email protected]> * Implement stub querier for WAL storage to fix nil pointer error Signed-off-by: Michael Okoko <[email protected]> * Setup e2e test for stateless ruler Signed-off-by: Michael Okoko <[email protected]> * Add copied code commentary to remotewrite packages Signed-off-by: Michael Okoko <[email protected]> * Use static addresses for am and querier Signed-off-by: Michael Okoko <[email protected]> * Remove need for separate remote-write flag for stateless ruler This removes the need to pass a separate `remote-write` flag to ruler to enable stateless mode. Instead, we now check if a remote-write config is provided and automatically enables stateless mode based off that. Ruler test is also cleaned up to remove unnecessary tests (i.e those that have been performed by other e2e suites). Signed-off-by: Michael Okoko <[email protected]> * Generate docs for stateless ruler flags and fix tests Signed-off-by: Michael Okoko <[email protected]> * Use promauto for prometheus primitives Signed-off-by: Michael Okoko <[email protected]> * Group imports and satisfy go-lint Signed-off-by: Michael Okoko <[email protected]> * Always return empty series set from WAL storage Signed-off-by: Michael Okoko <[email protected]> * re-generate rule documentation Signed-off-by: Michael Okoko <[email protected]> * copyright headers to satisfy golint Signed-off-by: Michael Okoko <[email protected]> * Rename wal storage metrics Signed-off-by: Michael Okoko <[email protected]> * Use Prometheus' remote write config instead of rolling another Signed-off-by: Michael Okoko <[email protected]> * Fix E2E tests Signed-off-by: Ben Ye <[email protected]> * add changelog Signed-off-by: yeya24 <[email protected]> * remove wal related tests Signed-off-by: Ben Ye <[email protected]> * fix e2e Signed-off-by: Ben Ye <[email protected]> * remove unused structs in tests Signed-off-by: Ben Ye <[email protected]> * add licence header Signed-off-by: Ben Ye <[email protected]> * use upstream agent package Signed-off-by: Ben Ye <[email protected]> * address comments Signed-off-by: Ben Ye <[email protected]> Co-authored-by: Michael Okoko <[email protected]>
Thanks for the work. Close it in favor of #4731 |
This implements the stateless mode for ruler described in the Scalable Rule Storage proposal
Changes
Verification