database_observability: report health of component and collectors #2392
@@ -127,7 +132,7 @@ func (c *QuerySample) fetchQuerySamples(ctx context.Context) error {
 		}

 		if strings.HasSuffix(sampleText, "...") {
-			level.Info(c.logger).Log("msg", "skipping parsing truncated query", "digest", digest)
+			level.Debug(c.logger).Log("msg", "skipping parsing truncated query", "digest", digest)
Drive-by: demote a noisy info log to debug
	if len(schemas) == 0 {
		level.Info(c.logger).Log("msg", "no schema detected from information_schema.schemata")
		return nil
	}
Drive-by: log if no schema is detected
Report the component as unhealthy if an error occurs while starting the collectors, or if any collector stops during operation.
LGTM. I just suggested a different approach that gives the collectors more flexibility over their health status, but feel free to ignore it.
	Start(context.Context) error
	Stopped() bool
In the future, you might have collectors that are considered unhealthy but are still running. A different approach to support this would be to add a CurrentHealth function to the collector interface that returns the health object. You would then no longer need the healthErr attribute; the component's CurrentHealth function would simply call CurrentHealth on every collector.
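The suggestion above could look roughly like the following sketch. It is only an illustration under stated assumptions, not the code from this PR: the Health struct is a simplified stand-in for the real health object, and namedCollector, scrapeCollector, and the three-failure threshold are hypothetical names and values invented for the example.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Health is a simplified, hypothetical stand-in for the component health object.
type Health struct {
	Healthy bool
	Message string
}

// Collector is the interface as the reviewer proposes it: each collector
// reports its own health instead of exposing only Stopped().
type Collector interface {
	Start(context.Context) error
	CurrentHealth() Health
}

// scrapeCollector is a hypothetical collector that keeps running but
// considers itself unhealthy after repeated failures.
type scrapeCollector struct {
	failures int // consecutive failed scrapes (illustrative only)
}

func (s *scrapeCollector) Start(context.Context) error { return nil }

func (s *scrapeCollector) CurrentHealth() Health {
	if s.failures >= 3 { // threshold chosen arbitrarily for the sketch
		return Health{Healthy: false, Message: fmt.Sprintf("%d consecutive failed scrapes", s.failures)}
	}
	return Health{Healthy: true, Message: "running"}
}

// namedCollector pairs a collector with a display name (hypothetical helper).
type namedCollector struct {
	name      string
	collector Collector
}

// Component holds the collectors; note that no healthErr field is needed.
type Component struct {
	collectors []namedCollector
}

// CurrentHealth asks every collector for its health and aggregates the result.
func (c *Component) CurrentHealth() Health {
	var problems []string
	for _, nc := range c.collectors {
		if h := nc.collector.CurrentHealth(); !h.Healthy {
			problems = append(problems, nc.name+": "+h.Message)
		}
	}
	if len(problems) > 0 {
		return Health{Healthy: false, Message: strings.Join(problems, "; ")}
	}
	return Health{Healthy: true, Message: "all collectors are healthy"}
}
```

With this shape, a collector that is still running but degraded can surface its state without the component having to know why.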
Yes, that's a great point. I wanted to start simple for now, since the collectors are not resilient at all anyway (they stop as soon as they hit any error). I agree that in the future we might want to delegate this logic to the collectors themselves.
PR Description
Report the component as unhealthy if an error occurs while starting the collectors, or if any collector stops during operation.
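A minimal sketch of the behaviour described here, assuming a simplified setup: the Start and Stopped methods and the healthErr name come from the diff and review discussion above, while Health, fakeCollector, namedCollector, startCollectors, and exampleHealth are hypothetical names made up for illustration. The real component will differ, for example in locking and in the shape of its health object.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
)

// Collector mirrors the two methods shown in the interface diff.
type Collector interface {
	Start(context.Context) error
	Stopped() bool
}

// Health is a simplified, hypothetical stand-in for the component health object.
type Health struct {
	Healthy bool
	Message string
}

// fakeCollector is a hypothetical collector used only for this illustration.
type fakeCollector struct {
	startErr error
	stopped  atomic.Bool
}

func (f *fakeCollector) Start(context.Context) error {
	if f.startErr != nil {
		f.stopped.Store(true)
	}
	return f.startErr
}

func (f *fakeCollector) Stopped() bool { return f.stopped.Load() }

// namedCollector pairs a collector with a display name (hypothetical helper).
type namedCollector struct {
	name      string
	collector Collector
}

// Component records startup failures in healthErr. A real component would
// guard this field with a mutex; the sketch is single-goroutine.
type Component struct {
	collectors []namedCollector
	healthErr  error
}

// startCollectors starts every collector and remembers any startup errors.
func (c *Component) startCollectors(ctx context.Context) {
	var errs []error
	for _, nc := range c.collectors {
		if err := nc.collector.Start(ctx); err != nil {
			errs = append(errs, fmt.Errorf("failed to start collector %s: %w", nc.name, err))
		}
	}
	c.healthErr = errors.Join(errs...) // nil when errs is empty
}

// CurrentHealth is unhealthy on startup errors or when any collector stopped.
func (c *Component) CurrentHealth() Health {
	if c.healthErr != nil {
		return Health{Healthy: false, Message: c.healthErr.Error()}
	}
	for _, nc := range c.collectors {
		if nc.collector.Stopped() {
			return Health{Healthy: false, Message: "collector stopped: " + nc.name}
		}
	}
	return Health{Healthy: true, Message: "all collectors are running"}
}

// exampleHealth builds a one-collector component and returns its health,
// optionally failing at startup or stopping the collector afterwards.
func exampleHealth(failStart, stopLater bool) Health {
	fc := &fakeCollector{}
	if failStart {
		fc.startErr = errors.New("connection refused")
	}
	c := &Component{collectors: []namedCollector{{name: "query_sample", collector: fc}}}
	c.startCollectors(context.Background())
	if stopLater {
		fc.stopped.Store(true)
	}
	return c.CurrentHealth()
}
```

Calling exampleHealth(false, false) yields a healthy result, while either a failed start or a later stop flips the component to unhealthy.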
Which issue(s) this PR fixes
n.a.
Notes to the Reviewer
PR Checklist