database_observability: report health of component and collectors #2392

Merged
cristiangreco merged 2 commits into main from cristian/dbo11y-components-health
Jan 14, 2025


cristiangreco
Collaborator

@cristiangreco cristiangreco commented Jan 13, 2025

PR Description

Report the component as unhealthy if an error occurs while starting the collectors, or if any collector is stopped during operation.
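
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the `Start(context.Context) error` and `Stopped() bool` methods and the `healthErr` / `CurrentHealth` names come from the diff and review comments. The `Health` struct, `Component` layout, `namedCollector`, `fakeCollector`, `StartCollectors`, and `runScenario` are hypothetical stand-ins chosen for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"sync"
)

// Collector mirrors the two methods visible in the PR diff.
type Collector interface {
	Start(context.Context) error
	Stopped() bool
}

// Health is a simplified, hypothetical stand-in for a component health type.
type Health struct {
	Healthy bool
	Message string
}

// namedCollector pairs a collector with a label (hypothetical helper type).
type namedCollector struct {
	name      string
	collector Collector
}

// Component is a hypothetical holder for collectors and a startup error.
type Component struct {
	mu         sync.RWMutex
	collectors []namedCollector
	healthErr  error
}

// StartCollectors starts every collector and records any startup errors.
func (c *Component) StartCollectors(ctx context.Context) {
	var errs []error
	for _, nc := range c.collectors {
		if err := nc.collector.Start(ctx); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", nc.name, err))
		}
	}
	c.mu.Lock()
	c.healthErr = errors.Join(errs...) // nil when no collector failed
	c.mu.Unlock()
}

// CurrentHealth reports unhealthy on a startup error or a stopped collector.
func (c *Component) CurrentHealth() Health {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.healthErr != nil {
		return Health{Message: "failed to start collectors: " + c.healthErr.Error()}
	}
	var stopped []string
	for _, nc := range c.collectors {
		if nc.collector.Stopped() {
			stopped = append(stopped, nc.name)
		}
	}
	if len(stopped) > 0 {
		return Health{Message: "stopped collectors: " + strings.Join(stopped, ", ")}
	}
	return Health{Healthy: true, Message: "all collectors are healthy"}
}

// fakeCollector is a test double used only for this illustration.
type fakeCollector struct {
	startErr error
	stopped  bool
}

func (f *fakeCollector) Start(context.Context) error { return f.startErr }
func (f *fakeCollector) Stopped() bool               { return f.stopped }

// runScenario exercises the healthy, stopped, and failed-start cases.
func runScenario() (before, afterStop, startFail Health) {
	qs := &fakeCollector{}
	ok := &Component{collectors: []namedCollector{{name: "query_sample", collector: qs}}}
	ok.StartCollectors(context.Background())
	before = ok.CurrentHealth()
	qs.stopped = true // simulate a collector stopping during operation
	afterStop = ok.CurrentHealth()

	bad := &Component{collectors: []namedCollector{
		{name: "query_sample", collector: &fakeCollector{startErr: errors.New("connection refused")}},
	}}
	bad.StartCollectors(context.Background())
	startFail = bad.CurrentHealth()
	return
}

func main() {
	before, afterStop, startFail := runScenario()
	fmt.Println(before.Message)
	fmt.Println(afterStop.Message)
	fmt.Println(startFail.Message)
}
```

In this sketch the component decides health entirely from the outside, which matches the "start simple" intent stated later in the conversation.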

Which issue(s) this PR fixes

n.a.

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@cristiangreco cristiangreco force-pushed the cristian/dbo11y-components-health branch from 6e00e77 to b941a40 Compare January 13, 2025 11:31
@@ -127,7 +132,7 @@ func (c *QuerySample) fetchQuerySamples(ctx context.Context) error {
}

if strings.HasSuffix(sampleText, "...") {
-level.Info(c.logger).Log("msg", "skipping parsing truncated query", "digest", digest)
+level.Debug(c.logger).Log("msg", "skipping parsing truncated query", "digest", digest)
Collaborator Author

Drive-by remove noisy info log

Comment on lines +174 to +178
if len(schemas) == 0 {
level.Info(c.logger).Log("msg", "no schema detected from information_schema.schemata")
return nil
}

Collaborator Author

Drive-by: log if no schema is detected

@cristiangreco cristiangreco force-pushed the cristian/dbo11y-components-health branch from b941a40 to dc57779 Compare January 13, 2025 11:36
Report unhealthy in case of errors when starting up the collectors or
if any collector is stopped during operations.
@cristiangreco cristiangreco force-pushed the cristian/dbo11y-components-health branch from dc57779 to 2d5c5de Compare January 13, 2025 11:39
@cristiangreco cristiangreco marked this pull request as ready for review January 13, 2025 11:49
@cristiangreco cristiangreco requested review from matthewnolf and a team as code owners January 13, 2025 11:49
Contributor

@wildum wildum left a comment

LGTM, I just suggested a different approach to give the collectors more flexibility on their health status but feel free to ignore

Start(context.Context) error
Stopped() bool
Contributor

In the future, you might have collectors that can be considered unhealthy but are still running. A different approach to support this would be to have a CurrentHealth function in the collector interface that returns the health object. Then you would not need the healthErr attribute anymore, you would just call CurrentHealth on all the collectors in the CurrentHealth function of the component.
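
A rough sketch of this suggestion, under stated assumptions: the collector interface gains a `CurrentHealth` method and the component only aggregates the results, so a collector can be running yet report itself unhealthy. The `Health` struct, `Component`, `fakeCollector`, `runScenario`, and the message strings are hypothetical and simplified for illustration; they are not taken from the Alloy codebase.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Health is a simplified, hypothetical stand-in for a component health type.
type Health struct {
	Healthy bool
	Message string
}

// Collector replaces Stopped() with a self-reported health, as suggested.
type Collector interface {
	Start(context.Context) error
	CurrentHealth() Health
}

// Component is a hypothetical holder that no longer needs a healthErr field.
type Component struct {
	collectors []Collector
}

// CurrentHealth asks each collector for its health and aggregates the result.
func (c *Component) CurrentHealth() Health {
	var problems []string
	for _, col := range c.collectors {
		if h := col.CurrentHealth(); !h.Healthy {
			problems = append(problems, h.Message)
		}
	}
	if len(problems) > 0 {
		return Health{Message: strings.Join(problems, "; ")}
	}
	return Health{Healthy: true, Message: "all collectors are healthy"}
}

// fakeCollector is a test double that returns a fixed health value.
type fakeCollector struct{ health Health }

func (f *fakeCollector) Start(context.Context) error { return nil }
func (f *fakeCollector) CurrentHealth() Health       { return f.health }

// runScenario models one healthy collector and one running but degraded one.
func runScenario() Health {
	c := &Component{collectors: []Collector{
		&fakeCollector{health: Health{Healthy: true, Message: "ok"}},
		&fakeCollector{health: Health{Message: "query_sample: last fetch failed"}},
	}}
	return c.CurrentHealth()
}

func main() {
	h := runScenario()
	fmt.Printf("healthy=%t message=%q\n", h.Healthy, h.Message)
}
```

The trade-off is that each collector must now implement its own health logic, while the component's `CurrentHealth` becomes a plain aggregation.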

Collaborator Author

Yes that's a great point. I wanted to start simple for now, as collectors are anyway not resilient at all (they'll stop as soon as any error is hit). Agree that in the future we might want to delegate the logic to the collectors themselves.

@cristiangreco cristiangreco enabled auto-merge (squash) January 14, 2025 08:30
@cristiangreco cristiangreco merged commit 55d952e into main Jan 14, 2025
18 checks passed
@cristiangreco cristiangreco deleted the cristian/dbo11y-components-health branch January 14, 2025 08:36
mattdurham added a commit that referenced this pull request Jan 14, 2025
* update changelog for rc (#2360)

* update changelog for rc

* update changelog for rc

* update changelog for rc (#2361)

* update version (#2362)

* update changelog for rc (#2360)

* update changelog for rc

* update changelog for rc

Signed-off-by: matt durham <[email protected]>

* fix conversion

* Fix changelog main (#2364)

* update version

* Fix changelog

* update the image version to work with the given example (#2358)

* docs: fixed kafka config example (#2359)

Example shows `loki.source.kafka "local"` pointing to `loki.relabel.kafka.receiver`. This leads to no new label being added. Correct example should have the kafka source pointing directly to `loki.write.local.receiver`

* feat(helm): add the ability to deploy extra manifest files (#2347)

* feat(helm): add the ability to deploy extra manifest files

* docs(helm): run helm-docs

* ci(helm): add tests

* Update wal queue tls (#2363)

* add tls to wal

* add alloy config

* update version

* Add support for TLS doc.

* Add changelog.

* fix import order

* add support and doc for round robin.

* fix conversion

* Update docs/sources/reference/components/prometheus/prometheus.write.queue.md

Co-authored-by: Clayton Cornell <[email protected]>

* Add test

* fix merge

* Update internal/component/prometheus/write/queue/types.go

Co-authored-by: William Dumont <[email protected]>

---------

Co-authored-by: Clayton Cornell <[email protected]>
Co-authored-by: William Dumont <[email protected]>

* #229 Add OpenTelemetry Collector Server Auth Extensions to Receivers (#2203)

* Work on adding auth so far

* Cleanup

* Made a ton of progress

* Fix test fails?

* Refactor

* Add auth blocks to implementing extensions

* Refactor to use feature flag

* Comments

* Cleanup

* Spacing

* Update docs

* Update CHANGELOG

* Last auth extension missing

* We also need grpc auth

* Fix opencensus docs

* Fix extra comment

* Update comment with findings

* Properly fix merge conflict

* Save file

* Spelling error

* That has been released now

* Add auth support to influxdb receiver

* Fix failing auth test/MAIL

* Comment cleanup

* MAIL for documentation

* docs MAIL

* MAIL

* Move from Auth to Authentication

* Update triton-go dependency to avoid embedded RSA key (#2380)

* Fix examples for filter and transform processors (#2379)

* fix examples filter and transform processors

* remove unnecessary docs about escaping strings and backticks

* fix(loki.secretfilter): Fix partial masking for short secrets and support multiple allowlists per rule (#2320)

* Fix partial masking bug and support new allowlist format

* Add docs and changelog

* Update docs

* Add comments

* Add comments

* Minor docs update

* Add more tests

* Change criteria for partial redaction

* Changes to partial masking rules

* Fix comment location

* Clarify usage of secret types

* Clarify usage of secret types

* Update docs/sources/reference/components/loki/loki.secretfilter.md

Co-authored-by: Clayton Cornell <[email protected]>

* Suggestions

* Suggestions

---------

Co-authored-by: Clayton Cornell <[email protected]>

* Fix only run on fork guard (#2378)

* Fix only run on fork guard

The previous guard fails because `github.repository` resolves to the base repository on `pull_request` events.

* Fix syntax

* Fix relabel processed bug (#2394)

* Fix issue where alloy_prometheus_relabel_metrics_processed was not being incremented.

* Add unit tests

* Update WAL to version that supports v2. (#2397)

* Update WAL to version that supports v2.

* Update WAL to version that supports v2.

* Add samples check.

* Clean up Alloy component docs (#2387)

* First pass at cleanup, pretty tables, sort lists

* Sort content, add badge

* Fix link

* Set link URL correctly

* Still fixing link targets

* One more tidy pass

* database_observability: report health of component and collectors (#2392)

Report unhealthy in case of errors when starting up the collectors or
if any collector is stopped during operations.

* update for rc.1 (#2401)

* Update version.

* fix version

* fix version

---------

Signed-off-by: matt durham <[email protected]>
Co-authored-by: Adam ABICHOU <[email protected]>
Co-authored-by: Jay Clifford <[email protected]>
Co-authored-by: dbluxo <[email protected]>
Co-authored-by: Clayton Cornell <[email protected]>
Co-authored-by: William Dumont <[email protected]>
Co-authored-by: Aidan Leuck <[email protected]>
Co-authored-by: Sam DeHaan <[email protected]>
Co-authored-by: Romain Gaillard <[email protected]>
Co-authored-by: Jack Baldry <[email protected]>
Co-authored-by: Cristian Greco <[email protected]>