Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add healthchecks and prepare monitoring #95

Merged
merged 4 commits into from
Nov 9, 2022
Merged

add healthchecks and prepare monitoring #95

merged 4 commits into from
Nov 9, 2022

Conversation

wkloucek
Copy link
Contributor

@wkloucek wkloucek commented Oct 7, 2022

Description

adds healthchecks and prepares monitoring where applicable

Related Issue

Motivation and Context

How Has This Been Tested?

  • on minikube

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Technical debt
  • Tests only (no source changes)

Checklist:

@wkloucek wkloucek marked this pull request as draft October 7, 2022 04:45
@wkloucek wkloucek changed the title add healthchecks add healthchecks and monitoring Oct 27, 2022
@wkloucek wkloucek marked this pull request as ready for review October 27, 2022 10:23
Copy link
Contributor

@rhafer rhafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look mostly ok to me. Just raising some questions regarding the debug port and the ServiceMontior

@@ -0,0 +1,15 @@
{{- if .Values.monitoring.enabled -}}
apiVersion: monitoring.coreos.com/v1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to work the Prometheus Operator needs to be installed on the Cluster, right? Shouldn't we mention that somewhere it the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'll add this to the docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the ServiceMonitor from the chart because we should keep dependencies on non Kubernetes resources low. It can be easily added separately and I'll add this to the documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wkloucek wkloucek changed the title add healthchecks and monitoring add healthchecks and prepare monitoring Nov 7, 2022
@wkloucek wkloucek requested a review from rhafer November 7, 2022 12:59
Copy link
Contributor

@rhafer rhafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDM and NATs don't provide the debug services yet. I guess we need different health checks for them.

charts/ocis/templates/idm/deployment.yaml Show resolved Hide resolved
charts/ocis/templates/nats/deployment.yaml Show resolved Hide resolved
@rhafer
Copy link
Contributor

rhafer commented Nov 7, 2022

@wkloucek To cleanup the commit history of this I'd like to rebase and probably squash a few commits. Is that ok for you. Otherwise I think this one is good now.

@wkloucek
Copy link
Contributor Author

wkloucek commented Nov 7, 2022

IDM and NATs don't provide the debug services yet. I guess we need different health checks for them.

IDM has a debug port with health endpoint: https://github.com/owncloud/ocis/blob/9193e6f5235761778223b41be00baa9fc3f52cd3/services/idm/pkg/server/debug/server.go#L29-L43

NATS has indeed no debug port. Which is a bug, because it has the config for it...

@rhafer
Copy link
Contributor

rhafer commented Nov 8, 2022

IDM has a debug port with health endpoint: https://github.com/owncloud/ocis/blob/9193e6f5235761778223b41be00baa9fc3f52cd3/services/idm/pkg/server/debug/server.go#L29-L43

Hm, I am pretty sure that that is dead code.
idm is constantly failing the liveness probe.

NATS has indeed no debug port. Which is a bug, because it has the config for it...

Yeah. There are bugs in both services with the debug endpoints. I'll open issues for both. For the time being I thing the TCP check for IDM is much more useful than checking a HTTP port for a service that does not even serve HTTP.

IDM and NAT don't provide any debug service/port so use a simple TCP
check on the main port.
Notifications and Audit do currently not provide a debug port either. As
they are not listening on any port by default we don't currently implement
a useful liveness probe for them.
Copy link
Contributor Author

@wkloucek wkloucek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't approve because this is my PR, but the changes by @rhafer look good to me 👍

Copy link
Contributor

@rhafer rhafer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the stuff @wkloucek did ;-)

@rhafer rhafer merged commit 0d9672c into master Nov 9, 2022
@delete-merged-branch delete-merged-branch bot deleted the healthcheck branch November 9, 2022 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants