Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add internal monitoring server for metrics and profiling #3500

Merged
merged 17 commits into from
Oct 19, 2023
Merged

Conversation

enekofb
Copy link
Contributor

@enekofb enekofb commented Oct 16, 2023

Closes #3228

What changed?

  • Do not exposed profiling endpoint in the main server
  • Created monitoring server with metrics and profiling endpoint
  • Monitoring remains private
  • refactored unit tests to use mocked httpserver for less flaky tests

Why was this change made?

We wanted to be able to release profiling feature but we could not cause it was in the same public endpoint as the API.

How was this change implemented?

Refactored the existing metrics server to be more general for monitoring purposes. Initially called management server but renamed to monitoring server as the term is already overloaded in our context.
Refactored the configuration around this also.

How did you validate the change?

  • Explain how a reviewer can verify the change themselves

Tilt:

  • run the app with default values that it has monitoring and metrics server by default. If you want to enable profiling, do it in your values.
  • then try to consume the /debug/pprof endpoint from the main server, you should see an error (meaning that is not publicly exposed)
Screenshot 2023-10-17 at 15 40 25 - then go and expose the monitoring server (running in port 8080 by default) and consume

/metrics endpoint

Screenshot 2023-10-17 at 15 33 34

/debug/pprof endpoint

Screenshot 2023-10-17 at 15 33 43
  • Integration tests -- what is covered, what cannot be covered;
    or, explain why there are no new tests

this test is border-line between integration and unit https://github.com/weaveworks/weave-gitops-
enterprise/pull/3500/files#diff-c426c36708aa75ac5531273cbf747c1cdda4953cee6346db63de5e392e6fb87c

  • Unit tests -- what is covered, what cannot be covered; are
    there tests that fail without the change?

Added and refactored some existing ones

Release notes

⚠️ Breaking Change

We have updated configuration values embedding metrics under monitoring. In case you
have disabled metrics in your helm release values previously by setting:

metrics:
  enabled: false

You should update it to have he same behavior

monitoring:
  enabled: false

Please review monitoring for more info.

Documentation Changes

weaveworks/weave-gitops#4089

Other follow ups

Not identified

@enekofb enekofb requested review from squaremo, bigkevmcd and a team October 17, 2023 14:49
@enekofb
Copy link
Contributor Author

enekofb commented Oct 17, 2023

@squaremo @bigkevmcd just adding your for visibility on the change. Feel free to review on marked as ready if you want but not needed.

@enekofb enekofb added the enhancement New feature or request label Oct 17, 2023
@enekofb enekofb changed the title do not expose profiling debug endpoint add internal monitoring server for metrics and profiling telemetry Oct 17, 2023
@enekofb enekofb changed the title add internal monitoring server for metrics and profiling telemetry add internal monitoring server for metrics and profiling Oct 17, 2023
@enekofb enekofb marked this pull request as ready for review October 17, 2023 16:31
cmd/clusters-service/app/server.go Outdated Show resolved Hide resolved
Enabled bool
}

func NewDefaultPprofHandler() (string, http.Handler) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exported, so should have a doc comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added 👍

pkg/monitoring/server.go Show resolved Hide resolved

// NewSever creates and starts a management server for all endpoints that we need to expose internally. For example metrics or profiling.
func NewServer(opts Options) (*http.Server, error) {
if !opts.Enabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this should be moved outside, so we only create the server if it's enabled?

Copy link
Contributor Author

@enekofb enekofb Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, it was odd. the proper check was put in the client side. removed here

// management server
var monitoringServer *http.Server
args.Log.Info("i am in server monitoring", "monitoring", args.MonitoringOptions)
if args.MonitoringOptions.Enabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below, maybe we shouldn't be creating the server if it's not enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is covered by the previous comment. let me know in case not.

pkg/monitoring/server.go Outdated Show resolved Hide resolved
)

func TestNewServer(t *testing.T) {
require := require.New(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right, you're calling .Error inside a sub-test which will trigger a subtest error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment: could you expand a bit more. My idea here is to have a test case for unhappy path, in this case where a client tries to create a monitoring server without a valid address.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's fixed now?

Essentially, failing a *testing.T from a parent test will cause the subtest to fail with an error.

package main

import (
	"testing"

	"github.com/stretchr/testify/require"
)

func TestHelloWorld(t *testing.T) {
	require := require.New(t)
	value := "testing errors"

	t.Run("bad subtest", func(t *testing.T) {
		require.Equal(value, "test")
	})

	t.Run("this subtest", func(t *testing.T) {
		require.Equal(value, "another test")
	})
}
$ go test .
--- FAIL: TestHelloWorld (0.00s)
    subtest_test.go:14:
        	Error Trace:	/Users/kevin/Source/Go/eneko/subtest_test.go:14
        	Error:      	Not equal:
        	            	expected: "testing errors"
        	            	actual  : "test"

        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-testing errors
        	            	+test
        	Test:       	TestHelloWorld
    --- FAIL: TestHelloWorld/bad_subtest (0.00s)
        testing.go:1490: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
FAIL
FAIL	subtest-demo	0.176s
FAIL

Copy link
Contributor Author

@enekofb enekofb Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh yes, I see got it now ... that require was not supposed to be there. thanks for the explanation 🙏

- name: http-metrics
containerPort: {{ .Values.metrics.service.port }}
{{- if .Values.monitoring.enabled }}
- name: {{ .Values.monitoring.service.name }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

{{- if .Values.metrics.enabled }}
- port: {{ .Values.metrics.service.port }}
targetPort: {{ .Values.metrics.service.port }}
{{- if .Values.monitoring.enabled }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

cmd/clusters-service/app/options.go Outdated Show resolved Hide resolved
pkg/monitoring/metrics/prometheus_test.go Outdated Show resolved Hide resolved
pkg/monitoring/profiling/pprof_test.go Outdated Show resolved Hide resolved
@enekofb
Copy link
Contributor Author

enekofb commented Oct 18, 2023

Added also docs to the PR and here weaveworks/weave-gitops#4089

@enekofb enekofb added the breaking-change to flag breaking changes similar to weave gitops oss label Oct 18, 2023
@enekofb
Copy link
Contributor Author

enekofb commented Oct 19, 2023

@bigkevmcd please let me know whether request for changes still applies or I could go ahead merging?

@enekofb enekofb merged commit 9c5f9ec into main Oct 19, 2023
10 checks passed
@enekofb enekofb deleted the issues/3228 branch October 19, 2023 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change to flag breaking changes similar to weave gitops oss enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

release path for pprof api endpoint
3 participants