Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[agent] Add metrics to show connections status between agent and collectors #2657

Merged
merged 34 commits into from
Jan 13, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
b712c9d
add metrics that show agent connection collector status
Nov 26, 2020
109617c
update comment
WalkerWang731 Nov 26, 2020
eee31fd
exec make fmt
WalkerWang731 Nov 26, 2020
82ed46e
simplify function and add testing relevant code in the builder_test.go
WalkerWang731 Nov 28, 2020
ff406aa
add comment in connect_metrics.go
WalkerWang731 Nov 28, 2020
c2a64f4
simplify code and changed use expvar to show target
WalkerWang731 Nov 30, 2020
1429e76
simplify code and changed use expvar to show target
WalkerWang731 Nov 30, 2020
dd68627
exec make fmt
WalkerWang731 Nov 30, 2020
83452e4
Fix collector panic due to sarama sdk returning nil error (#2654)
Betula-L Nov 26, 2020
68c0b29
Fix flaky tbuffered server test (#2635)
pkositsyn Nov 27, 2020
f651162
Add github actions for integration tests (#2649)
Ashmita152 Nov 27, 2020
8ee024e
Clean-up GH action names (#2661)
yurishkuro Nov 27, 2020
111efb7
Fix for failures in badger integration tests (#2660)
Ashmita152 Nov 27, 2020
e4da609
Add protogen validation test (#2662)
Ashmita152 Nov 27, 2020
9ad6033
Add github action for jaeger all-in-one image (#2663)
Ashmita152 Nov 29, 2020
faa61ac
Update comment that looks confusing during builds
yurishkuro Nov 29, 2020
71755cc
Use GitHub actions based build badges
yurishkuro Nov 29, 2020
0274fcf
Fix and minor improvements to all-in-one github action (#2667)
Ashmita152 Nov 30, 2020
c222b0d
Fix docker login issue with all-in-one build (#2668)
Ashmita152 Nov 30, 2020
01d1e9c
Fix issue with all-in-one build (#2669)
Ashmita152 Nov 30, 2020
fb1f57a
Update cmd/agent/app/reporter/connect_metrics.go
WalkerWang731 Nov 30, 2020
ba18453
Update cmd/agent/app/reporter/connect_metrics.go
WalkerWang731 Nov 30, 2020
68e6d76
simplify the code that remove ConnectMetricsParams{} and integrate Co…
WalkerWang731 Nov 30, 2020
ac84830
simplify the code that remove ConnectMetricsParams{} and integrate Co…
WalkerWang731 Nov 30, 2020
767623e
Merge branch 'master' into master
WalkerWang731 Nov 30, 2020
a1ff642
merage from the lastest master branch and exec make fmt
Nov 30, 2020
d07f8b8
add comment on ConnectMetrics
WalkerWang731 Nov 30, 2020
32c1e9b
Merge branch 'master' into master
WalkerWang731 Nov 30, 2020
512e61e
Merge branch 'master' into master
WalkerWang731 Dec 1, 2020
1222f7c
Merge branch 'master' into master
WalkerWang731 Dec 2, 2020
58113f7
Merge branch 'master' into master
WalkerWang731 Dec 2, 2020
a75e032
Merge branch 'master' into master
WalkerWang731 Dec 3, 2020
44999f0
Merge branch 'master' into master
WalkerWang731 Jan 13, 2021
d00801e
clear up redundant codes
WalkerWang731 Jan 13, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions cmd/agent/app/reporter/connect_metrics.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
// Copyright (c) 2020 The Jaeger Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package reporter

import (
"github.com/uber/jaeger-lib/metrics"
"go.uber.org/zap"
)

type connectMetrics struct {
// used for reflect current connection stability
Reconnects metrics.Counter `metric:"collector_reconnects" help:"Number of successful connections (including reconnects) to the collector."`

// Connection status that jaeger-agent to jaeger-collector, 1 is connected, 0 is disconnected
Status metrics.Gauge `metric:"collector_connected" help:"Status of connection between the agent and the collector; 1 is connected, 0 is disconnected"`
}

// ConnectMetrics include connectMetrics necessary params if want to modify metrics of connectMetrics, must via ConnectMetrics API
type ConnectMetrics struct {
Logger *zap.Logger // required
MetricsFactory metrics.Factory // required
connectMetrics *connectMetrics
}

// NewConnectMetrics will be initialize ConnectMetrics
func (r *ConnectMetrics) NewConnectMetrics() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unusual construct. I would make it simply a constructor function

func NewConnectMetrics(logger *zap.Logger, metrics metrics.Factory) *ConnectMetrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry for responding too late..

I saw that your PR #2728 and thanks for your help clean-up.

Looked forward to learning more from you

r.connectMetrics = new(connectMetrics)
r.MetricsFactory = r.MetricsFactory.Namespace(metrics.NSOptions{Name: "connection_status"})
metrics.MustInit(r.connectMetrics, r.MetricsFactory, nil)
}

// OnConnectionStatusChange used for pass the status parameter when connection is changed
// 0 is disconnected, 1 is connected
// For quick view status via use `sum(jaeger_agent_connection_status_collector_connected{}) by (instance) > bool 0`
func (r *ConnectMetrics) OnConnectionStatusChange(connected bool) {
if connected {
r.connectMetrics.Status.Update(1)
r.connectMetrics.Reconnects.Inc(1)
} else {
r.connectMetrics.Status.Update(0)
}
}
81 changes: 81 additions & 0 deletions cmd/agent/app/reporter/connect_metrics_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
// Copyright (c) 2020 The Jaeger Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package reporter

import (
"testing"
"time"

"github.com/stretchr/testify/assert"
"github.com/uber/jaeger-lib/metrics/metricstest"
)

type connectMetricsTest struct {
mf *metricstest.Factory
}

func testConnectMetrics(fn func(tr *connectMetricsTest, r *ConnectMetrics)) {
testConnectMetricsWithParams(&ConnectMetrics{}, fn)
}

func testConnectMetricsWithParams(cm *ConnectMetrics, fn func(tr *connectMetricsTest, r *ConnectMetrics)) {
mf := metricstest.NewFactory(time.Hour)
cm.MetricsFactory = mf
cm.NewConnectMetrics()

tr := &connectMetricsTest{
mf: mf,
}

fn(tr, cm)
}

func testCollectorConnected(r *ConnectMetrics) {
r.OnConnectionStatusChange(true)
}

func testCollectorAborted(r *ConnectMetrics) {
r.OnConnectionStatusChange(false)
}

func TestConnectMetrics(t *testing.T) {

testConnectMetrics(func(tr *connectMetricsTest, r *ConnectMetrics) {
getGauge := func() map[string]int64 {
_, gauges := tr.mf.Snapshot()
return gauges
}

getCount := func() map[string]int64 {
counts, _ := tr.mf.Snapshot()
return counts
}

// testing connect aborted
testCollectorAborted(r)
assert.EqualValues(t, 0, getGauge()["connection_status.collector_connected"])

// testing connect connected
testCollectorConnected(r)
assert.EqualValues(t, 1, getGauge()["connection_status.collector_connected"])
assert.EqualValues(t, 1, getCount()["connection_status.collector_reconnects"])

// testing reconnect counts
testCollectorAborted(r)
testCollectorConnected(r)
assert.EqualValues(t, 2, getCount()["connection_status.collector_reconnects"])

})
}
39 changes: 35 additions & 4 deletions cmd/agent/app/reporter/grpc/builder.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,20 @@ package grpc
import (
"context"
"errors"
"expvar"
"fmt"
"strings"

grpc_retry "github.com/grpc-ecosystem/go-grpc-middleware/retry"
"github.com/grpc-ecosystem/go-grpc-middleware/retry"
"github.com/uber/jaeger-lib/metrics"
"go.uber.org/zap"
"google.golang.org/grpc"
"google.golang.org/grpc/connectivity"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/resolver"
"google.golang.org/grpc/resolver/manual"

"github.com/jaegertracing/jaeger/cmd/agent/app/reporter"
"github.com/jaegertracing/jaeger/pkg/config/tlscfg"
"github.com/jaegertracing/jaeger/pkg/discovery"
"github.com/jaegertracing/jaeger/pkg/discovery/grpcresolver"
Expand All @@ -43,6 +47,9 @@ type ConnBuilder struct {
DiscoveryMinPeers int
Notifier discovery.Notifier
Discoverer discovery.Discoverer

// for unit test and provide ConnectMetrics and outside call
ConnectMetrics *reporter.ConnectMetrics
}

// NewConnBuilder creates a new grpc connection builder.
Expand All @@ -51,7 +58,7 @@ func NewConnBuilder() *ConnBuilder {
}

// CreateConnection creates the gRPC connection
func (b *ConnBuilder) CreateConnection(logger *zap.Logger) (*grpc.ClientConn, error) {
func (b *ConnBuilder) CreateConnection(logger *zap.Logger, mFactory metrics.Factory) (*grpc.ClientConn, error) {
var dialOptions []grpc.DialOption
var dialTarget string
if b.TLS.Enabled { // user requested a secure connection
Expand Down Expand Up @@ -97,14 +104,38 @@ func (b *ConnBuilder) CreateConnection(logger *zap.Logger) (*grpc.ClientConn, er
return nil, err
}

go func(cc *grpc.ClientConn) {
if b.ConnectMetrics == nil {
cm := reporter.ConnectMetrics{
Logger: logger,
MetricsFactory: mFactory.Namespace(metrics.NSOptions{Name: "", Tags: map[string]string{"protocol": "grpc"}}),
}
cm.NewConnectMetrics()
b.ConnectMetrics = &cm
}

go func(cc *grpc.ClientConn, cm *reporter.ConnectMetrics) {
logger.Info("Checking connection to collector")
var egt *expvar.String
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not move this into ConnectMetrics struct as well?

r := expvar.Get("gRPCTarget")
if r == nil {
Comment on lines +119 to +120
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r := expvar.Get("gRPCTarget")
if r == nil {
if r := expvar.Get("gRPCTarget"); r == nil {

egt = expvar.NewString("gRPCTarget")
} else {
egt = r.(*expvar.String)
}

for {
s := cc.GetState()
if s == connectivity.Ready {
cm.OnConnectionStatusChange(true)
egt.Set(cc.Target())
} else {
cm.OnConnectionStatusChange(false)
}

logger.Info("Agent collector connection state change", zap.String("dialTarget", dialTarget), zap.Stringer("status", s))
cc.WaitForStateChange(context.Background(), s)
}
}(conn)
}(conn, b.ConnectMetrics)

return conn, nil
}
4 changes: 2 additions & 2 deletions cmd/agent/app/reporter/grpc/builder_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ func TestBuilderFromConfig(t *testing.T) {
t,
[]string{"127.0.0.1:14268", "127.0.0.1:14269"},
cfg.CollectorHostPorts)
r, err := cfg.CreateConnection(zap.NewNop())
r, err := cfg.CreateConnection(zap.NewNop(), metrics.NullFactory)
require.NoError(t, err)
assert.NotNil(t, r)
}
Expand Down Expand Up @@ -149,7 +149,7 @@ func TestBuilderWithCollectors(t *testing.T) {
cfg.Notifier = test.notifier
cfg.Discoverer = test.discoverer

conn, err := cfg.CreateConnection(zap.NewNop())
conn, err := cfg.CreateConnection(zap.NewNop(), metrics.NullFactory)
yurishkuro marked this conversation as resolved.
Show resolved Hide resolved
if test.expectedError == "" {
require.NoError(t, err)
require.NotNil(t, conn)
Expand Down
2 changes: 1 addition & 1 deletion cmd/agent/app/reporter/grpc/collector_proxy.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ type ProxyBuilder struct {

// NewCollectorProxy creates ProxyBuilder
func NewCollectorProxy(builder *ConnBuilder, agentTags map[string]string, mFactory metrics.Factory, logger *zap.Logger) (*ProxyBuilder, error) {
conn, err := builder.CreateConnection(logger)
conn, err := builder.CreateConnection(logger, mFactory)
if err != nil {
return nil, err
}
Expand Down