Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export health metrics #3432

Merged
merged 1 commit into from
Feb 15, 2017
Merged

Conversation

monicasarbu
Copy link
Contributor

@monicasarbu monicasarbu commented Jan 20, 2017

Part of #3422

Exports the following health metrics over expvar:

Sum of all outputs:

  • output.write.errors
  • output.write.bytes
  • output.events.acked
  • publisher.events.count
  • publisher.queue.messages.count
  • output.messages.dropped
  • beat.memstats.memory_total
  • beat.memstats.memory_alloc (bytes allocated and not yet freed)
  • beat.memstats.gc_next (next collection will happen when HeapAlloc ≥ this amount)

Per output:

Redis:

  • output.redis.events.acked
  • output.redis.events.not_acked
  • output.redis.read.errors
  • output.redis.read.bytes
  • output.redis.write.bytes
  • output.redis.write.errors

Elasticsearch:

  • output.elasticsearch.events.acked
  • output.elasticsearch.events.not_acked
  • output.elasticsearch.publishEvents.call.count
  • output.elasticsearch.read.bytes
  • output.elasticsearch.read.errors
  • output.elasticsearch.write.bytes
  • output.elasticsearch.write.errors

Kafka:

  • output.kafka.events.acked
  • output.kafka.events.not_acked
  • output.kafka.publishEvents.call.count
  • output.kafka.read.errors (cannot be calculated)
  • output.kafka.read.bytes (cannot be calculated)
  • output.kafka.write.bytes (cannot be calculated)
  • output.kafka.write.errors (cannot be calculated)

Logstash:

  • output.logstash.events.acked
  • output.logstash.events.not_acked
  • output.logstash.publishEvents.call.count
  • output.logstash.read.errors
  • output.logstash.read.bytes
  • output.logstash.write.bytes
  • output.logstash.write.errors

Limitations: For Kafka, the Beats are not exporting output.kafka.write_bytes and output.kafka.write_errors. So, for Kafka we cannot calculate write.bytes and write.errors, and the values of output.write.bytes and output.write.errors don't include Kafka output.

@monicasarbu monicasarbu added the in progress Pull request is currently in progress. label Jan 20, 2017
@monicasarbu monicasarbu force-pushed the export_health_metrics branch from 6f0eb5a to 373d128 Compare January 29, 2017 18:07
@@ -303,3 +317,20 @@ func handleError(err error) error {
func GetDefaultVersion() string {
return defaultBeatVersion
}

func startCollectingMetrics() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this and replace it with a method call every time the metrics are needed. So no need to define an interval which will lead to the problem, that metrics are not up-to-date when they are fetched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Steffen added functionality for calculating the metrics when they are fetched #3510, and this PR is updated to use this functionality.

@monicasarbu monicasarbu force-pushed the export_health_metrics branch 2 times, most recently from 3d25faf to 01e6a0b Compare February 2, 2017 13:25
Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @monicasarbu Could you rebase and fix the system test for logging?

@monicasarbu monicasarbu force-pushed the export_health_metrics branch 2 times, most recently from b47ff4d to e539ae8 Compare February 3, 2017 11:50
@monicasarbu monicasarbu removed the in progress Pull request is currently in progress. label Feb 3, 2017
@monicasarbu
Copy link
Contributor Author

monicasarbu commented Feb 3, 2017

@urso I have updated the PR to use the libbeat.monitoring for exporting the metrics. Please let me know what you think.

@@ -136,6 +136,8 @@ func NewClient(
Write: statWriteBytes,
ReadErrors: statReadErrors,
WriteErrors: statWriteErrors,
SendBytes: outputs.SendBytes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be SentBytes?

V.OnRegistryStart()
defer V.OnRegistryFinished()

monitoring.ReportInt(V, "memory.total", int64(stats.TotalAlloc))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when using 'ReportXXX', the names should not include ., as . is used to split namespaces, but having the collector do the splitting makes things quite more complicated in the collector. Instead namespaces should be reported only once (without .).

For example:

type memstatsVar struct{}

var metrics = monitoring.Default.NewRegistry("beat")

func init() {
	metrics.Add("memstats", memstatsVar{}, monitoring.Reported)
}

func (memstatsVar) Visit(m monitoring.Mode, V monitoring.Visitor) {
	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)

	V.OnRegistryStart()
	defer V.OnRegistryFinished()

	reportNamespace(V, "memory", func() {
		monitoring.ReportInt(V, "total", int64(stats.TotalAlloc))
		if m == monitoring.Full {
			monitoring.ReportInt(V, "alloc", int64(stats.Alloc))
		}
	})

	if m == monitoring.Full {
		reportNamespace(V, "gc", func() {
			monitoring.ReportInt(V, "next", int64(stats.NextGC))
		})
	}
}

func reportNamespace(V monitoring.Visitor, name string, f func()) {
	V.OnKey(name)
	V.OnRegistryStart()
	defer V.OnRegistryFinished()
	f()
}

@monicasarbu monicasarbu force-pushed the export_health_metrics branch 4 times, most recently from 0ba6c16 to f394737 Compare February 8, 2017 22:00
Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments for the naming as I think now is a good time to clean it up as we touch these variables. Besides that LGTM.

eventsNotAcked = monitoring.NewInt(outputs.Metrics, "es.published_but_not_acked_events")
publishEventsCallCount = monitoring.NewInt(outputs.Metrics, "es.call_count.PublishEvents")

statReadBytes = monitoring.NewInt(outputs.Metrics, "es.publish.read_bytes")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we go here with the metricbeat convention? So this would be:

es.publish.read.bytes
es.publish.read.errors
es.publish.write.bytes
es.publish.write.errors

ackedEvents = expvar.NewInt("libbeat.kafka.published_and_acked_events")
eventsNotAcked = expvar.NewInt("libbeat.kafka.published_but_not_acked_events")
publishEventsCallCount = expvar.NewInt("libbeat.kafka.call_count.PublishEvents")
ackedEvents = monitoring.NewInt(outputs.Metrics, "kafka.published_and_acked_events")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kafka.published.events.acked
kafka.published.events.not_acked

The last one is kind of a strange one as it counts internal function calls. Do we need this metric or what does it say exactly? Perhaps kafka.published.events.call?

statWriteBytes = expvar.NewInt("libbeat.logstash.publish.write_bytes")
statReadErrors = expvar.NewInt("libbeat.logstash.publish.read_errors")
statWriteErrors = expvar.NewInt("libbeat.logstash.publish.write_errors")
ackedEvents = monitoring.NewInt(outputs.Metrics, "logstash.published_and_acked_events")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logstash.published.events.acked
logstash.published.events.not_acked
logstash.publish.read.bytes
logstash.publish.read.errors
logstash.publish.write.bytes
logstash.publish.write.errors

"github.com/elastic/beats/libbeat/outputs"
)

// Metrics that can retrieved through the expvar web interface.
var (
messagesDropped = expvar.NewInt("libbeat.outputs.messages_dropped")
messagesDropped = monitoring.NewInt(outputs.Metrics, "messages_dropped")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

messages.dropped

var (
Metrics = monitoring.Default.NewRegistry("outputs")

AckedEvents = monitoring.NewInt(Metrics, "acked_events", monitoring.Report)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

events.acked
sent.bytes

@@ -19,6 +20,11 @@ var (
versionRegex = regexp.MustCompile(`redis_version:(\d+).(\d+)`)
)

var (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redis.published.events.acked
redis.published.events.not_acked

statWriteBytes = expvar.NewInt("libbeat.redis.publish.write_bytes")
statReadErrors = expvar.NewInt("libbeat.redis.publish.read_errors")
statWriteErrors = expvar.NewInt("libbeat.redis.publish.write_errors")
statReadBytes = monitoring.NewInt(outputs.Metrics, "redis.publish.read_bytes")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redis.publish.read.bytes
redis.publish.read.errors
redis.publish.write.bytes
redis.publish.write.errors

@monicasarbu monicasarbu force-pushed the export_health_metrics branch from 8d6c991 to c75dd9b Compare February 9, 2017 15:35
@monicasarbu
Copy link
Contributor Author

@ruflin I used this option to rename all the metrics that we export to have a similar format with the one Beats export.

@@ -13,7 +13,7 @@ import (

// Metrics that can retrieved through the expvar web interface.
var (
publishedEvents = expvar.NewInt("libbeat.publisher.published_events")
publishedEvents = expvar.NewInt("publisher.events")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call it publisher.events.count? Otherwise we can't add any other values in the future to publisher.events as it would be a key and a namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin Good point, I did the change.

@monicasarbu monicasarbu force-pushed the export_health_metrics branch 2 times, most recently from f9abefc to fbbc49a Compare February 9, 2017 19:34
Also rename all health metrics to have the same format with the metrics exported by Metricbeat
@monicasarbu monicasarbu force-pushed the export_health_metrics branch from 16331e5 to 5327c49 Compare February 10, 2017 11:45
@urso urso merged commit 44018b3 into elastic:master Feb 15, 2017
@monicasarbu monicasarbu deleted the export_health_metrics branch February 16, 2017 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants