-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export health metrics #3432
Export health metrics #3432
Conversation
6f0eb5a
to
373d128
Compare
libbeat/beat/beat.go
Outdated
@@ -303,3 +317,20 @@ func handleError(err error) error { | |||
func GetDefaultVersion() string { | |||
return defaultBeatVersion | |||
} | |||
|
|||
func startCollectingMetrics() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove this and replace it with a method call every time the metrics are needed. So no need to define an interval which will lead to the problem, that metrics are not up-to-date when they are fetched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Steffen added functionality for calculating the metrics when they are fetched #3510, and this PR is updated to use this functionality.
3d25faf
to
01e6a0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @monicasarbu Could you rebase and fix the system test for logging?
b47ff4d
to
e539ae8
Compare
@urso I have updated the PR to use the |
@@ -136,6 +136,8 @@ func NewClient( | |||
Write: statWriteBytes, | |||
ReadErrors: statReadErrors, | |||
WriteErrors: statWriteErrors, | |||
SendBytes: outputs.SendBytes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be SentBytes
?
libbeat/beat/metrics.go
Outdated
V.OnRegistryStart() | ||
defer V.OnRegistryFinished() | ||
|
||
monitoring.ReportInt(V, "memory.total", int64(stats.TotalAlloc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when using 'ReportXXX', the names should not include .
, as .
is used to split namespaces, but having the collector do the splitting makes things quite more complicated in the collector. Instead namespaces should be reported only once (without .
).
For example:
type memstatsVar struct{}
var metrics = monitoring.Default.NewRegistry("beat")
func init() {
metrics.Add("memstats", memstatsVar{}, monitoring.Reported)
}
func (memstatsVar) Visit(m monitoring.Mode, V monitoring.Visitor) {
var stats runtime.MemStats
runtime.ReadMemStats(&stats)
V.OnRegistryStart()
defer V.OnRegistryFinished()
reportNamespace(V, "memory", func() {
monitoring.ReportInt(V, "total", int64(stats.TotalAlloc))
if m == monitoring.Full {
monitoring.ReportInt(V, "alloc", int64(stats.Alloc))
}
})
if m == monitoring.Full {
reportNamespace(V, "gc", func() {
monitoring.ReportInt(V, "next", int64(stats.NextGC))
})
}
}
func reportNamespace(V monitoring.Visitor, name string, f func()) {
V.OnKey(name)
V.OnRegistryStart()
defer V.OnRegistryFinished()
f()
}
0ba6c16
to
f394737
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments for the naming as I think now is a good time to clean it up as we touch these variables. Besides that LGTM.
eventsNotAcked = monitoring.NewInt(outputs.Metrics, "es.published_but_not_acked_events") | ||
publishEventsCallCount = monitoring.NewInt(outputs.Metrics, "es.call_count.PublishEvents") | ||
|
||
statReadBytes = monitoring.NewInt(outputs.Metrics, "es.publish.read_bytes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we go here with the metricbeat convention? So this would be:
es.publish.read.bytes
es.publish.read.errors
es.publish.write.bytes
es.publish.write.errors
libbeat/outputs/kafka/client.go
Outdated
ackedEvents = expvar.NewInt("libbeat.kafka.published_and_acked_events") | ||
eventsNotAcked = expvar.NewInt("libbeat.kafka.published_but_not_acked_events") | ||
publishEventsCallCount = expvar.NewInt("libbeat.kafka.call_count.PublishEvents") | ||
ackedEvents = monitoring.NewInt(outputs.Metrics, "kafka.published_and_acked_events") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kafka.published.events.acked
kafka.published.events.not_acked
The last one is kind of a strange one as it counts internal function calls. Do we need this metric or what does it say exactly? Perhaps kafka.published.events.call
?
libbeat/outputs/logstash/logstash.go
Outdated
statWriteBytes = expvar.NewInt("libbeat.logstash.publish.write_bytes") | ||
statReadErrors = expvar.NewInt("libbeat.logstash.publish.read_errors") | ||
statWriteErrors = expvar.NewInt("libbeat.logstash.publish.write_errors") | ||
ackedEvents = monitoring.NewInt(outputs.Metrics, "logstash.published_and_acked_events") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logstash.published.events.acked
logstash.published.events.not_acked
logstash.publish.read.bytes
logstash.publish.read.errors
logstash.publish.write.bytes
logstash.publish.write.errors
libbeat/outputs/mode/mode.go
Outdated
"github.com/elastic/beats/libbeat/outputs" | ||
) | ||
|
||
// Metrics that can retrieved through the expvar web interface. | ||
var ( | ||
messagesDropped = expvar.NewInt("libbeat.outputs.messages_dropped") | ||
messagesDropped = monitoring.NewInt(outputs.Metrics, "messages_dropped") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
messages.dropped
libbeat/outputs/outputs.go
Outdated
var ( | ||
Metrics = monitoring.Default.NewRegistry("outputs") | ||
|
||
AckedEvents = monitoring.NewInt(Metrics, "acked_events", monitoring.Report) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
events.acked
sent.bytes
@@ -19,6 +20,11 @@ var ( | |||
versionRegex = regexp.MustCompile(`redis_version:(\d+).(\d+)`) | |||
) | |||
|
|||
var ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redis.published.events.acked
redis.published.events.not_acked
libbeat/outputs/redis/redis.go
Outdated
statWriteBytes = expvar.NewInt("libbeat.redis.publish.write_bytes") | ||
statReadErrors = expvar.NewInt("libbeat.redis.publish.read_errors") | ||
statWriteErrors = expvar.NewInt("libbeat.redis.publish.write_errors") | ||
statReadBytes = monitoring.NewInt(outputs.Metrics, "redis.publish.read_bytes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redis.publish.read.bytes
redis.publish.read.errors
redis.publish.write.bytes
redis.publish.write.errors
8d6c991
to
c75dd9b
Compare
@ruflin I used this option to rename all the metrics that we export to have a similar format with the one Beats export. |
libbeat/publisher/client.go
Outdated
@@ -13,7 +13,7 @@ import ( | |||
|
|||
// Metrics that can retrieved through the expvar web interface. | |||
var ( | |||
publishedEvents = expvar.NewInt("libbeat.publisher.published_events") | |||
publishedEvents = expvar.NewInt("publisher.events") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call it publisher.events.count
? Otherwise we can't add any other values in the future to publisher.events
as it would be a key and a namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruflin Good point, I did the change.
f9abefc
to
fbbc49a
Compare
Also rename all health metrics to have the same format with the metrics exported by Metricbeat
16331e5
to
5327c49
Compare
Part of #3422
Exports the following health metrics over expvar:
Sum of all outputs:
Per output:
Redis:
Elasticsearch:
Kafka:
Logstash:
Limitations: For Kafka, the Beats are not exporting
output.kafka.write_bytes
andoutput.kafka.write_errors
. So, for Kafka we cannot calculatewrite.bytes
andwrite.errors
, and the values ofoutput.write.bytes
andoutput.write.errors
don't include Kafka output.