Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log non-zero expvars periodically #1955

Merged
merged 1 commit into from
Jul 5, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ https://github.com/elastic/beats/compare/v5.0.0-alpha3...master[Check the HEAD d

*Affecting all Beats*

- Periodically log internal metrics. {pull}1955[1955]

*Metricbeat*

*Packetbeat*
Expand Down
8 changes: 8 additions & 0 deletions filebeat/filebeat.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -693,6 +693,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, filebeat periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down
8 changes: 8 additions & 0 deletions libbeat/_meta/config.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -479,6 +479,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, beatname periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down
21 changes: 21 additions & 0 deletions libbeat/docs/loggingconfig.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,27 @@ all the debug messages related to event publishing. When starting the Beat,
selectors can be overwritten using the `-d` command line option (`-d` also sets
the debug log level).

===== metrics.enabled

If enabled, {beatname_uc} periodically logs its internal metrics that have
changed in the last period. For each metric that changed, the delta from the
value at the beginning of the period is logged. The default is true.

Here is an example log line:

[source,shell]
----------------------------------------------------------------------------------------------------------------------------------------------------
2016/07/04 12:16:10.821047 logp.go:194: INFO Non-zero metrics in the last 30s: libbeatEsPublishedAndAckedEvents=18 libbeatEsPublishWriteBytes=10045
----------------------------------------------------------------------------------------------------------------------------------------------------

Note that we currently offer no backwards compatible guarantees for the internal
metrics and for this reason they are also not documented.


===== metrics.period

The period after which to log the internal metrics. The default is 30s.

===== files.path

The directory that log files are written to. The default is the logs path. See the
Expand Down
69 changes: 69 additions & 0 deletions libbeat/logp/logp.go
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
package logp

import (
"expvar"
"flag"
"fmt"
"io/ioutil"
"log"
"strconv"
"strings"
"time"

"github.com/elastic/beats/libbeat/paths"
)
Expand All @@ -21,8 +24,18 @@ type Logging struct {
ToSyslog *bool `config:"to_syslog"`
ToFiles *bool `config:"to_files"`
Level string
Metrics LoggingMetricsConfig `config:"metrics"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Should we enable this by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is enabled by default with a 30s interval.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see it on line 170. Would probably make sense to also use a ucfg defaultConfig.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsg For the above: I didn't mean as part of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried initially but didn't work out due to circular imports (cfgfile imports logp).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-( @urso interesting issue.

}

type LoggingMetricsConfig struct {
Enabled *bool `config:"enabled"`
Period *time.Duration `config:"period" validate:"nonzero,min=0s"`
}

var (
defaultMetricsPeriod = 30 * time.Second
)

func init() {
// Adds logging specific flags: -v, -e and -d.
verbose = flag.Bool("v", false, "Log at INFO level")
Expand Down Expand Up @@ -112,6 +125,8 @@ func Init(name string, config *Logging) error {
log.SetOutput(ioutil.Discard)
}

go logExpvars(&config.Metrics)

return nil
}

Expand Down Expand Up @@ -141,3 +156,57 @@ func getLogLevel(config *Logging) (Priority, error) {
}
return level, nil
}

// snapshotExpvars iterates through all the defined expvars, and for the top
// level vars that are integers it snapshots the name and value in a separate
// map.
func snapshotExpvars(varsMap map[string]int64) {
expvar.Do(func(kv expvar.KeyValue) {
switch kv.Value.(type) {
case *expvar.Int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR covers the most important metrics for the time being so no action is required at the moment IMO. We can improve this when we add "metrics" within beats. We have some expvar metrics that are nested within a map that won't be printed by this. For example, in Metricbeat we have a fetches.mysql-info.failures metric that is nested two levels deep in maps.

varsMap[kv.Key], _ = strconv.ParseInt(kv.Value.String(), 10, 64)
}
})
}

// buildMetricsOutput makes the delta between vals and prevVals and builds
// a printable string with the non-zero deltas.
func buildMetricsOutput(prevVals map[string]int64, vals map[string]int64) string {
metrics := ""
for k, v := range vals {
delta := v - prevVals[k]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the libbeatXYZ metrics that are logged by this are counters, but we do have some gauges (like cache size in Winlogbeat) where reporting the delta won't make much sense. But those gauges won't be logged because they are nested inside a map so no action is required at the moment.

if delta != 0 {
metrics = fmt.Sprintf("%s %s=%d", metrics, k, delta)
}
}
return metrics
}

// logExpvars logs at Info level the integer expvars that have changed in the
// last interval. For each expvar, the delta from the beginning of the interval
// is logged.
func logExpvars(metricsCfg *LoggingMetricsConfig) {
if metricsCfg.Enabled != nil && *metricsCfg.Enabled == false {
Info("Metrics logging disabled")
return
}
if metricsCfg.Period == nil {
metricsCfg.Period = &defaultMetricsPeriod
}
Info("Metrics logging every %s", metricsCfg.Period)

ticker := time.NewTicker(*metricsCfg.Period)
prevVals := map[string]int64{}
for {
<-ticker.C
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it is running in a independent go routine, should we still make sure it is shut down properly when the beat is stopped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but I struggle with the API for it. We'd need to have a Stop() function for the logging system, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple version could be just passing the done channel to the go routine and using a select statement. But that would require to move the start of the goroutine into the beat starting process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, but then it's not self contained in logp. Maybe it doesn't need to be.

vals := map[string]int64{}
snapshotExpvars(vals)
metrics := buildMetricsOutput(prevVals, vals)
prevVals = vals
if len(metrics) > 0 {
Info("Non-zero metrics in the last %s:%s", metricsCfg.Period, metrics)
} else {
Info("No non-zero metrics in the last %s", metricsCfg.Period)
}
}
}
57 changes: 57 additions & 0 deletions libbeat/logp/logp_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
// +build !integration

package logp

import (
"expvar"
"testing"

"github.com/stretchr/testify/assert"
)

func TestSnapshotExpvars(t *testing.T) {
test := expvar.NewInt("test")
test.Add(42)

vals := map[string]int64{}
snapshotExpvars(vals)

assert.Equal(t, vals["test"], int64(42))
}

func TestBuildMetricsOutput(t *testing.T) {
test := expvar.NewInt("testLog")
test.Add(1)

prevVals := map[string]int64{}
snapshotExpvars(prevVals)

test.Add(5)

vals := map[string]int64{}
snapshotExpvars(vals)

metrics := buildMetricsOutput(prevVals, vals)
assert.Equal(t, " testLog=5", metrics)
prevVals = vals

test.Add(3)
vals = map[string]int64{}
snapshotExpvars(vals)
metrics = buildMetricsOutput(prevVals, vals)
assert.Equal(t, " testLog=3", metrics)
}

func TestBuildMetricsOutputMissing(t *testing.T) {

prevVals := map[string]int64{}
snapshotExpvars(prevVals)

test := expvar.NewInt("testLogEmpty")
test.Add(7)

vals := map[string]int64{}
snapshotExpvars(vals)
metrics := buildMetricsOutput(prevVals, vals)
assert.Equal(t, " testLogEmpty=7", metrics)
}
6 changes: 6 additions & 0 deletions libbeat/tests/system/config/mockbeat.yml.j2
Original file line number Diff line number Diff line change
Expand Up @@ -72,4 +72,10 @@ output:
rotate_every_kb: 1000
#number_of_files: 7

#================================ Logging =====================================

{% if metrics_period -%}
logging.metrics.period: {{ metrics_period }}
{%- endif %}

# vim: set ft=jinja:
10 changes: 10 additions & 0 deletions libbeat/tests/system/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,3 +115,13 @@ def test_console_output_size_flush(self):
self.wait_until(lambda: self.log_contains("Mockbeat is alive"),
max_timeout=2)
proc.check_kill_and_wait()

def test_logging_metrics(self):
self.render_config_template(
metrics_period="0.1s"
)
proc = self.start_beat(logging_args=["-e"])
self.wait_until(
lambda: self.log_contains("No non-zero metrics in the last 100ms"),
max_timeout=2)
proc.check_kill_and_wait()
8 changes: 8 additions & 0 deletions metricbeat/metricbeat.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -624,6 +624,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, metricbeat periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down
8 changes: 8 additions & 0 deletions packetbeat/packetbeat.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -863,6 +863,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, packetbeat periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down
8 changes: 8 additions & 0 deletions winlogbeat/winlogbeat.full.yml
Original file line number Diff line number Diff line change
Expand Up @@ -514,6 +514,14 @@ output.elasticsearch:
# Send all logging output to syslog. The default is false.
#logging.to_syslog: true

# If enabled, winlogbeat periodically logs its internal metrics that have changed
# in the last period. For each metric that changed, the delta from the value at
# the beginning of the period is logged. The default is true.
#logging.metrics.enabled: true

# The period after which to log the internal metrics. The default is 30s.
#logging.metrics.period: 30s

# Logging to rotating files files. Set logging.to_files to false to disable logging to
# files.
logging.to_files: true
Expand Down