Telegraf panic with prometheus input plugin #5143

Closed
vazhem opened this issue Dec 13, 2018 · 2 comments
Labels
bug unexpected problem or unintended behavior panic issue that results in panics from Telegraf regression something that used to work, but is now broken
Milestone

Comments

@vazhem

vazhem commented Dec 13, 2018

Relevant telegraf.conf:

Several inputs (prometheus, influxdb_listener, etc.) and outputs (multiple influxdb, prometheus) of different types. The prometheus inputs look like this:

[[inputs.prometheus]]
  interval = "300s"
  response_timeout = "30s"
  urls = [
    "https://127.0.0.1:1111/metrics",
    "https://127.0.0.1:2222/metrics"
  ]
  namepass = [
    "aaa",
  ]
  insecure_skip_verify = true
# Influx HTTP write listener
[[inputs.influxdb_listener]]
  service_address = ":1111"
  read_timeout = "30s"
  write_timeout = "30s"

System info:

Telegraf 1.9.1
SLES 12 SP3

Steps to reproduce:

Send SIGHUP to reload the config:

pkill -SIGHUP telegraf

Telegraf crashes.

Expected behavior:

No panic.

Actual behavior:

2018-12-12T17:05:24Z D! [agent] Stopping service inputs
2018-12-12T17:05:24Z I! Stopped HTTP listener service on  :1111
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1305909]

goroutine 97 [running]:
github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).Stop(0xc0004a0360)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/prometheus/prometheus.go:308 +0x29
github.com/influxdata/telegraf/agent.(*Agent).stopServiceInputs(0xc0000c0268)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:636 +0xa1
github.com/influxdata/telegraf/agent.(*Agent).Run.func1(0xc00030dbd0, 0xc0000c0268, 0x250a860, 0xc000217480, 0xbefc6d9c3b5974f5, 0x251069f4, 0x3cd7260, 0xc0000c38c0)
        /go/src/github.com/influxdata/telegraf/agent/agent.go:75 +0x138
created by github.com/influxdata/telegraf/agent.(*Agent).Run
        /go/src/github.com/influxdata/telegraf/agent/agent.go:66 +0x3bb


@danielnelson danielnelson added this to the 1.9.2 milestone Dec 13, 2018
@danielnelson danielnelson added bug unexpected problem or unintended behavior panic issue that results in panics from Telegraf regression something that used to work, but is now broken labels Dec 13, 2018
@glinton
Contributor

glinton commented Dec 13, 2018

Does https://github.com/influxdata/telegraf/blob/1.9.1/plugins/inputs/prometheus/prometheus.go#L308 just need to be p.wg.Wait() instead of p.cancel()? Removing the cancel call prevents the panic and allows Telegraf to restart.
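A minimal Go sketch of why the trace ends in Stop, assuming (per the reply below) that the plugin's cancel field is a context.CancelFunc that is only assigned when Kubernetes monitoring starts. Calling a nil func value in Go is a run-time panic, reported as the same "invalid memory address or nil pointer dereference". The type prometheusSketch, its monitorPods field, and the helper stopRecovered are hypothetical and are not Telegraf's actual code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// prometheusSketch is a hypothetical stand-in for the plugin struct; only the
// fields relevant to Stop are modeled.
type prometheusSketch struct {
	monitorPods bool               // assumed: true only when k8s monitoring is configured
	cancel      context.CancelFunc // assigned only on the k8s code path
	wg          sync.WaitGroup
}

// Start assigns cancel only when k8s monitoring is enabled.
func (p *prometheusSketch) Start() {
	if p.monitorPods {
		var ctx context.Context
		ctx, p.cancel = context.WithCancel(context.Background())
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			<-ctx.Done() // stand-in for a watch loop
		}()
	}
}

// Stop mirrors the unguarded call: if cancel was never set, calling it
// invokes a nil func value and panics.
func (p *prometheusSketch) Stop() {
	p.cancel()
	p.wg.Wait()
}

// stopRecovered runs Stop and reports whether it panicked.
func stopRecovered(p *prometheusSketch) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	p.Stop()
	return false
}

func main() {
	withoutK8s := &prometheusSketch{}
	withoutK8s.Start()
	fmt.Println("without k8s monitoring, panicked:", stopRecovered(withoutK8s))

	withK8s := &prometheusSketch{monitorPods: true}
	withK8s.Start()
	fmt.Println("with k8s monitoring, panicked:", stopRecovered(withK8s))
}
```

This matches the observation that only setups without k8s monitoring crash on reload.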

@danielnelson
Contributor

It looks to me like we need to call cancel only if it is set. Right now it seems to panic on shutdown unless you are using the k8s monitoring.
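A minimal sketch of the guard suggested above, assuming cancel is a context.CancelFunc and wg is a sync.WaitGroup tracking the k8s goroutines. The type promInput and the method startK8sWatcher are hypothetical placeholders, not the plugin's real names; this illustrates the nil check, not the actual patch.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// promInput is a hypothetical, trimmed-down stand-in for the plugin struct.
type promInput struct {
	cancel context.CancelFunc // nil unless the k8s monitoring path started
	wg     sync.WaitGroup
}

// startK8sWatcher is a hypothetical placeholder for the k8s monitoring path,
// the only place cancel gets assigned in this sketch.
func (p *promInput) startK8sWatcher() {
	ctx, cancel := context.WithCancel(context.Background())
	p.cancel = cancel
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		<-ctx.Done() // stand-in for the watch loop
	}()
}

// Stop calls cancel only if it was set, then waits for any goroutines.
func (p *promInput) Stop() {
	if p.cancel != nil {
		p.cancel()
	}
	p.wg.Wait()
}

func main() {
	plain := &promInput{}
	plain.Stop() // no panic: cancel is nil and is skipped
	fmt.Println("plain Stop returned")

	k8s := &promInput{}
	k8s.startK8sWatcher()
	k8s.Stop() // cancels the context, then waits for the watcher to exit
	fmt.Println("k8s Stop returned")
}
```

Keeping the p.wg.Wait() call after the guarded cancel lets Stop both avoid the nil call and still wait for the k8s goroutines when they exist.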
