Telegraf does not exit on Linux #1603

butitsnotme · 2016-08-08T15:16:06Z

Bug report

When Telegraf runs as a Linux daemon (under either systemd or SysVinit), it does not stop when told to; it has to be killed manually with kill -9. My telegraf.conf is included below, but the problem also occurs with the default telegraf.conf. It does not appear to be waiting for the next collection cycle: the longest I've left it is overnight, and it was still running ~16 hours after being told to stop.

When using systemd (on Ubuntu), the cgroup lists two processes during normal operation, /bin/sh and /usr/bin/telegraf, with the first being the parent of the second. When systemctl stop telegraf is executed, the first (/bin/sh) exits but the second does not; it is still running hours later.
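The parent/child relationship described above can be made explicit by printing the parent PID next to each process. This is only a sketch, assuming a `ps` that accepts `-e` and `-o` and an `awk` with `ENVIRON`; `show_tree` is a hypothetical helper name, not part of Telegraf or systemd.

```shell
#!/bin/sh
# Hypothetical helper: print PID, parent PID and command line of every
# process whose command line contains the given pattern.
# The pattern is passed through the environment so that awk's own
# command line does not contain it and match itself.
show_tree() {
    ps -eo pid,ppid,args | PAT="$1" awk 'NR == 1 || index($0, ENVIRON["PAT"])'
}

# Example (assumes the service binary is named "telegraf"):
#   show_tree telegraf
```

In the output, the PPID column of the telegraf line should match the PID of the /bin/sh line while the service is running.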

Relevant telegraf.conf:

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10m"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at
  ## most metric_batch_size metrics.
  metric_batch_size = 1000
  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## Run telegraf in debug mode
  debug = false
  ## Run telegraf in quiet mode
  quiet = false
  ## Override default hostname, if empty use os.Hostname()
  hostname = "<set but redacted>"
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false

System info:

Telegraf - version 0.13.2
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

Steps to reproduce:

1. Install telegraf on Linux (tested on Ubuntu 16.04, CentOS 6, CentOS 7)
2. Start the service (using the normal OS mechanism)
3. Run ps aux | grep telegraf
4. Observe that there is an sh process and a telegraf process (the sh process is the parent of the telegraf process)
5. Stop the service (using the same OS tool)
6. Run ps aux | grep telegraf
7. Observe that the telegraf process is still running
8. Wait a while (30mins-1h should be enough for demonstration)
9. Run ps aux | grep telegraf again
10. Observe that it is still running.
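
The "is it still running?" check in the steps above can be scripted instead of eyeballed. This is a minimal sketch, assuming a POSIX shell and that the caller may signal the process (same user or root); `wait_for_exit` is a hypothetical helper name, not something shipped with Telegraf.

```shell
#!/bin/sh
# Hypothetical helper: poll until the process with the given PID is gone,
# or until the timeout (in seconds) has elapsed.
# Returns 0 if the process exited in time, 1 if it is still running.
wait_for_exit() {
    pid=$1
    timeout=$2
    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (assumes $pid holds the telegraf PID noted before stopping):
#   systemctl stop telegraf
#   wait_for_exit "$pid" 30 || echo "telegraf still running after 30s"
```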

Expected behavior:

Telegraf should exit within a few seconds of being stopped by the OS (at absolute most, the next time it runs the collect).

Actual behavior:

It continues running indefinitely.

Additional info:

There is nothing abnormal in the logs; they just show telegraf running.

sparrc · 2016-08-08T15:26:13Z

this is fixed in 1.0, see #1252 & #1279

sparrc closed this as completed Aug 8, 2016
