
Systemd unit doesn't kill child process (restart/stop/reload) #1576

Closed
sbadia opened this issue Aug 3, 2016 · 6 comments

@sbadia

sbadia commented Aug 3, 2016

Bug report

Using systemd, the stop or reload/restart command doesn't kill the child telegraf processes.

System info:

  • OS: Ubuntu 16.04 LTS
  • Telegraf: 0.13.2-1

Steps to reproduce:

msqall-0001-stg2:~# ps aux|grep '[t]elegraf'
telegraf  6479  0.1  1.7 393060 30092 ?        Sl   Jun15  94:38 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf  8546  0.0  1.5 382496 27764 ?        Sl   Jun15  26:44 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf 22447  0.0  0.0   4508   788 ?        Ss   13:46   0:00 /bin/sh -c /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d  >>/var/log/telegraf/telegraf.log 2>>/var/log/telegraf/telegraf.log
telegraf 22448  0.0  1.5 234516 27120 ?        Sl   13:46   0:00 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
msqall-0001-stg2:~# systemctl stop telegraf.service
msqall-0001-stg2:~# echo $?
0
msqall-0001-stg2:~# ps aux|grep '[t]elegraf'
telegraf  6479  0.1  1.7 393060 30140 ?        Sl   Jun15  94:38 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf  8546  0.0  1.5 382496 27764 ?        Sl   Jun15  26:44 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf 22448  0.0  1.5 234516 27120 ?        Sl   13:46   0:00 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Expected behavior:

The stop or restart/reload action should not leave any processes running.

Actual behavior:

Uncontrolled processes are still running.

Use case:

When we upgrade packages, the old processes keep running, which is a bit annoying.

sbadia added a commit to sbadia/telegraf that referenced this issue Aug 3, 2016
…oad)

The killall binary is included in the psmisc package.
There may be another way to solve this issue with « standard » tools:
https://stackoverflow.com/questions/392022/best-way-to-kill-all-child-processes

Refs: influxdata#1576
@moltenkaizen

Thanks for reporting this. I'm also seeing the same issue. This is why my changes to telegraf.conf weren't being reflected. After killing processes, telegraf is re-reading my edited config and working properly.

@sparrc
Contributor

sparrc commented Aug 3, 2016

@sbadia wasn't this already fixed by #1279 ?

@sparrc
Contributor

sparrc commented Aug 3, 2016

(fix is in 1.0.0-beta1)

@sbadia
Author

sbadia commented Aug 4, 2016

Oh! Sorry, I didn't notice that one. I just tested 0.13.2-1 plus the exec patch, and indeed it works.
Thanks @sparrc!
Hmm, could we maybe backport/cherry-pick #1279 into the 0.13.x series?

@sbadia sbadia closed this as completed Aug 4, 2016
@sparrc
Contributor

sparrc commented Aug 4, 2016

Unfortunately I will not have time to cherry-pick that into a 0.13.x release. For users needing a fix, see the change to the systemd telegraf.service file here: https://github.com/influxdata/telegraf/pull/1279/files

@sbadia if you could post a step-by-step how you patched it, that might help other users as well. Thank you!

@sbadia
Author

sbadia commented Aug 4, 2016

@sparrc ack!

Yes, sure, here is the manual procedure (it's a bit shorter with a configuration management tool):

Intro (tested on telegraf 0.13.x)

server:~# dpkg -l|grep telegraf
hi  telegraf                           0.13.2-1                                 amd64        Plugin-driven server agent for reporting metrics into InfluxDB.
server:~# ps aux|grep '[t]elegraf'
telegraf  1415  0.0  0.1 613968 28512 ?        Sl   Jul13  18:57 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf  7801  0.0  0.0   4508   788 ?        Ss   Aug03   0:00 /bin/sh -c /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d  >>/var/log/telegraf/telegraf.log 2>>/var/log/telegraf/telegraf.log
telegraf  7802  0.0  0.1 532324 29364 ?        Sl   Aug03   0:14 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Stop the telegraf service and show the remaining processes:

server:~# systemctl stop telegraf.service
server:~# ps aux|grep '[t]elegraf'
telegraf  1415  0.0  0.1 613968 28512 ?        Sl   Jul13  18:57 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf  7802  0.0  0.1 532324 29364 ?        Sl   Aug03   0:14 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Kill them all :-)

server:~# killall telegraf

Back up the systemd unit and patch it with the raw file from master:

server:~# cp /lib/systemd/system/telegraf.service{,.bak}
server:~# curl -s https://raw.githubusercontent.com/influxdata/telegraf/master/scripts/telegraf.service > /lib/systemd/system/telegraf.service
server:~# diff -u /lib/systemd/system/telegraf.service.bak /lib/systemd/system/telegraf.service
--- /lib/systemd/system/telegraf.service.bak    2016-08-04 09:08:16.467528595 +0000
+++ /lib/systemd/system/telegraf.service    2016-08-04 09:09:23.395475102 +0000
@@ -8,11 +8,10 @@
 User=telegraf
 Environment='STDOUT=/var/log/telegraf/telegraf.log'
 Environment='STDERR=/var/log/telegraf/telegraf.log'
-ExecStart=/bin/sh -c "/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d ${TELEGRAF_OPTS} >>${STDOUT} 2>>${STDERR}"
+ExecStart=/bin/sh -c "exec /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d ${TELEGRAF_OPTS} >>${STDOUT} 2>>${STDERR}"
 ExecReload=/bin/kill -HUP $MAINPID
 Restart=on-failure
-KillMode=process
+KillMode=control-group

 [Install]
 WantedBy=multi-user.target
-Alias=telegraf.service
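
The diff changes two things. In the old unit, `/bin/sh -c "/usr/bin/telegraf ..."` ran telegraf as a child of the shell, so systemd's main PID was the shell itself. With `KillMode=process`, `systemctl stop` signals only that main process, and `ExecReload=/bin/kill -HUP $MAINPID` also reaches the shell rather than telegraf. Adding `exec` makes the shell replace itself with telegraf, and `KillMode=control-group` makes systemd kill every process left in the unit's cgroup. The script below is a minimal, Linux-only sketch of the first effect without systemd. It assumes `sleep 300` as a hypothetical stand-in for telegraf, and a plain `kill` of the main PID as a stand-in for what `KillMode=process` does on stop; it illustrates the mechanism, it does not reproduce systemd.

```shell
#!/bin/sh
# Minimal, Linux-only sketch: why `sh -c "telegraf ..."` without `exec`,
# combined with KillMode=process, leaves a process behind.
# Assumptions: `sleep 300` is a hypothetical stand-in for /usr/bin/telegraf,
# and `kill MAINPID` stands in for systemd's KillMode=process (main PID only).
set -u
tmp=$(mktemp -d)

# Old unit: the shell stays alive as the main PID and the worker is its child.
# (The worker is backgrounded here only so the shell can record its PID.)
sh -c 'sleep 300 & echo $! > "$1"; wait' sh "$tmp/child.pid" &
main_noexec=$!

# Patched unit: `exec` replaces the shell, so the main PID is the worker itself.
sh -c 'exec sleep 300' &
main_exec=$!

sleep 1                                        # let both shells start up
while [ ! -s "$tmp/child.pid" ]; do sleep 1; done
child=$(cat "$tmp/child.pid")

# /proc/PID/comm holds the name of the program behind each main PID.
noexec_comm=$(cat "/proc/$main_noexec/comm")
exec_comm=$(cat "/proc/$main_exec/comm")
echo "without exec, main PID runs: $noexec_comm"
echo "with exec, main PID runs: $exec_comm"

# Signal only the main PIDs, as KillMode=process does on stop.
kill "$main_noexec" "$main_exec"
wait "$main_noexec" "$main_exec" 2>/dev/null

if kill -0 "$child" 2>/dev/null; then
  orphan=yes
  echo "without exec, worker $child outlived its shell"
else
  orphan=no
fi

kill "$child" 2>/dev/null                      # clean up the leftover worker
rm -rf "$tmp"
```

In the first case the main PID is `sh`, and the worker keeps running after the shell is killed, which matches the leftover `/usr/bin/telegraf` processes in the `ps` output of this issue. In the second case the main PID is the worker, so signalling it is enough.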

And finally, reload systemd and relaunch telegraf:

server:~# systemctl daemon-reload
server:~# systemctl start telegraf.service
server:~# rm /lib/systemd/system/telegraf.service.bak
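
Overwriting the packaged unit works, but a later package upgrade may replace `/lib/systemd/system/telegraf.service` again. A possible alternative, sketched under the assumption that you prefer systemd's drop-in directories, is to put the same two changes in `/etc/systemd/system/telegraf.service.d/`. The function name `write_telegraf_dropin` and the file name `exec-fix.conf` are hypothetical. The empty `ExecStart=` line matters: without it, the drop-in would add a second `ExecStart=` instead of replacing the packaged one.

```shell
#!/bin/sh
# Sketch: apply the two #1279 unit changes as a systemd drop-in instead of
# overwriting the packaged /lib/systemd/system/telegraf.service.
# Hypothetical names: write_telegraf_dropin and exec-fix.conf.

write_telegraf_dropin() {
  dir=$1
  mkdir -p "$dir" || return 1
  # The quoted 'EOF' keeps ${TELEGRAF_OPTS}, ${STDOUT} and ${STDERR} literal,
  # so systemd expands them when the unit starts, as in the original unit.
  cat > "$dir/exec-fix.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before the new one is set.
ExecStart=
ExecStart=/bin/sh -c "exec /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d ${TELEGRAF_OPTS} >>${STDOUT} 2>>${STDERR}"
KillMode=control-group
EOF
}
```

For example, as root, after killing the leftover processes: source the snippet, then run `write_telegraf_dropin /etc/systemd/system/telegraf.service.d && systemctl daemon-reload && systemctl restart telegraf.service`. `systemctl cat telegraf.service` shows the packaged unit followed by the drop-in.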
