Debian init script cannot be remotely started with a tty #3053

Closed
blbradley opened this issue Nov 23, 2016 · 26 comments

Comments

@blbradley

  • Version: Any beats component and version using Debian init scripts
  • Operating System: Debian-based OSes not using systemd
  • Steps to Reproduce: See this and this Elastic forum discussion.

Some discussion says that Debian doesn't need the go-daemon process as a supervisor. I agree. It would be better to fix this upstream than to manage changes to the init script per package.

The second link describes how to fix the init script. Does that seem acceptable?

@tsg
Contributor

tsg commented Nov 29, 2016

@blbradley I think the issue from Discuss was fixed with this: fiorix/go-daemon#7. Note that the last reports on the forums, except yours, are several months old.

If you still see this issue, can you post the steps to reproduce?

@blbradley
Author

@tsg I've seen that issue and thought the problem would be fixed by now as well. I'm using filebeat-formula to install filebeat and still observing that issue in filebeat 1.3.1 on Ubuntu Trusty (14.04).

Here is another reason I filed this issue:

I don't think we will change the init scripts due to this issue. But we did prove that we can daemonize a Go process without go-daemon by using standard Debian tools... so it is an option.

I'll get you a test case very soon. You'll need a few dependencies (Docker, Ruby, and Python). Let me know if that is a problem.

@blbradley
Author

blbradley commented Nov 30, 2016

Assuming Docker, Ruby (with bundler), and Python (with pip) are installed:

git clone https://github.com/blbradley/filebeat-formula
cd filebeat-formula
git checkout fix-ubuntu-trusty
pip install -r requirements.txt
bundle install
bundle exec kitchen test ubuntu

This will set up an Ubuntu Trusty Docker container, install filebeat, run/enable the service, and run some tests to confirm all of that happens. The tests fail because filebeat is not running. I'm pretty sure that SaltStack does not allocate a tty to execute commands.

In addition, if you try to start the service without a tty (and without SaltStack), the service is not started:

➜ bundle exec kitchen exec ubuntu 'service filebeat start'
-----> Execute command on default-ubuntu-1404.
➜ bundle exec kitchen login ubuntu
root@7265ca5f0822:~# service filebeat status
 * filebeat is not running

kitchen exec has a -c parameter which executes a command over SSH. This leads to a similar result as above.

Starting the service using a terminal works as expected.

I can probably prove similar overall behavior for metricbeat since its init script is generated from the same Jinja template.

@blbradley
Author

I also ran this on a Trusty VM for good measure. Same issue.

@tsg
Contributor

tsg commented Dec 1, 2016

I was hoping for a more minimal test case :) Just to make sure, the filebeat formula uses the packages from the elastic repos, right?

In addition, if you try to start the service without a tty (and without SaltStack), the service is not started.

This sounds promising for reproducing. Do you know of a way I can do that without kitchen?

@tsg
Contributor

tsg commented Dec 1, 2016

I tried, based on this:

setsid sh -c 'tty; /etc/init.d/filebeat start' < /dev/null > log 2>&1

But it looks good. The daemon starts and it doesn't have a tty:

# cat log
not a tty
2016/12/01 11:53:24.313335 beat.go:267: INFO Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2016/12/01 11:53:24.313415 beat.go:177: INFO Setup Beat: filebeat; Version: 5.1.1
2016/12/01 11:53:24.313538 output.go:167: INFO Loading template enabled. Reading template file: /etc/filebeat/filebeat.template.json
2016/12/01 11:53:24.313686 output.go:178: INFO Loading template enabled for Elasticsearch 2.x. Reading template file: /etc/filebeat/filebeat.template-es2x.json
2016/12/01 11:53:24.313848 client.go:120: INFO Elasticsearch url: http://localhost:9200
2016/12/01 11:53:24.313901 outputs.go:106: INFO Activated elasticsearch as output plugin.
2016/12/01 11:53:24.313973 publish.go:291: INFO Publisher name: precise32
2016/12/01 11:53:24.314077 async.go:63: INFO Flush Interval set to: 1s
2016/12/01 11:53:24.314113 async.go:64: INFO Max Bulk Size set to: 50
Config OK

Can you try the same in your environment, please?

@tsg
Contributor

tsg commented Dec 1, 2016

Also, please double check the logs for errors. Maybe it fails to start due to some missing path permission or similar.

@blbradley
Author

I tried your exact command in the Trusty VM (as root with filebeat service not running).

filebeat-god and filebeat are running, but there are no logs in the log file (other than "not a tty") or in /var/log/filebeat. Whoa!

Same thing when I replace /etc/init.d/filebeat start with service filebeat start.

Is your environment RedHat based?

@blbradley
Author

Just to make sure, the filebeat formula uses the packages from the elastic repos, right?

Yes. Only the 1.x series right now.

@tsg
Contributor

tsg commented Dec 1, 2016

The empty log is probably because I used 5.0 for testing, which logs at info level by default. If you add -v, you'll probably get the same result as mine.

@blbradley
Author

That does produce some logs in the case you provided. I'm exploring further. Thanks for your speedy responses!

@blbradley
Author

I'm able to run salt-call --local service.start filebeat -l debug in a terminal with success. So, I think we can rule out SaltStack as a problem. Anything that tries to start the service remotely is the issue. I think my issue title is now a poor choice of words.

@blbradley
Author

Aha! I was able to reproduce over SSH. The problem is actually the opposite of the issue title. Allocating a tty and starting the service via SSH will stop the service after logout.

The logs below are due to the -v argument.

ssh -t -i .kitchen/docker_id_rsa [email protected] -p 32771 "tty; service filebeat start"
/dev/pts/0
2016/12/01 20:40:59.708292 geolite.go:24: INFO GeoIP disabled: No paths were set under output.geoip.paths
2016/12/01 20:40:59.712773 outputs.go:126: INFO Activated elasticsearch as output plugin.
2016/12/01 20:40:59.713305 publish.go:288: INFO Publisher name: 487b04041575
2016/12/01 20:40:59.714243 async.go:78: INFO Flush Interval set to: 1s
2016/12/01 20:40:59.714694 async.go:84: INFO Max Bulk Size set to: 50
2016/12/01 20:40:59.714860 beat.go:168: INFO Init Beat: filebeat; Version: 1.3.1
Connection to 192.168.99.100 closed.
✗ bundle exec kitchen login ubuntu-12
Last login: Thu Dec  1 20:40:59 2016 from 192.168.99.1
root@487b04041575:~# ps aux | grep filebeat
root      2367  0.0  0.1   6512  1340 pts/0    S+   20:41   0:00 grep --color=auto filebeat

@blbradley blbradley changed the title from "Debian init script cannot be started without a tty" to "Debian init script cannot be remotely started with a tty" on Dec 1, 2016
@blbradley
Author

ssh -t allocates a tty, FYI.

@blbradley
Author

Test Kitchen uses SSH by default to run commands and allocates a TTY to do so.

✗ bundle exec kitchen exec ubuntu-12 -c 'tty'
-----> Execute command on default-ubuntu-1204.
       /dev/pts/0
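
The same check can be made from inside any script with the shell's `-t` test, which reports whether a file descriptor is attached to a terminal. The sketch below is illustrative only; `describe_stdin` is a hypothetical helper name, not something provided by Test Kitchen or Beats.

```shell
#!/bin/sh
# Hypothetical helper: report whether stdin is attached to a terminal.
describe_stdin() {
    if [ -t 0 ]; then
        # tty(1) prints the device name, e.g. /dev/pts/0
        echo "stdin is a tty: $(tty)"
    else
        echo "stdin is not a tty"
    fi
}

describe_stdin
```

Assuming the script is saved on the remote host as `check.sh`, running it with `ssh host 'sh check.sh'` and again with `ssh -t host 'sh check.sh'` should show the two cases.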

@blbradley
Author

Do I need to provide an easier way to reproduce? Or should one of us try to fix the init script?

@tsg
Contributor

tsg commented Dec 1, 2016

I can reproduce it now as well, many thanks for digging into this.

So this works:

ssh tester-ubuntu1204-32 "sudo /etc/init.d/filebeat start"

But this doesn't:

ssh -t tester-ubuntu1204-32 "sudo /etc/init.d/filebeat start"

Same can be reproduced with go-daemon only. This doesn't work as expected:

ssh -t tester-ubuntu1204-32 "/usr/share/filebeat/bin/filebeat-god /bin/sleep 100"

It's probably because go-daemon doesn't call setsid, see: http://stackoverflow.com/a/8777697/15685
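
To illustrate the setsid idea in shell terms, the sketch below starts a command in a new session so that it has no controlling terminal. It assumes the `setsid` utility (util-linux or busybox) is installed, and `start_detached` is a hypothetical helper, not part of go-daemon or the Beats packages. Whether this alone survives an `ssh -t` logout has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: start_detached PIDFILE COMMAND [ARGS...]
start_detached() {
    pidfile=$1
    shift
    # setsid(1) runs the command in a new session, which has no controlling
    # terminal. The inner sh records its own PID, then exec replaces it with
    # the real command, so the PID written to the file stays valid.
    # stdin/stdout/stderr are pointed away from the terminal as well.
    setsid sh -c 'echo "$$" > "$0"; exec "$@"' "$pidfile" "$@" \
        < /dev/null > /dev/null 2>&1 &
}
```

For example, `start_detached /tmp/sleep.pid /bin/sleep 100` should leave a sleep process running whose PID is in `/tmp/sleep.pid`. On Linux with procps, `ps -o sid= -p "$(cat /tmp/sleep.pid)"` can be used to inspect its session ID.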

@blbradley
Author

Great! I'm going to dig into the second Elastic Discuss link to see if I can daemonize on Debian without go-daemon.

@tsg
Contributor

tsg commented Dec 1, 2016

Yeah, I'm sure it's possible with start-stop-daemon. But since we have to support RH-based systems anyway, and they don't have an equivalent (at least, none that's always installed), I'm a bit hesitant to do that.

@blbradley
Author

I understand that requirement. Aren't the init scripts for Debian and RedHat managed separately in this repo? If so, would that alleviate your concern?

@tsg
Contributor

tsg commented Dec 1, 2016

They are handled separately, yes. I have two concerns:

  • Debugging init scripts is always a pain, so keeping them similar across platforms is an advantage IMHO.
  • We are considering adding some other features to go-daemon, like using cgroups to limit the CPU usage. Setting cgroup limits is very simple with systemd, but it's probably not easily possible from an init script, so I was thinking go-daemon could be the place for it.

What is your concern about continuing to use go-daemon? More issues like this one?

@blbradley
Author

My issue is that the shipped init scripts can't be used with systems that allocate a tty for remote command execution. While my case is limited to Test Kitchen, it seems like SaltStack and Puppet have issues (on Debian) with these init scripts as well. So, I don't have a problem with go-daemon per se.

I agree that debugging init scripts (and Bash in general) can be difficult. Keeping them very similar between platforms may be advantageous (and ideal) but may not work in practice.

One way of implementing cgroups usage would be to require users to use systemd. If this overall issue is due to another bug in go-daemon, then fixing that would be another way to handle this.

@tsg
Contributor

tsg commented Dec 2, 2016

I was trying to figure this out in go-daemon, but so far I haven't found a solution to make it work with ssh -t. I think this also doesn't work with Debian's start-stop-daemon. This is what I tried:

ssh -t  tester-ubuntu1204-32 "start-stop-daemon --start --background --exec /bin/sleep  -- 100"

This doesn't work. If you remove the -t, it does.

So I'm no longer sure that the -t is the correct way to reproduce it. Can you give it a try to see if you can fix the init script using start-stop-daemon, please? Or let me know if you have any ideas about what's going on.

@blbradley
Author

👀

Sure! I'll do that soon and probably go ahead with managing the start-stop-daemon init script in filebeat-formula until this gets fixed upstream.

@blbradley
Author

I was able to get the init script working using the instructions in the second linked Elastic Discuss post. I'll need to do more testing to see if the PID file works appropriately.
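
For readers without access to the forum post, here is a rough sketch of what start/stop functions built on start-stop-daemon might look like. It is not the script from that post: the binary path, the config path, the PID file location, and the `SSD` override variable are all assumptions made for illustration, and the usual `case "$1" in ...` dispatch is left out.

```shell
#!/bin/sh
# Sketch only: names and paths below are assumptions, not the shipped script.
NAME=filebeat
DAEMON=/usr/share/filebeat/bin/filebeat      # assumed binary location
DAEMON_ARGS="-c /etc/filebeat/filebeat.yml"  # assumed config file
PIDFILE=/var/run/$NAME.pid                   # assumed PID file location
SSD=${SSD:-start-stop-daemon}                # set SSD=echo for a dry run

do_start() {
    # --background forks and detaches the process; --make-pidfile records
    # the PID, since the daemon is assumed not to write one itself.
    $SSD --start --quiet --background --make-pidfile \
        --pidfile "$PIDFILE" --exec "$DAEMON" -- $DAEMON_ARGS
}

do_stop() {
    # --retry 10 sends TERM, waits up to 10 seconds, then sends KILL.
    # --oknodo makes "already stopped" count as success.
    $SSD --stop --quiet --oknodo --retry 10 --pidfile "$PIDFILE"
    rm -f "$PIDFILE"
}
```

With `SSD=echo`, calling `do_start` prints the start-stop-daemon arguments instead of running them, which is a cheap way to review the command line before touching a real service.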

@urso

urso commented Mar 3, 2019

Closing. Issue stalled for 2+ years, plus no new reports. Assuming it is fixed with go-daemon update.

@urso urso closed this as completed Mar 3, 2019