Autoregister Nomad servers and clients as a service into Consul #510

Closed
adrianlop opened this issue Nov 27, 2015 · 12 comments

Comments

@adrianlop
Contributor

Hi,

Is this feature in the Nomad-Consul roadmap? I think it would be nice to have this.

Thank you!

@diptanu
Contributor

diptanu commented Nov 27, 2015

@poll0rz Can you give us some use cases for registering the Nomad clients with Consul, please? The clients currently don't have any API, and they are constantly sending heartbeats to the Nomad servers, so using Consul checks or making them discoverable over the network for other applications probably doesn't add a lot of value.

@adrianlop
Contributor Author

Sure, here is an example:
Problem --> a machine is up & running, but the Nomad client on it failed or was killed by the OS OOM killer, so the Nomad server marks this client as down and you cannot allocate any jobs on this machine anymore (so if you are in AWS, you're paying for a machine that does no work).
Solution --> if the Nomad clients are registered in Consul, you can easily monitor your Nomad cluster health with Consul (via a watch or just by looking at the UI) and troubleshoot/fix the problem with your Nomad client.

What do you think? Or maybe this could be done just with Nomad already?
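
The monitoring idea above could look roughly like the following sketch. It assumes the Nomad agents are registered in Consul under the service name "nomad" with at least one health check attached, and that a Consul agent answers on 127.0.0.1:8500; the function names are hypothetical. It reads the response shape of Consul's `/v1/health/service/<name>` endpoint and lists every instance with a check that is not passing.

```python
import json
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local Consul agent


def failing_instances(health_entries):
    """Return (node, service_id, status) for every check that is not passing."""
    failing = []
    for entry in health_entries:
        node = entry["Node"]["Node"]
        service_id = entry["Service"]["ID"]
        for check in entry.get("Checks", []):
            if check["Status"] != "passing":
                failing.append((node, service_id, check["Status"]))
    return failing


def fetch_health(service="nomad"):
    """Query Consul's health endpoint for one service (needs a running agent)."""
    url = f"{CONSUL_ADDR}/v1/health/service/{service}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

With a Consul agent running, `failing_instances(fetch_health())` would give the list of Nomad instances that need attention. Without a check on the service, only node-level failures would show up, so this is only as useful as the checks behind it.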

@kaskavalci
Contributor

I'm wondering as well what is the proposed action to re-run the clients when they fail.

@cbednarski
Contributor

> ...but Nomad client failed or was killed by the OS OOM killer...

> I'm wondering as well what is the proposed action to re-run the clients when they fail.

Your init system should restart Nomad in these cases. I think it makes sense to be able to use Consul to keep an eye on things, though.

@adrianlop
Contributor Author

Hi @cbednarski, what do you mean by "your init system should restart Nomad in these cases"? You mean Nomad clients should be started with http://supervisord.org/ or similar?

Yes, I think it's a good idea for Nomad clients & servers to register themselves in Consul, since we're going with HashiCorp's full stack and we should know what's happening not only with our services, but with our distributed service scheduler as well, right?

I hope you agree with that and make it possible heheh. I'm willing to contribute to the project too, but I'm just a Go rookie for now.
Please let me know if I could help in any way.

@cbednarski
Contributor

> you mean Nomad clients should be started with http://supervisord.org/ or similar?

Right. The OS's init system like upstart, systemd, or net start should handle this.

The Consul registration / health checking issue is an open question, though. We'll update here when we have more info.

@supernomad

@adrianlop I have a potential workaround for you.

I am not sure what systems you are using, but if you are using an OS that utilizes systemd, you can use an ExecStartPre= or ExecStartPost= call to register the Nomad service with Consul. Then on stop you can have it automatically removed with an ExecStop= call. This is what I am doing and it works like a charm.

A basic config (if you're using systemd):

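[Unit]
Description=Nomad Service
Wants=network-online.target consul.service
After=network.target network-online.target consul.service

[Service]
Type=simple
# Register the Nomad client with the local Consul agent before Nomad starts
ExecStartPre=/usr/bin/curl -XPUT http://127.0.0.1:8500/v1/agent/service/register -H "Content-Type: application/json" -d '{"ID":"nomad:client","Name":"nomad","Tags":["client"],"Port":4646}'
ExecStart=/etc/nomad/nomad agent -config /etc/nomad -retry-join 1.1.1.1
# Deregister on stop; "%%" is systemd's escape for a literal "%", so the URL ends in nomad%3Aclient
ExecStop=/usr/bin/curl -XPUT http://127.0.0.1:8500/v1/agent/service/deregister/nomad%%3Aclient
# No "nomad stop" here: that subcommand stops a job, not the agent.
# systemd signals the agent itself once the ExecStop= command has run.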

If you are not using systemd I am sure you can adapt the above to whatever init systems you are using.

@adrianlop
Contributor Author

@supernomad thank you! I came up with a workaround in the meantime as well. It consists of configuring Consul with a "nomad.json" file that contains the service definition for Nomad, but your solution is elegant too ;)
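
The contents of that "nomad.json" are not shown in the thread. As a rough sketch of what such a file might contain, the snippet below renders a Consul service definition that mirrors the fields from the curl registration above (name, tag, port). The HTTP check is an assumption, not something from the thread: it expects the Nomad agent's HTTP API on localhost to answer `/v1/agent/self` while the agent is alive. The function names are hypothetical.

```python
import json


def nomad_service_definition(role="client", port=4646):
    """Build a Consul service definition for a Nomad agent.

    The check URL is an assumption: it relies on the Nomad HTTP API
    answering /v1/agent/self on localhost while the agent is running.
    """
    return {
        "service": {
            "name": "nomad",
            "tags": [role],
            "port": port,
            "check": {
                "http": f"http://127.0.0.1:{port}/v1/agent/self",
                "interval": "10s",
            },
        }
    }


def render_nomad_json(role="client", port=4646):
    """Serialize the definition as the text of a nomad.json file."""
    return json.dumps(nomad_service_definition(role, port), indent=2) + "\n"
```

The output of `render_nomad_json()` would be written into the directory Consul loads its configuration from, typically by the same CM tool that deploys Consul, followed by a Consul reload.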

@supernomad

@adrianlop Thanks for the props. I was very tempted to do the config file as well when trying to figure this out, but I was worried that I would end up with incorrect service definitions or at the very least out of sync ones.

How did you get around that, or is it just not an issue?

@adrianlop
Contributor Author

@supernomad using config files for services and checks is safe. It works great in static infrastructure if you have everything deployed with a CM tool (Ansible, Puppet, ...).

btw, I think it'd be good to add Restart=always to your [Service] definition.
If the Nomad agent crashes, systemd will try to start it again (if that's what you want heheh).

@diptanu
Contributor

diptanu commented May 17, 2016

This has landed in master via #1167

@diptanu diptanu closed this as completed May 17, 2016
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 23, 2022