
Conflicting Node ID for nomad client #1240

Closed
ghost opened this issue Jun 8, 2016 · 8 comments
ghost commented Jun 8, 2016

Nomad version

Nomad v0.3.2

Operating system and Environment details

Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux

Issue

I'm currently experiencing an issue whereby all my clients (backend instances running Nomad in client mode) advertise the same Node ID.

Reproduction steps

Use www.packer.io (or similar) to build a pre-configured image with Nomad and Docker installed. Then use Google or AWS instance groups to start up multiple instances from the image. There is no need to boot them at the same time to try and provoke a race condition; they can be started sequentially. Note that the instance groups will create unique hostnames for each instance.

Run nomad node-status on another machine and watch as the hostname for the given Node ID changes as you add further instances:

bdd7800c  europe-west1  backend-standard-2-74du   <none>  true   ready

after increasing instance group to 2 instances:

bdd7800c  europe-west1  backend-standard-2-zzb5   <none>  true   initializing

after increasing instance group to 3 instances:

bdd7800c  europe-west1  backend-standard-2-ui8t   <none>  true   initializing

ghost commented Jun 8, 2016

I was just looking at the code (`path := filepath.Join(c.config.StateDir, "client-id")`) and I'm wondering: what are the advantages of restoring the old node ID?

Naturally, prior to starting Nomad the client isn't running, and therefore there aren't any allocations running on the instance. Whilst I'm not personally familiar with the codebase, in my experience Nomad is more than happy with nodes with different node IDs joining and leaving the cluster, re-evaluating jobs as necessary. Is there any reason why a Nomad client couldn't always regenerate its UUID on start?


dadgar commented Jun 8, 2016

@grobinson-blockchain Hey, when building the Packer image, just don't start the client before baking the AMI, or delete the data_dir. The Nomad client reuses the Node ID so that if you do an in-place upgrade of the client, it re-registers with the same ID and existing allocations continue running properly without having to be migrated.

If this doesn't make sense let me know. Closing!

@dadgar dadgar closed this as completed Jun 8, 2016

ghost commented Jun 8, 2016

@dadgar The former may be difficult because we use the same Salt state to provision our images as we do our live instances. However, we can add an ad-hoc shell provisioner that stops Nomad and removes the data dir as the final step. Thanks!
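As a sketch, that final Packer shell provisioner might look like the following (JSON template syntax; the `nomad` systemd unit name and the `/var/lib/nomad` data_dir path are assumptions — substitute whatever your own config uses):

```json
{
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "sudo systemctl stop nomad",
        "sudo rm -rf /var/lib/nomad/*"
      ]
    }
  ]
}
```

With the data dir emptied before the image is baked, each instance launched from it generates its own Node ID on first start.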


ryaker commented Jan 26, 2019

Well, it's two and a half years later and the same thing just happened to me. Is there any workaround short of starting with a new VM image that doesn't have Nomad on it yet, or that never came up as a client? Is there any way to change the node ID on existing client nodes?


teeler commented Feb 14, 2019

@ryaker you're not alone! Same thing just happened to me.

I was also starting the service in Packer. Going to try not doing that, and hopefully any state it needs for client-id purposes won't be present...


teeler commented Feb 14, 2019

Yeah, that worked. @ryaker, if you blow away the data dir, you'll get a new client ID when it starts.

@nmegahed

When you say delete the data_dir, do you mean the contents of it or the whole thing?

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 14, 2022