Toward hostname configuration consistent with bosh-dns #30

Closed
mlmitch opened this issue May 22, 2018 · 16 comments

Comments

@mlmitch

mlmitch commented May 22, 2018

Bosh-deployed machines currently have /etc/hostname set to the Bosh agent id.

This results in hostname syscalls (hostname, uname, cat /proc/sys/kernel/hostname) returning the Bosh agent id.

The Bosh agent id is also appended to the localhost line in /etc/hosts according to the template

127.0.0.1 localhost {{ . }}
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback {{ . }}
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

This results in hostname syscalls for the fully qualified domain name (FQDN) returning "localhost". This format is slightly off, since Debian documentation states that the hostname should appear on a separate line in /etc/hosts with the machine's external ip address. That means creating a separate line in /etc/hosts containing the ip address that server.address returns and the contents of /etc/hostname.
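To make the "localhost" result concrete, here is a minimal Python sketch of a files-only lookup. It assumes the hosts(5) convention that the first name after the address is the canonical name and the rest are aliases. The function name `canonical_name` and the agent id value are hypothetical, and real resolution also depends on the resolver configuration, so treat this as an illustration rather than what the resolver actually runs.

```python
def canonical_name(hosts_text, name):
    """Return the canonical name a files-only lookup would report for `name`.

    Assumes the hosts(5) layout: address, canonical name, then aliases.
    Returns None when no line mentions `name`.
    """
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue  # blank line, comment, or address with no names
        names = fields[1:]
        if name in names:
            return names[0]  # first name on the first matching line
    return None


AGENT_ID = "0f1e2d3c-agent-id"  # hypothetical placeholder, not a real agent id

# Layout produced by the current template: agent id is an alias of localhost.
current = f"""127.0.0.1 localhost {AGENT_ID}
::1 localhost ip6-localhost ip6-loopback {AGENT_ID}
"""

# Layout with the agent id on its own line.
separate = f"""127.0.0.1 localhost
127.0.1.1 {AGENT_ID}
"""

print(canonical_name(current, AGENT_ID))   # localhost
print(canonical_name(separate, AGENT_ID))  # 0f1e2d3c-agent-id
```

With the agent id as a trailing alias, the canonical name for it is "localhost". Moving it to its own line makes the agent id its own canonical name.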

When bosh-dns enters the picture, each machine has FQDNs that are resolvable by other hosts.
These FQDNs appear in the /etc/hosts file and server.address resolves to the FQDN or the machine's external ip address depending on whether use_dns_addresses is set to true or false. I think this behavior is inconsistent with Bosh hostname behavior.

Here is what I think the behavior should be when bosh-dns is enabled.

  • hostname syscalls should return the machine instance id
  • hostname syscalls for the FQDN should return <instance id>.<domain name>
  • server.address resolves to the FQDN <instance id>.<domain name> or the machine's permanent ip address depending on whether use_dns_addresses is set to true or false (This one is the current bosh-dns behavior)

This way, hostname syscalls for the FQDN and server.address match.

If backward compatibility could be completely disregarded, I would make the following changes to achieve this.
bosh-agent:

  • Set /etc/hostname to the instance id instead of the bosh agent id.
  • Remove the bosh agent id from /etc/hosts
  • Create /etc/hosts entry with the machine's external ip address and the instance id. This would read <ip address> <instance id>

bosh-dns:

  • Populate /etc/hosts so the above line reads <ip address> <instance id>.<domain> <instance id>

I realize there might be some reliance on the bosh agent id though. Therefore the following changes might be more palatable.
bosh-agent:

  • Remove dependencies on the hostname being the bosh agent id
  • Keep /etc/hostname set to the bosh agent id
  • Create /etc/hosts entry with 127.0.0.1 or 127.0.1.1 and the bosh agent id
  • Remove the bosh agent id from the "localhost" line in /etc/hosts
  • Create /etc/hosts entry with the machine's external ip address and the instance id

bosh-dns:

  • Set /etc/hostname to the instance id instead of the bosh agent id.
  • Populate /etc/hosts so the above line reads <ip address> <instance id>.<domain> <instance id>

The latter course of action makes sure the bosh agent id is still resolvable. It also stops the current behavior of FQDN requests returning "localhost". Then, if bosh-dns is being used, hostname behavior is consistent with the rest of bosh-dns.
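The /etc/hosts layout described by this second set of changes could be sketched roughly as follows. This is only an illustration of the proposal, not bosh-agent or bosh-dns code: `render_hosts`, its parameters, and the sample values are all hypothetical, and 127.0.1.1 is just one of the two loopback choices mentioned above.

```python
def render_hosts(agent_id, ip_address, instance_id, domain=None):
    """Render the proposed /etc/hosts layout as a string.

    Without `domain` this is the bosh-agent-only layout; with `domain`
    it is the layout after bosh-dns has added the FQDN in front of the
    instance id.
    """
    names = [instance_id]
    if domain:
        # FQDN goes first so that it becomes the canonical name.
        names.insert(0, f"{instance_id}.{domain}")
    lines = [
        "127.0.0.1 localhost",
        f"127.0.1.1 {agent_id}",  # agent id stays resolvable, off the localhost line
        "::1 localhost ip6-localhost ip6-loopback",
        f"{ip_address} {' '.join(names)}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical sample values, for illustration only.
print(render_hosts("agent-123", "10.0.0.5", "instance-abc"))
print(render_hosts("agent-123", "10.0.0.5", "instance-abc", domain="example.internal"))
```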

This is from the Linux point of view (specifically debian), so I'm unsure what actions are needed to stay consistent across operating systems. I am also aware that this has been brought up here before, but I thought I would formalise it a bit more.

@svrc

svrc commented May 23, 2018

+1 I haven’t thought through all the trade offs on the proposed changes but I’ve been meaning to say for a while that mapping hostname to localhost is a no-no that has caused me misery in the past

@cppforlife
Contributor

one of the challenges we have with setting hostname is that it's not clear what we should set it to, since there could be multiple networks configured for each instance. the concept of fqdn doesn't translate well in this scenario. additionally, we have more advanced dns functionality for doing a certain level of filtering, etc., so it would be problematic to hardcode just one configuration for fqdn.

@mlmitch what do you think about a job specifically setting the hostname, and bosh-agent relinquishing ownership of it once it sees that it's different from the agent-id (i.e. making bosh-agent only set it the first time)? the job could run hostname <%= ... %>, though i'm still trying to figure out how to best set hostname -f...

@mlmitch
Author

mlmitch commented May 23, 2018

@cppforlife you've mentioned the issue of multiple networks before. It makes sense to me that you would set it in the same manner as the <%=server.address%> return when use_dns_addresses is set to true. Although, I don't know how that is set when there are multiple networks.

I like the idea of a hostname job or add-on for now. That might enable any changes to be opt-in. In my specific use case, I want the FQDN to be <%=server.address%> since that is what other hosts can resolve. I'm not sure how possible that is with a job/add-on.

Assuming the return of <%=server.address%> is going to be the FQDN, the instance id should be the hostname and it should be appended to the end of the /etc/hosts line that contains <%=server.address%>. That would correctly set hostname -f or any other equivalent syscalls.
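As a rough sketch of that idea, a job or add-on could append the instance id as an alias on whichever /etc/hosts line already carries the FQDN. The helper below is hypothetical (the name `append_alias` and the sample values are not from bosh), works on the file contents as a string, and does not address the agent rewriting the file later.

```python
def append_alias(hosts_text, fqdn, alias):
    """Append `alias` to every hosts line that already lists `fqdn`.

    Lines that do not mention `fqdn`, or that already list `alias`,
    are returned unchanged, so the function is safe to run repeatedly.
    """
    out = []
    for raw in hosts_text.splitlines():
        body, sep, comment = raw.partition("#")
        fields = body.split()
        names = fields[1:]  # everything after the address
        if fqdn in names and alias not in names:
            fields.append(alias)
            # Rebuild the line, keeping any trailing comment.
            raw = " ".join(fields) + (f" #{comment}" if sep else "")
        out.append(raw)
    return "\n".join(out) + "\n"


# Hypothetical sample contents, for illustration only.
hosts = "127.0.0.1 localhost\n10.0.0.5 instance-abc.example.internal\n"
print(append_alias(hosts, "instance-abc.example.internal", "instance-abc"))
```

Because the FQDN stays first on the line, it remains the canonical name and the instance id becomes the short alias.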

Regardless of the above, I consider returning "localhost" for the FQDN to be a bug. Though, that one is easily fixed by putting the agent id on a separate line in /etc/hosts, even if it still resolves to the same ip address.

@mlmitch
Author

mlmitch commented May 28, 2018

Bosh documentation on multi-homed VMs indicates a possible approach. A hostname value could be added to the network's default property. Then the hostname would be chosen in a similar way to gateway or dns which are network specific things.
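A minimal sketch of how that selection might work, assuming networks are represented as dicts with a `name` and an optional `default` list. The `hostname` entry in `default` is the hypothetical addition being suggested here, and the single-network fallback is also an assumption rather than documented Bosh behavior.

```python
def hostname_network(networks):
    """Pick the network whose `default` list includes "hostname".

    `networks` is a list of dicts such as
    {"name": "private", "default": ["dns", "gateway", "hostname"]}.
    Returns the network name, or None if nothing can be chosen.
    """
    chosen = [n for n in networks if "hostname" in n.get("default", [])]
    if len(chosen) > 1:
        raise ValueError("only one network may be the default for hostname")
    if chosen:
        return chosen[0]["name"]
    if len(networks) == 1:
        # Assumption: a single network is implicitly the default for everything.
        return networks[0]["name"]
    return None


# Hypothetical multi-homed instance, for illustration only.
networks = [
    {"name": "private", "default": ["dns", "gateway", "hostname"]},
    {"name": "public"},
]
print(hostname_network(networks))  # private
```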

@mlmitch
Author

mlmitch commented Jun 19, 2018

I've created a pull request to solve FQDN requests returning "localhost".

@jdesulme

We're running into a similar issue where any machine deployed with Bosh returns localhost as the hostname of the machine. An unfortunate side effect of this, especially for monitoring within a cluster of EC2 instances, is that there's no way to differentiate between the various nodes. This has hampered our ability to accurately assess a failing node. The NewRelic support team mentioned that we could override the /etc/hosts file to fix it, but since the file is managed by Bosh, any manual changes get overridden when a deployment occurs.

e.g. https://docs.newrelic.com/docs/agents/java-agent/troubleshooting/host-links-missing-java-apps-apm-overview

@dpb587-pivotal
Contributor

We'll track this via Story #159974467 and try to discuss it a bit more soon.

@amhuber

amhuber commented Nov 1, 2018

This is breaking New Relic monitoring on Java for us as well. Can this be prioritized for a fix soon?

@Lafunamor

It would be very helpful to make the FQDN configurable with BOSH DNS, as some software relies on this being a constant value. E.g. mongodb uses the FQDN to determine its identity in a replicaset.

@amhuber

amhuber commented Dec 6, 2018

Any ETA for this yet? We have production workloads we can't fully monitor in New Relic and it seems like this is a fairly trivial fix in the BOSH automation (just need to add a name into /etc/hosts).

@jdesulme

Hey @dpb587-pivotal - any update for this yet? Your original ticket said that you would discuss it more.

@dpb587-pivotal
Contributor

cc @mgadiya @luan as this is in the realm of your work and I'm not sure where it falls in priorities. The issue description gives a pretty good summary of needs, the earlier-referenced story refers to another Slack discussion around this, and the suggestion of using default is probably something to keep in mind if you're considering this. Let me know if I can help provide context or additional ideas.

@Lafunamor

has there been any progress on this?

@bosh-admin-bot

This issue was marked as Stale because it has been open for 21 days without any activity. If no activity takes place in the coming 7 days it will automatically be closed. To prevent this from happening, remove the Stale label or comment below.

@bosh-admin-bot

This issue was closed because it has been labeled Stale for 7 days without subsequent activity. Feel free to re-open this issue at any time by commenting below.

@Lafunamor

I'd actually prefer if one can overwrite the hostname and fqdn. E.g. the network is shared but the vms might belong to different domains.
