Swarm Service - IP resolves differently from within container vs other containers (same overlay) #30963
Comments
Tested with non-encrypted overlay -- same issue.
UPDATE: It seems that having "--hostname" when creating the service causes the issue, so maybe there is some sort of bug around that? If you don't have it, the container resolves to the same IP internally and externally.
@aboch It seems that --hostname makes the difference because it matches the name of the service (--name). I think the bug is around which IP gets inserted into /etc/hosts, though. When the --hostname matches the --name, and the first IP is in the /etc/hosts file, that one responds from the inside instead of the 2nd IP (the DNS-name IP). So essentially you get:
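(The pasted lines were lost in this copy; a minimal sketch of the effect, assuming the "apache" service on a 10.0.0.0/24 overlay from the issue description below:)

```
# inside the task, /etc/hosts pins the hostname to the task IP:
$ grep apache /etc/hosts
10.0.0.3    apache

# while the embedded DNS server answers with the service VIP:
$ nslookup apache
Address 1: 10.0.0.2
```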
Yes, the
I guess the issue is that it causes the collision. (You are right -- with the dynamic --name-to-DNS mapping, it almost seems redundant, other than the nice benefit of setting the container name.) Basically, in any cluster where the nodes have to agree on the IPs by name, this now breaks things. MariaDB's Galera cluster is actually a really good example: you start the nodes with gcomm://node1,node2,node3, and node2 + node3 will see a different IP for node1 than the IP node1 claims as its own, with the --hostname flag.
It looks like we support templates for the newly introduced --hostname flag:
^ Hmm no. This one instead:
Reproduced too; strange why there are two IPs for a container.
@aboch thanks. Could there possibly be a warning if --hostname is set to the same value as --name, since it produces undesirable results? @jmzwcn - @aboch mentioned the reason above ^
Having the two IPs is OK, but the container/task not resolving itself to the same IP that external nodes see is the issue -- basically any circumstance where multiple nodes need to agree on a "pool" based on name/DNS, since it can't be done by --ip. (@thaJeztah - this is a great example of the need for a static IP here: #29816, which is the sub-issue for this: #25303 (comment).)
Another issue this causes (assuming you remove --hostname, or make it different from --name, to eliminate that part): the traffic that leaves the container uses the 2nd IP, but the advertised/by-name address is the first IP. In a simple example, say you have 2 nodes: from B, "ping A" will tell you 10.0.0.2, but the nodes will actually communicate using 10.0.0.3 and 10.0.0.5. For things that require pre-configuration or an exchange of IPs or hostnames, this breaks them: you are starting both nodes with something like "cluster-members: A, B", and they communicate using neither of those addresses. And since there is no static --ip option, and you can't go by name/DNS, this basically becomes impossible to do with a "docker service" setup, whereas it works perfectly with "docker run". I think this really needs a re-examination, because many clustered applications require pre-distribution of IPs or hostnames in order to cluster.
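A quick way to observe the mismatch from inside a task, sketched with the names and addresses from the example above (interface naming may vary):

```
# inside A's task: the overlay interface carries the task IP...
ip addr show eth0      # e.g. 10.0.0.3
# ...while swarm DNS hands out the VIP that everyone resolves by name
nslookup A             # e.g. 10.0.0.2
```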
Have there been any updates on this issue, or workarounds?
@rahulpalamuttam My current "solution" (more like a limited fix) is NOT to use --hostname, but instead just use --name, and then use those names everywhere. But yes, the real solution would be for this to be fixed.
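For reference, a minimal sketch of that workaround (service, network, and image names are illustrative):

```
# rely on the service name for discovery; do NOT pass --hostname
docker service create --name node1 --network some-overlay mariadb
docker service create --name node2 --network some-overlay mariadb
# without --hostname, nothing in /etc/hosts shadows the service name,
# so node1/node2 resolve to the same VIP from inside and outside
```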
Oh, well, that is a bit of a party pooper, I guess :( I am deploying a Cassandra cluster with docker-compose. This issue will likely break any clustering mechanism that relies on node identity to provide a seed or contact point, and it makes Docker Swarm unusable for all but the most simple use cases (scaling NGINX, I guess :). Has anyone found a workaround that works in conjunction with docker-compose v3? I will continue trying stuff, but having this issue prioritized would be a great enabler for a lot of use cases! Of course, I could always revert to spelling out my deployment precisely, one entry per service.
I'm seeing this same behaviour with docker-ce-19.03.1-3.el7.x86_64.rpm on CentOS 7. Edit: I'm using hostname: in my Docker Compose YAML. Maybe I'll remove it then.
I use redis-sentinel in Docker Swarm, and this bug also breaks the HA feature: when Sentinel tries to promote a master, it breaks the whole cluster (because the IP returned by DNS doesn't match the container IP). Is there some workaround? PS: I don't use hostname in my compose file. Docker version: 19.03.8
Got the same problem since the Docker update to 20.3.x... the container IP and the DNS resolution of the service name don't match. Found a simple solution by changing the endpoint mode to dnsrr. It's a replica-1 service, and this should be fixed if the VIP is replaced. Was that changed with Docker 20.x?!
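For anyone looking for the same fix, a sketch of that change on the CLI (service and image names are illustrative; in a v3.3+ compose file the equivalent is endpoint_mode: dnsrr under deploy):

```
# dnsrr skips the VIP: swarm DNS returns the task IP(s) directly,
# so the address others resolve matches the container's own IP
docker service create \
  --name myservice \
  --network some-overlay \
  --endpoint-mode dnsrr \
  myimage
```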
Using swarm mode, if I have set the container's hostname, then within any container a lookup of the service name returns several addresses: 10.0.5.2 is a virtual IP, the others are containers' IPs. This is a bug, because only a single virtual IP is expected here.
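The lookup output didn't survive here; an illustrative reconstruction using the addresses from this comment (the service name is hypothetical):

```
$ nslookup myservice
Address 1: 10.0.5.2    # the virtual IP -- the only address expected
Address 2: 10.0.5.3    # a container IP, unexpectedly returned as well
Address 3: 10.0.5.4    # a container IP, unexpectedly returned as well
```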
I'm using swarm with a default overlay network, and a few days ago some of the services received a second, buggy IP address. When I do
single service does not need a vip and has a bug, see moby/moby#30963
Description
Service containers seem to get 2 IP addresses (let's say x.y.z.2 and x.y.z.3) -- one ends up being the DNS name, and the other is inserted into /etc/hosts. The problem this causes is that the container resolves its own name to x.y.z.3, while other containers resolve it (by name) to x.y.z.2. For services like MariaDB Galera, where you need to specify an IP or a name, this breaks the cluster, since the cluster advertises one IP by name but in reality the other nodes see a different one.
Ex - Starting a simple service (only 1 container) with something like:
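(The command itself didn't survive this copy; a representative reconstruction, matching the "apache" name used below -- the httpd image and network name are assumptions:)

```
docker service create \
  --name apache \
  --hostname apache \
  --network some-overlay \
  httpd
```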
Seems to assign 2 IP addresses to the container. Let's say the network is 10.0.0.0/24; it gives it:
10.0.0.2 and 10.0.0.3
Everything resolves "apache" to "10.0.0.2", except that /etc/hosts on the apache container says "10.0.0.3", so if you attach to the apache container and resolve "apache", it thinks it's 10.0.0.3.
Steps to reproduce the issue:
1.) create an attachable overlay network
2.) create a service on it, with --hostname set to the same value as --name
3.) attach to the service's container
4.) ping $hostname, and note the IP
5.) run another service or simply attach to the overlay (make sure it's attachable when created) and start an alpine container to test with: "docker run -it --rm --net=some-overlay alpine /bin/ash"
6.) ping $hostname again, and note the IP -- it will NOT match (see the command sketch below)
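A concrete version of these steps, under the same assumptions as the example above ("apache" service, "some-overlay" network, httpd image; the exact container name and ping being present in the image are assumptions):

```
# steps 1-3: create the network and service, then get a shell in the task
docker network create -d overlay --attachable some-overlay
docker service create --name apache --hostname apache --network some-overlay httpd
docker exec -it $(docker ps -q -f name=apache) /bin/sh

# step 4: inside the task, the name resolves through /etc/hosts
ping -c1 apache        # answers from 10.0.0.3 (the task IP)

# steps 5-6: from a separate container on the same overlay
docker run -it --rm --net=some-overlay alpine /bin/ash
ping -c1 apache        # answers from 10.0.0.2 (the VIP) -- does NOT match
```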
Describe the results you received:
DNS name within the swarm/overlay, and the container's internal hostname<->IP do not match.
Describe the results you expected:
A single IP, or at least for the /etc/hosts to agree with the DNS name that's available.
Additional information you deem important (e.g. issue happens only occasionally):
N/A
Output of docker version:

Output of docker info:

Additional environment details (AWS, VirtualBox, physical, etc.):
Swarm cluster with 3 managers, over 3 public IPs (1-1 NAT). Encrypted overlay network.
update: tested with non-encrypted overlay, and same issue.
update #2: it seems that having --hostname when creating the service causes the issue, so maybe there is some sort of bug around that? If you don't have it, the container resolves to the same IP internally and externally.