This repository has been archived by the owner on Sep 26, 2021. It is now read-only.

After an instance's IP changes, the Swarm agent must also change its join address. #806

Open
rossbachp opened this issue Mar 18, 2015 · 4 comments

Comments

@rossbachp

After more testing of PR #770, I found this:

I detected another changed-IP problem after restarting my Swarm EC2 cluster today.

The master uses the old IPs of the Swarm machines:

time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.29.90:2376/v1.15/info: dial tcp 54.69.29.90:2376: i/o timeout" 
time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.230.35:2376/v1.15/info: dial tcp 54.69.230.35:2376: i/o timeout" 
time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.255.39:2376/v1.15/info: dial tcp 54.69.255.39:2376: i/o timeout" 
time="2015-03-18T18:23:54Z" level=error msg="Get https://52.10.167.59:2376/v1.15/info: dial tcp 52.10.167.59:2376: i/o timeout" 
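
To list which addresses the manager is still dialing, a small filter over its log lines helps. This is a minimal sketch assuming the log format shown above; the function name `extract_dial_addrs` is hypothetical.

```shell
#!/usr/bin/env bash
# extract_dial_addrs < manager-log
# Prints each unique IP:PORT that appears in a "dial tcp IP:PORT" error line.
extract_dial_addrs() {
  sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\).*/\1/p' | sort -u
}
```

For example, `docker logs swarm-agent-master 2>&1 | extract_dial_addrs` (the container name comes from the `docker ps` output below; `2>&1` is there because the errors are written to stderr) gives a list to compare against the URLs reported by `docker-machine ls`.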

I analyzed the problem:

The Swarm agent is joined with the old IP 52.10.167.59:

$ docker-machine ls
NAME               ACTIVE   DRIVER       STATE     URL                        SWARM
amazonec2-03                amazonec2    Stopped                              
dev                         virtualbox   Stopped                              
ec2-swarm-01                amazonec2    Running   tcp://54.149.27.239:2376   ec2-swarm-master
ec2-swarm-02                amazonec2    Running   tcp://52.10.108.31:2376    ec2-swarm-master
ec2-swarm-03       *        amazonec2    Running   tcp://54.148.5.178:2376    ec2-swarm-master
ec2-swarm-master            amazonec2    Running   tcp://52.11.98.189:2376    ec2-swarm-master (master)
$ $(docker-machine env ec2-swarm-master)
$ docker ps --no-trunc
CONTAINER ID                                                       IMAGE               COMMAND                                                                                                                                                                                          CREATED             STATUS              PORTS                              NAMES
13d27667155b3b1962b99b8d817c7a9865b47fe5b0d5d9c0af08735b26163efa   swarm:latest        "/swarm join --addr 52.10.167.59:2376 token://5a57a53a13470b1e680c6904ce5b34d1"                                                                                                                  35 hours ago        Up 11 minutes       2375/tcp                           swarm-agent          
810f7ce04b6439c191470a2116197088ee2a3d2e5ed1cc7f4742aacef46317f9   swarm:latest        "/swarm manage --tlsverify --tlscacert=/etc/docker/ca.pem --tlscert=/etc/docker/server.pem --tlskey=/etc/docker/server-key.pem -H tcp://0.0.0.0:3376 token://5a57a53a13470b1e680c6904ce5b34d1"   35 hours ago        Up 11 minutes       2375/tcp, 0.0.0.0:3376->3376/tcp   swarm-agent-master   
$ docker-machine ip ec2-swarm-master
52.11.98.189
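
The mismatch above (the agent joined as `52.10.167.59:2376` while `docker-machine ip ec2-swarm-master` reports `52.11.98.189`) can be checked per machine. A minimal sketch, assuming each agent was started as `swarm join --addr IP:PORT TOKEN`, so the address is element 2 of `.Config.Cmd` (the quick-fix script in this thread reads the token from element 3). `join_addr_is_stale` and `check_node` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# join_addr_is_stale JOIN_ADDR CURRENT_IP
# Succeeds (exit 0) when the host part of JOIN_ADDR differs from CURRENT_IP.
join_addr_is_stale() {
  local join_host="${1%:*}"   # drop the trailing ":PORT"
  [ "$join_host" != "$2" ]
}

# check_node NODE
# Compares the agent's join address with the machine's current IP.
# Returns 0 if they match, 1 if the agent is stale, 2 if a lookup failed.
check_node() {
  local node="$1" join_addr current_ip
  # eval runs inside the command substitution's subshell, so the
  # DOCKER_* variables do not leak into the caller's environment.
  join_addr="$(eval "$(docker-machine env "$node")" &&
    docker inspect -f '{{ index .Config.Cmd 2 }}' swarm-agent)" || return 2
  current_ip="$(docker-machine ip "$node")" || return 2
  if join_addr_is_stale "$join_addr" "$current_ip"; then
    echo "$node: agent joined as $join_addr, machine IP is now $current_ip"
    return 1
  fi
  echo "$node: OK ($join_addr)"
}
```

Usage: `for node in ec2-swarm-01 ec2-swarm-02 ec2-swarm-03 ec2-swarm-master; do check_node "$node"; done`.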

After the IP of a Swarm machine changes, the implementation must reconfigure the Swarm agent: remove the old container and start a new one.

@rossbachp
Author

The only quick fix currently is to recreate the agent with this tiny script:

create-swam-agent.sh

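#!/bin/bash
# Run on the EC2 instance itself: 169.254.169.254 is the instance metadata endpoint.
set -euo pipefail
# Reuse the discovery token of the existing agent (Cmd: join --addr IP:PORT TOKEN).
TOKEN=$(docker inspect -f "{{ index .Config.Cmd 3 }}" swarm-agent)
# Current public IPv4 address of this instance.
IP=$(curl -fsS http://169.254.169.254/latest/meta-data/public-ipv4)
# Replace the stale agent with one that advertises the new address.
docker stop swarm-agent
docker rm swarm-agent
docker run -d --name swarm-agent --restart=always swarm \
  join --addr "${IP}:2376" \
  "${TOKEN}"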

@nathanleclaire
Contributor

I think longer term we will have to support some kind of "sync" to the config store. I don't know whether the Docker Hub token discovery service would support modifying the cluster IPs, but I'm sure the KV backends would.

cc @aluzzardi @vieux @abronan How would you envision the workflow for this case (IPs changing in the swarm)?

@abronan

abronan commented Jul 14, 2015

@nathanleclaire Entries in the K/V store are deleted after their TTL expires (nodes are removed from discovery). So if the IPs change, the store will correctly reflect the state of the cluster after a stop/restart (on EC2, for example). Still, you should expect old entries to be listed for a while until their TTL expires: with 3 machines, expect 6 entries to be listed, even though the old ones will be marked as unhealthy and cannot be used in the Swarm.
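
The "6 entries for 3 machines" effect follows from a plain TTL rule. The sketch below is a simplified model, not Swarm's discovery code: each entry is an address plus the epoch second it was last refreshed, and an entry stays listed while `NOW - LAST_SEEN < TTL`. The function name and input format are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Simplified model of TTL-based discovery (not Swarm's actual implementation).
# listed_entries NOW TTL  < lines of "ADDR LAST_SEEN"
# Prints each ADDR whose entry has not expired yet (NOW - LAST_SEEN < TTL).
listed_entries() {
  local now="$1" ttl="$2" addr last_seen
  while read -r addr last_seen; do
    [ -n "$addr" ] || continue   # skip blank lines
    if (( now - last_seen < ttl )); then
      echo "$addr"
    fi
  done
}
```

With three pre-restart entries last refreshed at t=100, three post-restart entries at t=200, and a TTL of 180, all six are listed at t=210, while at t=300 only the three post-restart entries remain.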

As a workaround, if Machine knows that an instance is restarting, it could delete the entry in the K/V store directly, so that machines with wrong IPs are not listed after a restart.
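
For a Consul backend, such a direct delete could be a single HTTP call, since Consul's KV API removes a key with `DELETE /v1/kv/<key>`. This is a hypothetical sketch: the key layout `<prefix>/docker/swarm/nodes/<ip:port>` is an assumption about how the agent registers, so list the real keys first (for example `curl "http://CONSUL:8500/v1/kv/?keys"`) before deleting anything. The helper names are illustrative.

```shell
#!/usr/bin/env bash
# ASSUMPTION: agents register under <prefix>/docker/swarm/nodes/<ip:port>.
# Verify the actual layout with: curl "http://$CONSUL/v1/kv/?keys"

# stale_node_key_url CONSUL_ADDR PREFIX NODE_ADDR
# Builds the Consul KV URL for a node entry; PREFIX may be empty.
stale_node_key_url() {
  local consul="$1" prefix="${2%/}" node="$3"
  echo "http://${consul}/v1/kv/${prefix:+$prefix/}docker/swarm/nodes/${node}"
}

# delete_stale_node CONSUL_ADDR PREFIX NODE_ADDR
# Removes the entry so the old IP is no longer listed after a restart.
delete_stale_node() {
  curl -fsS -X DELETE "$(stale_node_key_url "$@")"
}
```

Usage: `delete_stale_node 127.0.0.1:8500 swarm 52.10.167.59:2376` would remove the entry for the old address of the node in the original report.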

@yoshiokatsuneo

Here is my workaround after the IP address of a Docker Swarm node changes:

% docker-machine env docker-node
% docker-machine regenerate-certs docker-node
(Sometimes I need to run this several times when an error occurs.)
% eval $(docker-machine env docker-node)
% export TOKEN=$(docker inspect -f "{{ index .Config.Cmd 3}}" swarm-agent)
% docker rm -f swarm-agent
% docker run -d --name=swarm-agent --restart=always swarm:latest join --advertise "${DOCKER_HOST##tcp://}" "${TOKEN}"
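
The `--advertise` value in the last command comes from shell parameter expansion: `${DOCKER_HOST##tcp://}` strips the leading `tcp://` from the URL that `docker-machine env` exported, leaving `IP:PORT`. A minimal sketch of the same step as a reusable helper; the name `advertise_addr` is hypothetical.

```shell
#!/usr/bin/env bash
# advertise_addr [DOCKER_HOST_URL]
# Turns a URL such as tcp://52.11.98.189:2376 into the bare IP:PORT that
# `swarm join --advertise` takes. Defaults to $DOCKER_HOST when no argument.
advertise_addr() {
  local host="${1:-$DOCKER_HOST}"
  echo "${host##tcp://}"   # strip the leading "tcp://" scheme, if present
}
```

After `eval $(docker-machine env docker-node)`, `advertise_addr` with no argument yields the same value that the workaround passes to `--advertise`.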

tomeon pushed a commit to tomeon/machine that referenced this issue May 9, 2018
…ocker-daemon

Allow user customisation before and after Docker daemon startup

5 participants