After instance change ip, the swarm agent must also change the join addr. #806

rossbachp · 2015-03-18T19:45:48Z

While doing more testing of PR #770, I found the following:

After restarting my Swarm EC2 cluster today, I hit another problem caused by changed IPs.

The master still uses the old IPs of the swarm machines:

time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.29.90:2376/v1.15/info: dial tcp 54.69.29.90:2376: i/o timeout" 
time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.230.35:2376/v1.15/info: dial tcp 54.69.230.35:2376: i/o timeout" 
time="2015-03-18T18:23:54Z" level=error msg="Get https://54.69.255.39:2376/v1.15/info: dial tcp 54.69.255.39:2376: i/o timeout" 
time="2015-03-18T18:23:54Z" level=error msg="Get https://52.10.167.59:2376/v1.15/info: dial tcp 52.10.167.59:2376: i/o timeout"

I analyzed the problem:

The swarm agent is still joined with the old IP 52.10.167.59:

$ docker-machine ls
NAME               ACTIVE   DRIVER       STATE     URL                        SWARM
amazonec2-03                amazonec2    Stopped                              
dev                         virtualbox   Stopped                              
ec2-swarm-01                amazonec2    Running   tcp://54.149.27.239:2376   ec2-swarm-master
ec2-swarm-02                amazonec2    Running   tcp://52.10.108.31:2376    ec2-swarm-master
ec2-swarm-03       *        amazonec2    Running   tcp://54.148.5.178:2376    ec2-swarm-master
ec2-swarm-master            amazonec2    Running   tcp://52.11.98.189:2376    ec2-swarm-master (master)
$ $(docker-machine env ec2-swarm-master)
$ docker ps --no-trunc
CONTAINER ID                                                       IMAGE               COMMAND                                                                                                                                                                                          CREATED             STATUS              PORTS                              NAMES
13d27667155b3b1962b99b8d817c7a9865b47fe5b0d5d9c0af08735b26163efa   swarm:latest        "/swarm join --addr 52.10.167.59:2376 token://5a57a53a13470b1e680c6904ce5b34d1"                                                                                                                  35 hours ago        Up 11 minutes       2375/tcp                           swarm-agent          
810f7ce04b6439c191470a2116197088ee2a3d2e5ed1cc7f4742aacef46317f9   swarm:latest        "/swarm manage --tlsverify --tlscacert=/etc/docker/ca.pem --tlscert=/etc/docker/server.pem --tlskey=/etc/docker/server-key.pem -H tcp://0.0.0.0:3376 token://5a57a53a13470b1e680c6904ce5b34d1"   35 hours ago        Up 11 minutes       2375/tcp, 0.0.0.0:3376->3376/tcp   swarm-agent-master   
$ docker-machine ip ec2-swarm-master
52.11.98.189

After the IP of a swarm machine changes, the implementation must reconfigure the swarm agent: remove the old container and start a new one with the new address.
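The mismatch shown above (agent joined with 52.10.167.59 while `docker-machine ip` reports 52.11.98.189) can be checked mechanically. The following is only a minimal sketch: the function names `join_addr_from_cmd` and `agent_is_stale` are hypothetical and not part of docker-machine or swarm, and it assumes the address is passed as a separate word after `--addr` or `--advertise`.

```shell
#!/bin/bash
# Hypothetical helpers; these names are illustrative, not part of docker-machine or swarm.

# Print the host:port that follows --addr (or --advertise) in a swarm join command line.
# Assumes the value is a separate word, not the --addr=value form.
join_addr_from_cmd() {
  local prev=""
  local word
  for word in $1; do
    if [ "$prev" = "--addr" ] || [ "$prev" = "--advertise" ]; then
      echo "$word"
      return 0
    fi
    prev="$word"
  done
  return 1
}

# Exit 0 when the agent's join address does not match the machine's current IP,
# exit 1 when it matches, exit 2 when no join address could be found.
agent_is_stale() {
  local cmd="$1"
  local current_ip="$2"
  local joined
  joined=$(join_addr_from_cmd "$cmd") || return 2
  [ "${joined%:*}" != "$current_ip" ]
}
```

For the agent above, the command string would be the one shown by `docker ps --no-trunc`, and the current IP would come from `docker-machine ip ec2-swarm-master`.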

rossbachp · 2015-03-18T19:49:01Z

Currently the only quick fix is to recreate the agent with this tiny script:

create-swarm-agent.sh

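#!/bin/bash
# Recreate the swarm agent so that it joins with the instance's current public IP.
set -e
# The discovery token is the fourth element of the agent's command (index 3).
TOKEN=$(docker inspect -f "{{ index .Config.Cmd 3 }}" swarm-agent)
# Ask the EC2 instance metadata service for the current public IPv4 address.
IP=$(curl -sf http://169.254.169.254/latest/meta-data/public-ipv4)
docker stop swarm-agent
docker rm swarm-agent
docker run -d --name swarm-agent --restart=always swarm \
  join --addr "${IP}:2376" \
  "${TOKEN}"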
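Reading the token with `index .Config.Cmd 3` depends on the exact argument order of the agent's command. A slightly more tolerant variant could scan the arguments for a discovery URL instead. This is a sketch only: `discovery_url_from_args` is a hypothetical name, and the list of schemes is an assumption about which discovery backends are in use.

```shell
#!/bin/bash
# Hypothetical helper (not part of swarm): print the first argument that looks
# like a swarm discovery URL. The scheme list is an assumption; extend as needed.
discovery_url_from_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      token://*|consul://*|etcd://*|zk://*)
        echo "$arg"
        return 0
        ;;
    esac
  done
  # No discovery URL among the arguments.
  return 1
}
```

The arguments could be fed from something like `docker inspect -f '{{ range .Config.Cmd }}{{ . }} {{ end }}' swarm-agent`, left unquoted so that the shell splits them into words.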

nathanleclaire · 2015-07-06T19:09:49Z

I think that in the longer term we will have to support some kind of "sync" to the config store. I don't know if the Docker Hub token discovery service would support modifying the cluster IPs, but I'm sure the KV backends would.

cc @aluzzardi @vieux @abronan How would you envision the workflow for this case (changing IPs in the swarm)?

abronan · 2015-07-14T21:39:17Z

@nathanleclaire Entries in the K/V store are deleted after their TTL expires (the nodes are removed from discovery). So if the IPs change, the store will correctly reflect the state of the cluster after a stop/restart (on EC2, for example). You should still expect old entries to be listed for a while until their TTL expires: if you have 3 machines, expect 6 entries to be listed, although the old entries will be marked as unhealthy and cannot be used in the Swarm.

As a workaround, if Machine is aware that an instance is restarting, it could directly delete the entry in the K/V store so that machines with wrong IPs are not listed after a restart.
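To see which listed entries are the leftover ones, the discovery listing can be compared against the addresses the machines currently have. The sketch below assumes both lists are available as whitespace-separated `host:port` values; `stale_entries` is a hypothetical helper name, not an existing command.

```shell
#!/bin/bash
# Hypothetical helper: print every discovery entry (first argument) that is
# not among the current machine addresses (second argument).
# Both arguments are whitespace- or newline-separated host:port lists.
stale_entries() {
  local listed="$1"
  local current="$2"
  local entry
  for entry in $listed; do
    case " $(echo $current) " in
      *" $entry "*) ;;          # still a current address, keep quiet
      *) echo "$entry" ;;       # listed in discovery but no longer current
    esac
  done
}
```

The listed entries could come from `docker run --rm swarm list <discovery-url>` and the current addresses from `docker-machine ip <name>` with the Docker port appended.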

yoshiokatsuneo · 2016-01-21T02:19:56Z

Here is my workaround for when the IP address of a Docker Swarm node has changed:

% docker-machine env docker-node
% docker-machine regenerate-certs docker-node
(I sometimes need to run this several times when an error occurs.)
% eval $(docker-machine env docker-node)
% export TOKEN=$(docker inspect -f "{{ index .Config.Cmd 3}}" swarm-agent)
% docker rm -f swarm-agent
% docker run -d --name=swarm-agent --restart=always swarm:latest join --advertise "${DOCKER_HOST##tcp://}" "${TOKEN}"
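In the last command, the `${DOCKER_HOST##tcp://}` expansion strips the `tcp://` scheme so that only `host:port` is passed to `--advertise`. A tiny illustration, using a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper (not part of docker-machine): turn a DOCKER_HOST value
# such as tcp://52.11.98.189:2376 into the bare host:port form.
advertise_addr() {
  local host="$1"
  # "##tcp://" removes the longest leading match of the literal prefix tcp://
  echo "${host##tcp://}"
}
```

A value without the `tcp://` prefix is printed unchanged.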

rossbachp mentioned this issue Mar 18, 2015

certs: x509 check ip san #770

Merged

nathanleclaire added the area/swarm label Jul 6, 2015

abronan mentioned this issue Aug 3, 2015

Swarm panic on trying to create container through API docker-archive/classicswarm#1067

Closed

jeanlaurent added the driver/ec2 label Dec 23, 2015

jeanlaurent mentioned this issue Dec 23, 2015

After stop/restart ec2 instance certificates are invalid due to ip change #2668

Open

tomeon pushed a commit to tomeon/machine that referenced this issue May 9, 2018

Merge pull request docker#806 from SvenDowideit/run-bootsync-before-d…

92bf81f

…ocker-daemon Allow user customisation before and after Docker daemon startup

doctorpangloss mentioned this issue Jun 28, 2020

Changing even 1 swarm manager node IP address completely breaks Swarm moby/moby#41043

Closed
