When multiple servers run in a non-HA setup connected to the same db, hosts start flapping in and out of the reconnecting state. I haven't looked into why yet, but my immediate hypothesis is that the secondary servers are waiting to receive pings from the agents. The agents use the load-balanced hostname, which always directs to the primary instance, so the secondaries never receive ping responses, decide the agents are unavailable, and start marking them as reconnecting. Meanwhile the primary is receiving pings and keeps marking them back, in a not-so-beautiful dance.
The simplest solution is probably to add a service to the server image that starts/stops the rancher server container depending on whether or not that node is the primary (rough sketch below). This means failover might take a bit longer, depending on the polling interval for serf membership changes (though it could also be triggered with serf events) and on how long the rancher container takes to initialize. In any case, I can't see it taking more than 30s. If you only have one instance running and it goes down, you basically need to wait for the cluster to detect this and for AWS to start up a new instance, which can take 10-20 minutes depending on the circumstances. If you can tolerate that downtime, there's no reason to run multiple instances; otherwise, it's a good idea to run a secondary.
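For illustration, here's a rough sketch of what that start/stop sidecar could look like, written in Python and polling serf rather than hooking serf event handlers. The container name, the "alphabetically-first alive member is primary" election rule, and the poll interval are all assumptions made up for the example, not anything Rancher actually ships:

```python
#!/usr/bin/env python3
"""Hypothetical sidecar: keep the rancher server container running only on the
node this script considers the serf "primary". All names/conventions here are
assumptions for illustration."""
import socket
import subprocess
import time

CONTAINER = "rancher-server"   # assumed container name
POLL_INTERVAL = 10             # seconds between serf membership checks


def alive_members():
    """Return the sorted names of serf members currently reported as alive."""
    out = subprocess.run(["serf", "members"], capture_output=True, text=True, check=True)
    return sorted(
        fields[0]
        for fields in (line.split() for line in out.stdout.splitlines())
        if len(fields) >= 3 and fields[2] == "alive"
    )


def is_primary():
    """Assumed election rule: the alphabetically-first alive member is primary."""
    members = alive_members()
    return bool(members) and members[0] == socket.gethostname()


def set_container_state(should_run):
    """Start or stop the rancher server container to match the desired state."""
    running = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", CONTAINER],
        capture_output=True, text=True,
    ).stdout.strip() == "true"
    if should_run and not running:
        subprocess.run(["docker", "start", CONTAINER], check=True)
    elif not should_run and running:
        subprocess.run(["docker", "stop", CONTAINER], check=True)


if __name__ == "__main__":
    while True:
        set_container_state(is_primary())
        time.sleep(POLL_INTERVAL)
```

If the polling delay turns out to matter, the same logic could be invoked from a serf member-join/member-failed event handler instead, which would cut failover time down to roughly the container startup time.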
So in the meantime, we're limited to single-server setups, which means accepting extended downtime in the case of a node failure or termination.