This is an interesting thought. What could cause the docker swarm to report a node as down while it is still running?
1. a networking problem within AWS after a successful initial connection
2. the machine crashed, hung, or became unresponsive
3. something else?
At the moment, the code just removes any down node from the docker swarm node list, to keep the list from growing. It does not terminate the instance (with the current code, that is). This may be a bad idea: I could instead wait some time before removing the node, for example by checking the last-updated field on the node and letting an hour or so pass.
Doing so could perhaps solve 1. and 2., if someone manages to restart the broken machine within the timeout period. But as of now, I think that is beyond the scope of this service. We can brainstorm what might come as additional features.
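To make the grace-period idea concrete, here is a minimal sketch, not the service's actual code. It assumes nodes are plain dicts shaped like the Docker Engine API node objects (`Status.State` and `UpdatedAt`), that timestamps are UTC with a trailing `Z`, and that `UpdatedAt` is a good enough proxy for when the node went down. The names `GRACE_PERIOD`, `parse_docker_timestamp` and `nodes_to_remove` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: one hour, as floated above; tune as needed.
GRACE_PERIOD = timedelta(hours=1)


def parse_docker_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2023-01-01T10:30:00.123456789Z (assumed UTC)."""
    base, _, fraction = value.rstrip("Z").partition(".")
    # Docker reports nanoseconds; datetime only keeps microseconds.
    microseconds = int(fraction[:6].ljust(6, "0")) if fraction else 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=microseconds, tzinfo=timezone.utc)


def nodes_to_remove(nodes, now, grace_period=GRACE_PERIOD):
    """Return the nodes that are down and were last updated before the grace period."""
    expired = []
    for node in nodes:
        if node["Status"]["State"] != "down":
            continue  # only nodes reported as down are candidates
        if now - parse_docker_timestamp(node["UpdatedAt"]) >= grace_period:
            expired.append(node)
    return expired
```

Passing `now` explicitly keeps the check easy to test. A down node that comes back within the grace period would simply stop matching on the next pass.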
Originally posted by @sanderegg in #3655 (comment)