Autoscaling: provide a timeout before removing a docker node that is "down" #3684

Closed
sanderegg opened this issue Dec 15, 2022 · 0 comments · Fixed by #4975
Labels
t:enhancement Improvement or request on an existing feature

Comments

@sanderegg
Member

This is an interesting thought. What could cause the docker swarm to report a node as down while it is still running?
  1. networking problem within AWS after a successful initial connection
  2. the machine crashed, hung, or became unresponsive
  3. something else?

At the moment, the code simply removes any down node from the docker swarm node list, to keep the list from growing. It does not terminate the instance (with the current code, that is). This may be a bad idea: I could instead wait some time before removing the node, for example by checking the last-updated field on the node and letting an hour or so pass.

Waiting like this could help with cases 1 and 2, provided someone manages to restart the broken machine within the timeout period. But as of now, I think that is beyond the scope of that service. We can brainstorm what might come as additional features.
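
The timeout idea could be sketched roughly as follows. This is only an illustration, not the service's actual code: `DOWN_NODE_GRACE_PERIOD`, `parse_docker_timestamp` and `nodes_to_remove` are hypothetical names, and the nodes are assumed to be plain dicts shaped like the Docker Engine API node objects (`Status.State` and `UpdatedAt`). Note that `UpdatedAt` records the last change to the node object, so it only approximates when the node actually went down.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical grace period; the issue only suggests "an hour ... or similar".
DOWN_NODE_GRACE_PERIOD = timedelta(hours=1)

_TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_docker_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2022-12-15T10:00:00.123456789Z'."""
    match = _TIMESTAMP_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base, fraction = match.groups()
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if fraction:
        # Docker reports nanoseconds; datetime only keeps microseconds.
        parsed = parsed.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return parsed


def nodes_to_remove(nodes, now, grace_period=DOWN_NODE_GRACE_PERIOD):
    """Return the nodes that are 'down' and were last updated at least grace_period ago."""
    expired = []
    for node in nodes:
        if node["Status"]["State"] != "down":
            continue  # only down nodes are candidates for removal
        if now - parse_docker_timestamp(node["UpdatedAt"]) >= grace_period:
            expired.append(node)
    return expired
```

A caller would pass the current node list together with `datetime.now(timezone.utc)` and remove only the returned nodes from the swarm, leaving recently downed nodes alone until the grace period has elapsed.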

Originally posted by @sanderegg in #3655 (comment)
