Autoscaling: provide a timeout before removing a docker node that is "down" #3684

sanderegg · 2022-12-15T07:37:00Z

    So this is an interesting thought. What could cause the docker swarm to report a node as down while it is still running?

1. a networking problem within AWS after a successful initial connection
2. the machine crashed, hung, or became unresponsive
3. something else?

At the moment, the code just removes any down node from the docker swarm node list, to prevent the list from growing. It does not terminate the instance (with the current code, that is). This may be a bad idea: I could instead wait some time before removing the node, for example by checking the node's last-updated field and letting an hour or so go by.

Doing so could help with 1. and 2., provided someone manages to restart the broken machine within the timeout period. As of now, though, I think that is beyond the scope of this service. We can brainstorm what might come as additional features.
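
To make the timeout idea concrete, here is a minimal sketch of the check, not the actual autoscaling code. It assumes nodes are plain dictionaries shaped like the Docker Engine API node objects (`Status.State` and `UpdatedAt`), and that `UpdatedAt` is a reasonable proxy for when the node went down. The names `DOWN_NODE_GRACE_PERIOD`, `parse_docker_timestamp` and `should_remove_node` are hypothetical, and the one-hour value is just the figure floated above.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical grace period; the issue only suggests "an hour or similar".
DOWN_NODE_GRACE_PERIOD = timedelta(hours=1)

_TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_docker_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2022-12-15T07:37:00.123456789Z'."""
    match = _TIMESTAMP_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base, fraction = match.groups()
    # Docker reports nanoseconds; datetime only keeps microseconds,
    # so the fraction is truncated (or padded) to six digits.
    microseconds = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=microseconds, tzinfo=timezone.utc)


def should_remove_node(
    node: dict,
    now: datetime,
    grace_period: timedelta = DOWN_NODE_GRACE_PERIOD,
) -> bool:
    """Return True only for a "down" node whose last update is older than the grace period."""
    if node["Status"]["State"] != "down":
        return False
    return now - parse_docker_timestamp(node["UpdatedAt"]) >= grace_period
```

With something like this, the cleanup loop would skip down nodes that are still within the grace period and only drop them from the swarm node list once the period has elapsed, which leaves a window for the machine to be restarted and rejoin.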

Originally posted by @sanderegg in #3655 (comment)

sanderegg self-assigned this Dec 15, 2022

sanderegg added the t:enhancement (Improvement or request on an existing feature) label Dec 15, 2022

sanderegg mentioned this issue Dec 15, 2022

✨ Autoscaling: scale down nodes #3655

Merged

sanderegg mentioned this issue Feb 17, 2023

Autoscaling - Dynamic Services ITISFoundation/osparc-issues#657

Closed

sanderegg mentioned this issue Nov 7, 2023

✨Computational autoscaling: find out which EC2 type is necessary #4975

Merged

sanderegg closed this as completed in #4975 Nov 8, 2023
