Autoscaling: mark pending instances as failed after they do not join the swarm for too long #4880

Closed
Tracked by #950
sanderegg opened this issue Oct 18, 2023 · 0 comments · Fixed by #5832
Labels
a:autoscaling autoscaling service in simcore's stack t:enhancement Improvement or request on an existing feature

sanderegg (Member) commented Oct 18, 2023

If an instance fails to join the swarm, it remains among the pending instances forever.

It would be desirable that:

  • after some EC2_MAX_TIME_PENDING, the instances are moved to a "failed instances" container
  • once moved there, these instances shall be terminated
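A minimal sketch of the two steps above, assuming each pending instance carries its EC2 launch time and that termination is delegated to a caller-supplied function. The names `PendingInstance`, `Cluster`, `move_timed_out_pending_instances` and `terminate_failed_instances` are hypothetical and are not the autoscaling service's actual API; this is not the change merged in #5832.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PendingInstance:
    # hypothetical record of an EC2 instance that has not joined the swarm yet
    instance_id: str
    launch_time: datetime


@dataclass
class Cluster:
    # hypothetical containers for "pending" and "failed" instances
    pending_instances: list[PendingInstance] = field(default_factory=list)
    failed_instances: list[PendingInstance] = field(default_factory=list)


def move_timed_out_pending_instances(
    cluster: Cluster, now: datetime, max_time_pending: timedelta
) -> Cluster:
    """Move instances pending longer than max_time_pending to failed_instances."""
    still_pending: list[PendingInstance] = []
    for instance in cluster.pending_instances:
        if now - instance.launch_time > max_time_pending:
            cluster.failed_instances.append(instance)
        else:
            still_pending.append(instance)
    cluster.pending_instances = still_pending
    return cluster


def terminate_failed_instances(
    cluster: Cluster, terminate: Callable[[list[str]], None]
) -> Cluster:
    """Terminate all failed instances via `terminate`, then empty the container."""
    if cluster.failed_instances:
        terminate([instance.instance_id for instance in cluster.failed_instances])
    cluster.failed_instances = []
    return cluster
```

With an assumed EC2_MAX_TIME_PENDING of 10 minutes, an instance launched 30 minutes ago is moved to the failed container and then terminated, while one launched 2 minutes ago stays pending. Keeping the two steps separate mirrors the issue's wording: failed instances are first recorded and only then terminated.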
@sanderegg sanderegg self-assigned this Oct 18, 2023
@sanderegg sanderegg transferred this issue from ITISFoundation/osparc-issues Oct 18, 2023
@sanderegg sanderegg added the a:autoscaling autoscaling service in simcore's stack label Oct 18, 2023
@sanderegg sanderegg added this to the Schoggilebe milestone Feb 15, 2024
@sanderegg sanderegg added the t:enhancement Improvement or request on an existing feature label Apr 2, 2024

1 participant