🐛Autoscaling/Comp backend: drain retired nodes so that they can be re-used #6345

@sanderegg sanderegg commented Sep 10, 2024

What do these changes do?

The autoscaled Cluster was extended to also keep track of retired_nodes.
These nodes were retired through the dask scheduler mechanism, which removes idle workers and transfers their memory to other workers according to its own internal heuristics.

These "retired" workers are now properly recognized and forcefully drained. This way they can be:

  • re-used, by undraining them and restarting a dask-sidecar if new work arrives in the meantime,
  • or terminated if there is nothing else to do.
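
The handling of retired nodes described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `Node`, `Cluster`, `Action` and `handle_retired_nodes` are hypothetical names, and the "one pending task per re-used node" rule is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Action(str, Enum):
    # Hypothetical outcomes for a retired node
    REUSE = "undrain and restart dask-sidecar"
    TERMINATE = "terminate"


@dataclass
class Node:
    name: str
    drained: bool = False


@dataclass
class Cluster:
    # Hypothetical, simplified stand-in for the autoscaled cluster
    active_nodes: list[Node] = field(default_factory=list)
    retired_nodes: list[Node] = field(default_factory=list)


def handle_retired_nodes(cluster: Cluster, pending_tasks: int) -> dict[str, Action]:
    """Drain every retired node, then decide whether to re-use or terminate it."""
    decisions: dict[str, Action] = {}

    # Step 1: forcefully drain all retired nodes so nothing new lands on them.
    for node in cluster.retired_nodes:
        node.drained = True

    # Step 2: re-use nodes while work is pending (assumption: one task per node),
    # otherwise mark them for termination. Iterate over a copy since we mutate.
    for node in list(cluster.retired_nodes):
        if pending_tasks > 0:
            node.drained = False  # undrain; a real system would restart the sidecar here
            cluster.retired_nodes.remove(node)
            cluster.active_nodes.append(node)
            decisions[node.name] = Action.REUSE
            pending_tasks -= 1
        else:
            # Actual termination is left to the caller in this sketch.
            decisions[node.name] = Action.TERMINATE
    return decisions
```

Calling `handle_retired_nodes(Cluster(retired_nodes=[Node("a"), Node("b")]), pending_tasks=1)` under these assumptions moves the first node back to the active set and marks the second for termination.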

Related issue/s

How to test

Dev-ops checklist

@sanderegg sanderegg added the a:autoscaling (autoscaling service in simcore's stack) and a:clusters-keeper labels Sep 10, 2024
@sanderegg sanderegg added this to the Eisbock milestone Sep 10, 2024
@sanderegg sanderegg self-assigned this Sep 10, 2024

sonarqubecloud bot commented Sep 10, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@sanderegg sanderegg marked this pull request as ready for review September 10, 2024 16:43
@sanderegg sanderegg requested review from pcrespov, GitHK, mguidon, matusdrobuliak66 and giancarloromeo and removed request for pcrespov September 10, 2024 16:43

codecov bot commented Sep 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.5%. Comparing base (cafbf96) to head (18cce86).
Report is 523 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (cafbf96) and HEAD (18cce86).

HEAD has 1 upload less than BASE

| Flag | BASE (cafbf96) | HEAD (18cce86) |
|------|----------------|----------------|
| unittests | 1 | 0 |

@@            Coverage Diff            @@
##           master   #6345      +/-   ##
=========================================
- Coverage    84.5%   64.5%   -20.1%     
=========================================
  Files          10     591     +581     
  Lines         214   30224   +30010     
  Branches       25     260     +235     
=========================================
+ Hits          181   19511   +19330     
- Misses         23   10653   +10630     
- Partials       10      60      +50     
| Flag | Coverage Δ |
|------|------------|
| integrationtests | 64.5% <ø> (?) |
| unittests | ? |

Flags with carried forward coverage won't be shown.

see 601 files with indirect coverage changes

@matusdrobuliak66 matusdrobuliak66 left a comment

👍

@sanderegg sanderegg merged commit 5b35cfe into ITISFoundation:master Sep 10, 2024
55 of 57 checks passed
Development

Successfully merging this pull request may close these issues.

Computational cluster: retired worker not terminated
2 participants