🐛Autoscaling/Comp backend: drain retired nodes so that they can be re-used #6345

sanderegg · 2024-09-10T16:22:53Z

What do these changes do?

The autoscaled Cluster was expanded to also keep track of retired_nodes.
These nodes were retired via the dask scheduler mechanism, which removes idle workers and transfers their memory to other workers based on its own internal logic.

These "retired" workers are now properly recognized and forcefully drained. This way they can be:

- re-used, by undraining the node and restarting a dask-sidecar, in case new work arrives in the meantime,
- or terminated if there is nothing else to do.
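The drain-then-reuse-or-terminate flow described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: the `Node` and `Cluster` classes, the `Availability` enum, the functions `drain_retired_nodes` and `reuse_or_terminate`, and the "one pending task per node" rule are all hypothetical simplifications. Only the `retired_nodes` name comes from the description.

```python
from dataclasses import dataclass, field
from enum import Enum


class Availability(str, Enum):
    # Hypothetical stand-in for a node's scheduling availability
    ACTIVE = "active"
    DRAIN = "drain"


@dataclass
class Node:
    # Hypothetical, simplified view of a cluster node
    name: str
    availability: Availability = Availability.ACTIVE
    sidecar_running: bool = True


@dataclass
class Cluster:
    # Only `retired_nodes` is named in the PR description; the rest is assumed
    active_nodes: list[Node] = field(default_factory=list)
    retired_nodes: list[Node] = field(default_factory=list)
    drained_nodes: list[Node] = field(default_factory=list)
    terminated_nodes: list[Node] = field(default_factory=list)


def drain_retired_nodes(cluster: Cluster) -> None:
    """Forcefully drain every node whose worker was retired by the scheduler."""
    for node in cluster.retired_nodes:
        node.availability = Availability.DRAIN
        node.sidecar_running = False  # the retired worker no longer runs
        cluster.drained_nodes.append(node)
    cluster.retired_nodes.clear()


def reuse_or_terminate(cluster: Cluster, pending_tasks: int) -> None:
    """Re-use drained nodes while work is pending, terminate the others.

    Assumption for illustration only: one pending task needs one node.
    """
    while cluster.drained_nodes:
        node = cluster.drained_nodes.pop()
        if pending_tasks > 0:
            # undrain and restart the sidecar so the node can take work again
            node.availability = Availability.ACTIVE
            node.sidecar_running = True
            cluster.active_nodes.append(node)
            pending_tasks -= 1
        else:
            cluster.terminated_nodes.append(node)
```

In this sketch, a cluster with two retired nodes and one pending task ends up with one node back in `active_nodes` (with its sidecar restarted) and one node in `terminated_nodes`.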

Related issue/s

fixes Computational cluster: retired worker not terminated #6319

How to test

Dev-ops checklist

No ENV changes or I properly updated ENV (read the instruction)

sonarqubecloud · 2024-09-10T16:42:17Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

codecov · 2024-09-10T17:17:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.5%. Comparing base (cafbf96) to head (18cce86).
Report is 523 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (cafbf96) and HEAD (18cce86).

HEAD has 1 upload less than BASE

| Flag | BASE (cafbf96) | HEAD (18cce86) |
|---|---|---|
| unittests | 1 | 0 |

Additional details and impacted files

@@            Coverage Diff            @@
##           master   #6345      +/-   ##
=========================================
- Coverage    84.5%   64.5%   -20.1%     
=========================================
  Files          10     591     +581     
  Lines         214   30224   +30010     
  Branches       25     260     +235     
=========================================
+ Hits          181   19511   +19330     
- Misses         23   10653   +10630     
- Partials       10      60      +50

| Flag | Coverage Δ |
|---|---|
| integrationtests | `64.5% <ø> (?)` |
| unittests | `?` |

Flags with carried forward coverage won't be shown.

see 601 files with indirect coverage changes

matusdrobuliak66

👍


sanderegg added 3 commits September 10, 2024 18:13

testing

33c4e26

added retired nodes

bc87e67

do it

0705f29

sanderegg added a:autoscaling autoscaling service in simcore's stack a:clusters-keeper labels Sep 10, 2024

sanderegg added this to the Eisbock milestone Sep 10, 2024

sanderegg self-assigned this Sep 10, 2024

sanderegg added 4 commits September 10, 2024 18:26

missing file

1ec58ae

make sure an active node has a running dask-sidecar

380481f

minor

4955e17

monitoring script

18cce86

sanderegg marked this pull request as ready for review September 10, 2024 16:43

sanderegg requested review from pcrespov, GitHK, mguidon, matusdrobuliak66 and giancarloromeo and removed request for pcrespov September 10, 2024 16:43

matusdrobuliak66 approved these changes Sep 10, 2024


sanderegg merged commit 5b35cfe into ITISFoundation:master Sep 10, 2024
55 of 57 checks passed

This was referenced Sep 11, 2024

🚀 Pre-release master -> staging_Eisbock4 #6302

Closed

🚀 Release v1.77.0 / v1.77.1 / v1.77.2 #6212

Closed

sanderegg added a commit that referenced this pull request Sep 12, 2024

🐛Autoscaling/Comp backend: drain retired nodes so that they can be re…

07c200e

…-used (#6345)

This was referenced Sep 12, 2024

🚀 Release v1.76.0 #6046

Closed

Autoscaling: move retire_nodes call into the call to deactivate empty nodes #5867

Closed
