Make scale in of htex_auto_scale more effective #2196

Merged
merged 6 commits into from
Jan 26, 2022

Conversation

jrueb
Contributor

@jrueb jrueb commented Jan 20, 2022

Description

For htex_auto_scale, when there are more slots than tasks, try to remove as many blocks as needed so that the number of slots matches the number of tasks, instead of removing at most 1 block. This is of course still limited by how many blocks have reached the configured max_idletime.
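As a rough illustration of the idle-time limit described above, here is a minimal sketch. It is not Parsl's implementation: the function name `pick_blocks_to_remove`, the dictionary of per-block idle times, and the "longest idle first" ordering are all assumptions made for the example.

```python
def pick_blocks_to_remove(idle_seconds_by_block, surplus_blocks, max_idletime):
    """Hypothetical helper: choose which blocks to scale in during one strategy pass.

    idle_seconds_by_block: mapping of block id -> seconds the block has been idle
    surplus_blocks: how many blocks the strategy would like to remove
    max_idletime: a block is only eligible once it has been idle at least this long
    """
    # Only blocks that have reached max_idletime may be removed.
    eligible = [b for b, idle in idle_seconds_by_block.items() if idle >= max_idletime]
    # Assumption for this sketch: prefer the blocks that have been idle longest.
    eligible.sort(key=lambda b: idle_seconds_by_block[b], reverse=True)
    # Never remove more than the requested surplus (and never a negative count).
    return eligible[:max(surplus_blocks, 0)]


idle = {"b0": 300, "b1": 10, "b2": 150, "b3": 200}

# Behaviour before this PR, modelled as a surplus capped at one block per pass.
one_per_pass = pick_blocks_to_remove(idle, 1, max_idletime=120)

# Behaviour after this PR: ask for the whole surplus, limited by idle blocks.
whole_surplus = pick_blocks_to_remove(idle, 3, max_idletime=120)
```

In this model, block `b1` is never chosen because it has not yet been idle for `max_idletime`, no matter how large the surplus is.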

Fixes #2195

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Bug fix (non-breaking change that fixes an issue)

@@ -262,7 +262,10 @@ def _general_strategy(self, status_list, tasks, *, strategy_type):
 logger.debug("More slots than tasks")
 if isinstance(executor, HighThroughputExecutor):
     if active_blocks > min_blocks:
-        exec_status.scale_in(1, force=False, max_idletime=self.max_idletime)
+        exec_status.scale_in(
+            active_blocks - active_tasks // tasks_per_node // nodes_per_block,
Collaborator

I think this calculation should look similar to the calculation around line 246: both of them are calculating some kind of "target number of blocks", and it makes me uncomfortable that they don't look exactly the same (e.g. ceil, min and parallelism).

other than that, this looks like the right thing to be doing.

Collaborator

this difference in calculation manifested as issue #3696

@jrueb
Contributor Author

jrueb commented Jan 20, 2022

Adjusted the computation. Indeed, I hadn't taken parallelism and min_blocks into account in the calculation.
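One way to read the adjusted computation, as a minimal sketch under stated assumptions rather than the code that was merged: the function name `excess_blocks_to_remove` and the single `slots_per_block` parameter (standing in for `tasks_per_node * nodes_per_block`) are hypothetical. The rounding up of both slots and blocks follows the `math.ceil` in the diff and the later commit's description of this PR.

```python
import math


def excess_blocks_to_remove(active_slots, active_tasks, parallelism,
                            slots_per_block, active_blocks, min_blocks):
    """Hypothetical sketch: number of blocks to ask scale_in to remove."""
    # Slots not needed once the task count is scaled by parallelism.
    excess_slots = math.ceil(active_slots - (active_tasks * parallelism))
    # Convert surplus slots to whole blocks, rounding up.
    excess_blocks = math.ceil(excess_slots / slots_per_block)
    # Never go below the configured minimum number of blocks.
    return min(excess_blocks, active_blocks - min_blocks)


# 10 blocks of 48 slots each, 14 tasks, parallelism 1.
no_minimum = excess_blocks_to_remove(480, 14, 1, 48, 10, min_blocks=0)
with_minimum = excess_blocks_to_remove(480, 14, 1, 48, 10, min_blocks=2)
```

With `min_blocks=0` this example asks to remove all 10 blocks even though 14 tasks remain, which is the effect of rounding the block count up.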

@@ -262,7 +262,10 @@ def _general_strategy(self, status_list, tasks, *, strategy_type):
 logger.debug("More slots than tasks")
 if isinstance(executor, HighThroughputExecutor):
     if active_blocks > min_blocks:
-        exec_status.scale_in(1, force=False, max_idletime=self.max_idletime)
+        excess = math.ceil(active_slots - (active_tasks * parallelism))
Member

It might be clearer to rename the var excess to excess_slots.

Contributor Author

@jrueb jrueb Jan 20, 2022

Do you also want excess renamed above, in line 245? It was using this term before.

Member

Sure, that isn't a bad idea.

Member

@yadudoc yadudoc left a comment

@jrueb I think the logic here is correct. I have a minor recommendation about variable naming but otherwise, this PR is good to go. Thanks for writing this up! :)

@benclifford benclifford merged commit 5b056f7 into Parsl:master Jan 26, 2022
github-merge-queue bot pushed a commit that referenced this pull request Jan 6, 2025
# Description

PR #2196 calculates a number of blocks to scale in, in the htex
strategy, rather than scaling in one block per strategy iteration.
However, it rounds the wrong way: it scales in a rounded up, rather than
rounded down, number of blocks.

Issue #3696 shows this resulting in oscillating behaviour: With 14
tasks and 48 workers per block, on alternating strategy runs, the code
will either scale up to the rounded up number of needed blocks (14/48 =>
1), or scale down to the rounded down number of needed blocks (14/48 =>
0).

This PR changes the rounding introduced in #2196 to be consistent:
rounding up the number of blocks to scale up, and rounding down the
number of blocks to scale down.
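The oscillation can be reproduced in a toy model. The sketch below is not the Parsl strategy code: it assumes parallelism of 1, ignores `max_idletime` and `min_blocks`, applies each scaling decision immediately, and uses hypothetical names (`scale_in_count`, `scale_out_count`, `simulate`).

```python
import math


def scale_in_count(active_slots, active_tasks, slots_per_block, round_fn):
    """Blocks to remove when there are more slots than tasks."""
    excess_slots = active_slots - active_tasks
    return max(round_fn(excess_slots / slots_per_block), 0)


def scale_out_count(active_slots, active_tasks, slots_per_block):
    """Blocks to add when there are more tasks than slots (always rounded up)."""
    missing_slots = active_tasks - active_slots
    return max(math.ceil(missing_slots / slots_per_block), 0)


def simulate(blocks, tasks, slots_per_block, round_fn, iterations):
    """Return the block count before the first and after each strategy run."""
    history = [blocks]
    for _ in range(iterations):
        slots = blocks * slots_per_block
        if slots < tasks:
            blocks += scale_out_count(slots, tasks, slots_per_block)
        elif slots > tasks:
            blocks -= scale_in_count(slots, tasks, slots_per_block, round_fn)
        history.append(blocks)
    return history


# 14 tasks, 48 workers per block, starting from one block.
rounding_up = simulate(1, 14, 48, math.ceil, 4)     # [1, 0, 1, 0, 1]
rounding_down = simulate(1, 14, 48, math.floor, 4)  # [1, 1, 1, 1, 1]
```

Rounding the scale-in count up removes the only block (34 spare slots become 1 block), after which the next run has to add it back. Rounding down leaves the block in place.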

# Changed Behaviour

HTEX scale down should oscillate less

# Fixes

Fixes #3696 

## Type of change

- Bug fix
Development

Successfully merging this pull request may close these issues.

Strategy htex_auto_scale removing blocks far too slow
3 participants