TAS: optimize the algorithm to minimize fragmentation #3756

Closed
3 tasks
mimowo opened this issue Dec 6, 2024 · 1 comment · Fixed by #4228
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@mimowo
Contributor

mimowo commented Dec 6, 2024

What would you like to be added:

In some cases there are low-hanging-fruit optimizations to the algorithm. For example, suppose the workload requires 2 GPUs and two nodes can fit it: one with 2 GPUs free and one with 4 GPUs free. We currently choose the node with more free space (the one with 4 GPUs), which leaves us with 2 nodes each having 2 GPUs free, so the capacity gets fragmented. Choosing the node with 2 GPUs free instead would keep the 4-GPU block intact for a later, larger workload. Similar heuristics are possible for cases with 2 nodes, but the problem is probably hard in general.
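To make the example concrete, here is a minimal sketch comparing the two selection rules for a single-node placement. It is not Kueue's TAS implementation; the function names (`pick_most_free`, `pick_best_fit`, `place`) and the node names are hypothetical and exist only for illustration.

```python
from typing import Dict, Optional


def _fitting(free_gpus: Dict[str, int], request: int) -> Dict[str, int]:
    """Return only the nodes with enough free GPUs for the request."""
    return {node: free for node, free in free_gpus.items() if free >= request}


def pick_most_free(free_gpus: Dict[str, int], request: int) -> Optional[str]:
    """Pick the fitting node with the MOST free GPUs (the behavior described in the issue)."""
    fitting = _fitting(free_gpus, request)
    if not fitting:
        return None
    # Iterate in sorted order so ties are broken deterministically by node name.
    return max(sorted(fitting), key=lambda node: fitting[node])


def pick_best_fit(free_gpus: Dict[str, int], request: int) -> Optional[str]:
    """Pick the fitting node with the FEWEST free GPUs (a best-fit heuristic)."""
    fitting = _fitting(free_gpus, request)
    if not fitting:
        return None
    return min(sorted(fitting), key=lambda node: fitting[node])


def place(free_gpus: Dict[str, int], node: str, request: int) -> Dict[str, int]:
    """Return a copy of the free-capacity map after placing the request on the node."""
    remaining = dict(free_gpus)
    remaining[node] -= request
    return remaining


# Hypothetical cluster from the example: one node with 4 free GPUs, one with 2.
nodes = {"node-a": 4, "node-b": 2}
request = 2

after_most_free = place(nodes, pick_most_free(nodes, request), request)
after_best_fit = place(nodes, pick_best_fit(nodes, request), request)

print(after_most_free)  # {'node-a': 2, 'node-b': 2}
print(after_best_fit)   # {'node-a': 4, 'node-b': 0}
```

With best fit, a later workload needing 4 GPUs on one node still fits on `node-a`; with the most-free rule it no longer fits anywhere, even though the total free capacity is the same.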

We may need to decide whether to go with just the low-hanging-fruit heuristics or to add an API that lets users control the trade-off between fragmentation and the complexity of the scheduling algorithm.

Why is this needed:

The current algorithm creates unnecessary fragmentation of the capacity, as the simple example above shows.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@mimowo mimowo added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 6, 2024
@PBundyra
Contributor

/assign
