What would you like to be added:
In some cases there are low-hanging-fruit optimizations to the algorithm. For example, suppose a workload requires 2 GPUs and two nodes can fit it, one with 4 GPUs free and one with 2 GPUs free. We currently choose the node with more space, the one with 4 GPUs, which leaves us with 2 nodes each having 2 GPUs free - the capacity gets fragmented. Choosing the node with exactly 2 GPUs free would instead keep the block of 4 free GPUs intact. Similar heuristics are possible for cases with 2 nodes, but it is probably a hard problem in general.
We may need to decide whether to go with just the low-hanging-fruit heuristics or to add an API that allows controlling the trade-off between fragmentation and the complexity of the scheduling algorithm.
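To make the 2-GPU example concrete, here is a minimal sketch that compares the "most free space" choice described above with a best-fit heuristic. It is not the project's scheduling code: the function names, the node names, and the flat `node -> free GPUs` map are assumptions made purely for illustration.

```python
from typing import Dict, Optional


def pick_most_free(free_gpus: Dict[str, int], request: int) -> Optional[str]:
    """Pick the fitting node with the MOST free GPUs (the behaviour described above)."""
    fitting = {n: f for n, f in free_gpus.items() if f >= request}
    if not fitting:
        return None
    # Ties are broken by node name so the result is deterministic.
    return max(fitting, key=lambda n: (fitting[n], n))


def pick_best_fit(free_gpus: Dict[str, int], request: int) -> Optional[str]:
    """Pick the fitting node with the FEWEST free GPUs (hypothetical best-fit heuristic)."""
    fitting = {n: f for n, f in free_gpus.items() if f >= request}
    if not fitting:
        return None
    return min(fitting, key=lambda n: (fitting[n], n))


def place(free_gpus: Dict[str, int], node: str, request: int) -> Dict[str, int]:
    """Return a copy of the free-GPU map after placing the request on the node."""
    remaining = dict(free_gpus)
    remaining[node] -= request
    return remaining


# Hypothetical cluster state: one node with 4 free GPUs, one with 2.
nodes = {"node-a": 4, "node-b": 2}

after_most_free = place(nodes, pick_most_free(nodes, 2), 2)
after_best_fit = place(nodes, pick_best_fit(nodes, 2), 2)

print(after_most_free)  # {'node-a': 2, 'node-b': 2}
print(after_best_fit)   # {'node-a': 4, 'node-b': 0}
```

With best fit, a later workload that needs 4 GPUs on a single node can still be placed on `node-a`; with the "most free space" choice it cannot fit anywhere. The sketch only covers the single-node case, not placements that span several nodes.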
Why is this needed:
The current algorithm creates unnecessary fragmentation of capacity, as shown in the simple example above.
Completion requirements:
This enhancement requires the following artifacts:
Design doc
API change
Docs update
The artifacts should be linked in subsequent comments.