-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: fix node reservation scoring #7730
Conversation
f02c834
to
e86059b
Compare
The BinPackIter accounted for node reservations twice when scoring nodes which could bias scores toward nodes with reservations. Pseudo-code for previous algorithm: ``` proposed = reservedResources + sum(allocsResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are added to the total resources used by allocations, and then the node's reserved resources are later substracted from the node's overall resources. The new algorithm is: ``` proposed = sum(allocResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are no longer added to the total resources used by allocations. My guess as to how this bug happened is that the resource utilization variable (`util`) is calculated and returned by the `AllocsFit` function which needs to take reserved resources into account as a basic feasibility check. To avoid re-calculating alloc resource usage (because there may be a large number of allocs), we reused `util` in the `ScoreFit` function. `ScoreFit` properly accounts for reserved resources by subtracting them from the node's overall resources. However since `util` _also_ took reserved resources into account the score would be incorrect. Prior to the fix the added test output: ``` Node: reserved Score: 1.0000 Node: reserved2 Score: 1.0000 Node: no-reserved Score: 0.9741 ``` The scores being 1.0 for *both* nodes with reserved resources is a good hint something is wrong as they should receive different scores. Upon further inspection the double accounting of reserved resources caused their scores to be >1.0 and clamped. After the fix the added test outputs: ``` Node: no-reserved Score: 0.9741 Node: reserved Score: 0.9480 Node: reserved2 Score: 0.8717 ```
7fdbb8a
to
68aca51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code changes look good! Skimmed the tests
Co-authored-by: Alex Dadgar <[email protected]>
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
The BinPackIter accounted for node reservations twice when scoring nodes
which could bias scores toward nodes with reservations.
Pseudo-code for previous algorithm:
The node's reserved resources are added to the total resources used by
allocations, and then the node's reserved resources are later
substracted from the node's overall resources.
The new algorithm is:
The node's reserved resources are no longer added to the total resources
used by allocations.
My guess as to how this bug happened is that the resource utilization
variable (
util
) is calculated and returned by theAllocsFit
functionwhich needs to take reserved resources into account as a basic
feasibility check.
To avoid re-calculating alloc resource usage (because there may be a
large number of allocs), we reused
util
in theScoreFit
function.ScoreFit
properly accounts for reserved resources by subtracting themfrom the node's overall resources. However since
util
also tookreserved resources into account the score would be incorrect.
Prior to the fix the added test output:
The scores being 1.0 for both nodes with reserved resources is a good
hint something is wrong as they should receive different scores (and neither
is a perfect fit). Upon further inspection the double accounting of reserved
resources caused their scores to be >1.0 and clamped.
After the fix the added test outputs: