-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not rebalance spilled keys #6002
Comments
Having this information on the scheduler would be generally helpful, beyond this issue. For example, it could let How would this interact with #5900? Would we want more detail than |
I suspect that starting to account for the cost of serialization scheduler-side would be a massive and ultimately unjustified complication. Unspilling is order-of-magnitude more expensive than just deserialising a key that is already in memory. So I'm moderately inclined to completely ignore any memory management cost that is not disk or network I/O. |
I think that's entirely sensible as the initial step. |
Soft-blocked by #4906 (we could implement it for the current rebalance, but there would be a lot of throwaway work). |
Having scheduler-side data about individual spilled keys would also help resolve this issue described in
|
Scheduler.rebalance() moves key/value pairs from workers with the heaviest memory load to those with the lightest.
"memory load" is, by default, measured in terms of optimistic memory: managed (not spilled) + unmanaged older than 30s, which is the same as process - managed recent.
(read: https://distributed.dask.org/en/latest/worker-memory.html#using-the-dashboard-to-monitor-memory-usage)
rebalance uses a least-recently-inserted algorithm to pick which keys to move - the rationale being that implementing a least-recently-used algorithm like on the worker-side spill system would require an unreasonable amount of extra synchronization from worker to scheduler.
In a somewhat common scenario where least-recently-inserted crudely matches least-recently-used, this means that rebalance will tend to move spilled keys first. The consequence is that rebalance not only will not improve in any way the situation on the sender worker, but it will actually make its memory usage spike for the whole duration of the transfer, as it copies the data out of the spill disk and into memory to prepare it for send (xref: #5996 for mitigation).
It will also aggravate memory usage on the receiving worker. If the worker later reaches the target threshold itself, then it will have a key from rebalance that used to be spilled (read: it was not used for a long time) that will be spilled potentially later than more recent keys, as the SpillBuffer makes no distinction between key/values used by rebalance/AMM and those used by the computation.
Proposed design
Sync a spilled flag from worker to scheduler.
Completely ignore spilled keys in rebalance().
Both issue and design are valid both in the current (legacy) implementation of rebalance() as well as in the future reimplementation on top of AMM.
The text was updated successfully, but these errors were encountered: