Device memory spill support #35
Closed

Changes from all commits (47 commits)
1aa68a8  Add support for yaml configurations (pentschev)
7349974  Add DeviceHostFile class to handle memory-spilling in LRU fashion (pentschev)
377368a  Add CUDAWorker class to handle device memory (pentschev)
197e653  Use CUDAWorker by default, add --device-memory-limit parameter (pentschev)
de5c3e9  Rename configurations dask-cuda -> cuda (pentschev)
42c0686  Use CUDAWorker on LocalCUDACluster (pentschev)
c92bbc9  Add scheduler_ip argument to CUDAWorker (pentschev)
ce4ba37  Add test_device_spill (pentschev)
b3cbf2c  Add assert_device_host_file_size in utils_test module (pentschev)
855876c  Temporarily build with dask-distributed master (pentschev)
07fccfa  Install CuPy for CI builds (pentschev)
6d48805  Fix DeviceHostFile transfer between host/device dicts (pentschev)
1dd2c16  Add DeviceHostFile test (pentschev)
df6cb95  Update numpy and numba requirements (pentschev)
5a7ceef  Use Dask master and enable __array_function__ for CI builds (pentschev)
8659fb6  Fixes for GPU CI (pentschev)
2c24f03  Fix test_spill, extend it to multiple parameters (pentschev)
1d06c75  Minor device_host_file and utils_test fixes (pentschev)
deff18f  Add CUDANanny, wraps Nanny but defaults to CUDAWorker (pentschev)
2d9c150  Fix GPU indices for test_cuda_visible_devices (pentschev)
ce5c650  Add single-GPU test for visible devices (pentschev)
c5fbb6f  Improve documentation for CUDANanny, DeviceHostFile (pentschev)
dcc6a6a  Fix defaults for device_memory_limit (pentschev)
9bc21e7  Fix test_with_subset_of_cuda_visible_devices (pentschev)
9eb2dfd  Pass *args to CUDANanny and CUDAWorker (pentschev)
4324439  Increase required dask, distributed versions (pentschev)
388c677  Use yaml.safe_load (mrocklin)
c942438  Fix CUDA worker not pausing issue, print correct resume message (pentschev)
d968b0f  Support protocol= keyword in LocalCUDACluster (#39) (mrocklin)
1abb1eb  Merge branch 'branch-0.7' into device-memory-spill (mrocklin)
eb70191  Merge branch 'device-memory-spill' of github.com:pentschev/dask-cuda … (mrocklin)
806ac8b  Minor style changes (mrocklin)
f35826e  Change config namespace from cuda to distributed (mrocklin)
f308943  Rename files to remove cuda suffix (mrocklin)
1b9d38b  Fix style (pentschev)
6d5b714  Add device, host, disk LRU aliases to simplify user queries (pentschev)
358f194  Default pausing due to device memory usage to 0.0 (disable) (pentschev)
c81f763  Small doc fix in dask_cuda_worker.py (mrocklin)
f28cbd1  Avoid modifying DeviceHostFile internals in test_device_host_file (pentschev)
6163359  Merge remote-tracking branch 'origin/device-memory-spill' into device… (pentschev)
a3c89fb  Allow choosing worker-class when starting dask-cuda-worker (pentschev)
8f59141  Add back original test_cuda_visible_devices, using Worker class (pentschev)
1e3acce  Aliasing device, host and disk to dict instead of LRU (pentschev)
882bf31  Remove utils_test, use simple naming instead of gen_random_key() (pentschev)
1960b79  Use tmp_path instead of tempfile, fix flake8/black issues (pentschev)
eb86d5d  Add back 'loop' to test_cuda_visible_devices, despite flake8 complaints (pentschev)
8153b27  Add missing pause argument to test_spill (pentschev)
@@ -1 +1,2 @@
from .local_cuda_cluster import LocalCUDACluster
from . import config
@@ -0,0 +1,14 @@
import yaml
import os

import dask

config = dask.config.config


fn = os.path.join(os.path.dirname(__file__), "cuda.yaml")
dask.config.ensure_file(source=fn)
with open(fn) as f:
    dask_cuda_defaults = yaml.safe_load(f)

dask.config.update_defaults(dask_cuda_defaults)
@@ -0,0 +1,8 @@
distributed:
  worker:
    # Fractions of device memory at which we take action to avoid memory blowup
    # Set any of the lower three values to False to turn off the behavior entirely
    device-memory:
      target: 0.60  # target fraction to stay below
      spill: 0.70   # fraction at which we spill to host
      pause: 0.0    # fraction at which we pause worker threads; disabled (0.0) by default to prevent the worker from stalling when third-party memory managers are used and can't be disabled
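Once the package registers these defaults (via the config module shown earlier), the values can be read back through dask's configuration system. A minimal sketch, assuming the keys are addressed exactly as spelled in the YAML above:

import dask
import dask_cuda  # importing the package loads and registers cuda.yaml

# Fractions of device memory at which spilling from device to host is managed
target = dask.config.get("distributed.worker.device-memory.target")
spill = dask.config.get("distributed.worker.device-memory.spill")
print(target, spill)  # 0.6 0.7 unless overridden by user configuration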
@@ -0,0 +1,104 @@
from zict import Buffer, File, Func
from zict.common import ZictBase
from distributed.protocol import deserialize_bytes, serialize_bytes
from distributed.worker import weight

from functools import partial
import os


def _is_device_object(obj):
    """
    Check if obj is a device object, by checking if it has a
    __cuda_array_interface__ attribute
    """
    return hasattr(obj, "__cuda_array_interface__")


def _serialize_if_device(obj):
    """ Serialize an object if it's a device object """
    if _is_device_object(obj):
        return serialize_bytes(obj, on_error="raise")
    else:
        return obj


def _deserialize_if_device(obj):
    """ Deserialize an object if it's an instance of bytes """
    if isinstance(obj, bytes):
        return deserialize_bytes(obj)
    else:
        return obj


class DeviceHostFile(ZictBase):
    """ Manages serialization/deserialization of objects.

    Three LRU cache levels are controlled, for device, host and disk.
    Each level takes care of serializing objects once its limit has been
    reached and passing them to the subsequent level. Similarly, each cache
    may deserialize an object, storing it back in the appropriate cache,
    depending on the type of object being deserialized.

    Parameters
    ----------
    device_memory_limit: int
        Number of bytes of CUDA device memory for device LRU cache,
        spills to host cache once filled.
    memory_limit: int
        Number of bytes of host memory for host LRU cache, spills to
        disk once filled.
    local_dir: path
        Path where to store serialized objects on disk
    """

    def __init__(
        self, device_memory_limit=None, memory_limit=None, local_dir="dask-worker-space"
    ):
        path = os.path.join(local_dir, "storage")

        self.host_func = dict()
        self.disk_func = Func(
            partial(serialize_bytes, on_error="raise"), deserialize_bytes, File(path)
        )
        self.host_buffer = Buffer(
            self.host_func, self.disk_func, memory_limit, weight=weight
        )

        self.device_func = dict()
        self.device_host_func = Func(
            _serialize_if_device, _deserialize_if_device, self.host_buffer
        )
        self.device_buffer = Buffer(
            self.device_func, self.device_host_func, device_memory_limit, weight=weight
        )

        self.device = self.device_buffer.fast.d
        self.host = self.host_buffer.fast.d
        self.disk = self.host_buffer.slow.d

    def __setitem__(self, key, value):
        if _is_device_object(value):
            self.device_buffer[key] = value
        else:
            self.host_buffer[key] = value

    def __getitem__(self, key):
        if key in self.host_buffer:
            obj = self.host_buffer[key]
            del self.host_buffer[key]
            self.device_buffer[key] = _deserialize_if_device(obj)

        if key in self.device_buffer:
            return self.device_buffer[key]
        else:
            raise KeyError

    def __len__(self):
        return len(self.device_buffer)

    def __iter__(self):
        return iter(self.device_buffer)

    def __delitem__(self, i):
        del self.device_buffer[i]
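For illustration, a small usage sketch of the class above; the module path and limits are assumptions, and CuPy is only one example of an object exposing __cuda_array_interface__:

import numpy as np
from dask_cuda.device_host_file import DeviceHostFile  # module path assumed

dhf = DeviceHostFile(
    device_memory_limit=2e9,  # bytes of device memory before spilling to host
    memory_limit=8e9,         # bytes of host memory before spilling to disk
    local_dir="dask-worker-space",
)

dhf["x"] = np.arange(10)  # no __cuda_array_interface__, so it goes to the host buffer
# import cupy
# dhf["y"] = cupy.arange(10)  # has __cuda_array_interface__, so it goes to the device buffer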
@@ -0,0 +1,14 @@
from distributed import Nanny

from .worker import CUDAWorker


class CUDANanny(Nanny):
    """ A process to manage CUDAWorker processes

    This is a subclass of Nanny, with the only difference
    being worker_class=CUDAWorker.
    """

    def __init__(self, *args, worker_class=CUDAWorker, **kwargs):
        Nanny.__init__(self, *args, worker_class=worker_class, **kwargs)
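As the docstring says, the class only changes the default worker_class, so the two constructions below are equivalent; the scheduler address is a placeholder and the import paths are assumed:

from distributed import Nanny
from dask_cuda.nanny import CUDANanny
from dask_cuda.worker import CUDAWorker

n1 = CUDANanny("tcp://127.0.0.1:8786")
n2 = Nanny("tcp://127.0.0.1:8786", worker_class=CUDAWorker)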
I would prefer to avoid this until someone asks for it if possible. I'm somewhat against an API like this because it forces us to enumerate the possible options in code. If we were to do something like this I think that we would probably provide the full namespace of the class and then try to import it.
However there is enough uncertainty here that, for maintenance reasons, I'd prefer that we not promise anything until someone is asking us for this.
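A minimal sketch of the "provide the full namespace of the class and then try to import it" approach mentioned above; the helper name and example paths are purely illustrative:

import importlib

def import_worker_class(path):
    # e.g. "distributed.Worker" or a user-provided "mypackage.MyWorker"
    module_name, _, class_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)

worker_cls = import_worker_class("distributed.Worker")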
For context, the worker_class keyword exists because different groups have their own custom Worker classes that they use. Exposing a keyword like this but making no move to allow them to plug in seems against the spirit of the keyword.
That's not true. We solely disabled pausing the worker; to control spilling from the device, we still need to monitor the device memory. And this is why I needed to subclass it.

Unfortunately, this is necessary for us to reenable the old CUDA_VISIBLE_DEVICES test. Using CUDAWorker, which was hardcoded before, prevents us from launching a process on a single-GPU machine to mock test as if it had several and check for the ordering of GPUs of each process.
Why do we need to monitor device memory externally from the use of sizeof? If objects reliably tell us how much device memory they take up, then we may not need to track device memory separately.

Dask workers operated this way for a long time before we finally needed to give in and have a separate periodic callback that tracked system memory. I'm inclined to try the simpler approach first and see if it breaks down or not.
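One way objects can reliably report how much device memory they take up is dask's sizeof dispatch; a hedged sketch registering a handler for CuPy arrays (CuPy is only used here as an example of a device object):

from dask.sizeof import sizeof

@sizeof.register_lazy("cupy")
def register_cupy():
    import cupy

    @sizeof.register(cupy.ndarray)
    def sizeof_cupy_ndarray(x):
        # bytes of device memory held by the array
        return int(x.nbytes)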
Sure, they do track how much memory they take. However, tracking the device memory lets us decide when it's time to spill memory. Isn't that what memory_target_fraction and memory_spill_fraction (prepend device_ for the cases in this PR) are for?

The block https://github.com/rapidsai/dask-cuda/pull/35/files#diff-a77f0c6f19d8d34d59aede5e31455719R282 controls the spilling, and this is why we needed to subclass Worker.
Yes, something like that diff. You'll also want to add the data= keyword to LocalCUDACluster.
Thanks for the explanation on the memory monitor.

Yes, and I also want to create the object before. :)

But ok, I can probably have it quickly done by tomorrow. There are a few more things that need to be ported to allow it to work (like finding out how much memory the device has in total), and also some test(s), which shouldn't be too difficult now that there's already one that works with the monitoring mechanism and I can base it on that.
It's actually valid to pass just the class. Dask will construct it. I think that this is explained in the Worker docstring. This is better because you're using a Nanny and don't want to pass it through a process boundary.

I recommend that we start with just using the full memory or a config value by default and not mess with any user inputs (which will get messy).
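A sketch of the pattern described above, where either a ready-made mapping or a class can be passed and the class is only constructed on the worker side of the process boundary; the helper name is hypothetical:

def materialize_data(data, **kwargs):
    # `data` may be an already-built MutableMapping, or a class such as
    # DeviceHostFile; a class is instantiated here, inside the worker process.
    if isinstance(data, type):
        return data(**kwargs)
    return data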
Ok, I'll check that.
We need to identify how much memory there is available for the device, regardless. I can probably use the same numba code from before.
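A minimal sketch of querying total device memory with numba, which may be what "the same numba code from before" refers to; the function name is illustrative:

from numba import cuda

def get_device_total_memory():
    # (free, total) bytes for the device backing the current CUDA context
    free, total = cuda.current_context().get_memory_info()
    return total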
Right, mostly I want to say let's not add a new device_memory_foo= keyword yet if we can avoid it.