
Device memory spill support (LRU-based only) #51

Merged · 16 commits · May 15, 2019

Conversation

@pentschev (Member) commented May 14, 2019

Based on #35, but provides only the simple LRU mechanism (i.e., no device memory monitoring).
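For illustration, here is a minimal, purely hypothetical sketch of the LRU spill idea (the class, names, and structure below are not the actual DeviceHostFile implementation): values stay in a device-side mapping until their total size exceeds a fixed device memory limit, after which the least recently used entries are moved to a host-side store.

```python
from collections import OrderedDict


class LRUSpillBuffer:
    """Toy LRU buffer (illustration only): keeps values in a "device" mapping
    until a byte limit is exceeded, then spills the least recently used
    entries to a "host" mapping."""

    def __init__(self, device_memory_limit, sizeof):
        self.device = OrderedDict()  # fast store: device objects, LRU-ordered
        self.host = {}               # slow store: spilled host copies
        self.limit = device_memory_limit
        self.sizeof = sizeof         # callable returning an object's size in bytes
        self.used = 0

    def __setitem__(self, key, value):
        if key in self.device:
            self.used -= self.sizeof(self.device[key])
        self.device[key] = value
        self.device.move_to_end(key)  # most recently used goes to the end
        self.used += self.sizeof(value)
        while self.used > self.limit and len(self.device) > 1:
            k, v = self.device.popitem(last=False)  # evict the LRU entry
            self.used -= self.sizeof(v)
            self.host[k] = v  # a real implementation would copy device -> host here

    def __getitem__(self, key):
        if key in self.device:
            self.device.move_to_end(key)  # accessing refreshes LRU order
            return self.device[key]
        return self.host[key]  # a real implementation would move it back to device
```

Usage would look something like `buf = LRUSpillBuffer(8 * 2**30, sizeof=lambda x: x.nbytes)` for an 8 GB device memory budget.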

@pentschev (Member Author)

@mrocklin let's move the conversation here.

> Right, mostly I want to say let's not add a new device_memory_foo= keyword yet if we can avoid it.

We need to handle it somehow: that information has to reach DeviceHostFile, and we can create the object before passing it to Worker/Nanny (which is fine, IMO). But the user can already pass --nthreads to dask-cuda-worker, or n_workers to LocalCUDACluster, so we should at least handle that appropriately. I'll see how this can be handled in a minimalistic way, probably by just dividing memory evenly, as sketched below.
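As a rough illustration of the even-split idea (a hypothetical helper, not code from this PR): with N workers sharing a device memory budget, each worker's DeviceHostFile could be given total_device_memory / N as its limit.

```python
def per_worker_device_memory_limit(total_device_memory, n_workers):
    """Split a device memory budget evenly across workers (illustrative helper,
    not part of this PR)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return total_device_memory // n_workers


# e.g. 32 GB of device memory shared by 4 workers -> 8 GB per worker
assert per_worker_device_memory_limit(32 * 2**30, 4) == 8 * 2**30
```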

@mrocklin (Contributor)

Under the assumption that we'll have one worker per GPU, I think we probably don't need to divide at all: we just take the maximum available device memory of the current GPU and use that.
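For example, one way to query that number is via pynvml (a sketch; not necessarily what this PR ends up doing):

```python
import pynvml

pynvml.nvmlInit()
# Assumes one worker per GPU; a real worker would derive the index from
# CUDA_VISIBLE_DEVICES rather than hard-coding device 0.
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
total_device_memory = pynvml.nvmlDeviceGetMemoryInfo(handle).total
print("device memory limit:", total_device_memory)
pynvml.nvmlShutdown()
```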

@pentschev (Member Author)

I know you said not to add --device-memory-limit, @mrocklin, but it lets us create the DeviceHostFile object without querying GPUs for memory, which in turn allows the CUDA_VISIBLE_DEVICES test to work properly.

@mrocklin (Contributor) left a comment


Thanks for doing this, @pentschev. I've left a few minor comments, but I suspect we can probably agree on things quickly and merge this soon.

xx = x.persist()
yield wait(xx)

print(worker.data.device_buffer)
@mrocklin (Contributor):

We might want to remove the print statements for now.

@pentschev (Member Author):

Yes, that was accidental.

memory_limit=parse_memory_limit(
    memory_limit, nthreads, total_cores=nprocs
),
),
@mrocklin (Contributor):

I recommend the following instead:

data=(DeviceHostFile, {...})

Otherwise we create the data object immediately here, and then need to pass it down to the worker through a process boundary.

This is implemented here, but it looks like it's not documented anywhere (my mistake):

https://github.com/dask/distributed/blob/8e449d392e91eff0a3454ee98ef362de8f78cc4f/distributed/worker.py#L500-L501

Also, if the user specifies device_memory_limit=0 then we might want something simpler. I can imagine wanting to turn off this behavior if things get complex.

We probably also want the same treatment in LocalCUDACluster, though as a first pass we can also include this ourselves.
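For illustration, the shape of that suggestion might look like the following (the DeviceHostFile import path and keyword names here are assumptions, not the exact signature):

```python
from distributed import Worker

from dask_cuda.device_host_file import DeviceHostFile  # import path assumed

# Worker accepts data=(cls, kwargs) and constructs the mapping itself, so the
# object does not have to be created up front and shipped across a process
# boundary by the Nanny.
worker = Worker(
    "tcp://scheduler-address:8786",
    data=(
        DeviceHostFile,
        {
            "device_memory_limit": 8 * 2**30,   # keyword names are assumptions
            "memory_limit": 32 * 2**30,
            "local_directory": "dask-worker-space",
        },
    ),
)
```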

@pentschev (Member Author):

I was in the process of doing that and only saw your comment now. Indeed, this is a better solution. Either way, the downside is that I had to create the work directory in dask-cuda-worker, which I'm not particularly happy with, so if you have a suggestion on how to avoid that, it would be great!

I'm already working on LocalCUDACluster; I'm not done yet, but I'll push those changes soon.

@mrocklin (Contributor):

> the downside is that I had to create the work directory in dask-cuda-worker, which I'm not particularly happy with, so if you have a suggestion on how to avoid that, it would be great!

Hrm, yes, I can see how that would be a concern. Unfortunately, I don't have a good solution at the moment, but I'll think about it.

@pentschev (Member Author)

@mrocklin I think this is good now, and tests are in place too. IMO, the only thing left to change is how the worker directory gets created, if we can think of a better way; besides that, it's probably ready for review.

@mrocklin (Contributor)

Looks good to me. Thank you for putting in the effort here, @pentschev. I'm looking forward to using this :)

@mrocklin mrocklin merged commit 6c4004b into rapidsai:branch-0.8 May 15, 2019
@pentschev (Member Author)

Thanks for the review @mrocklin!
