[FEA] Support RMM arguments for cluster initialization #6

See how this example notebook is setting up RMM using client.run. Is this something we can have LocalCUDACluster handle during startup?
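The notebook's approach amounts to running a setup function on every worker with client.run. A minimal sketch follows; set_rmm_pool is a hypothetical helper, and the rmm.reinitialize call and its arguments are assumptions about RMM's Python API rather than code taken from the notebook:

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster


def set_rmm_pool():
    # Import inside the function so each worker process (not just the
    # client) imports rmm before configuring it.
    import rmm

    # Hypothetical setup: switch RMM to a pooled allocator on this GPU.
    rmm.reinitialize(pool_allocator=True)


cluster = LocalCUDACluster()
client = Client(cluster)
client.run(set_rmm_pool)  # execute the setup function once on every worker
```

The drawback, and the motivation for this issue, is that every user has to remember to run this boilerplate themselves after the cluster starts.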
Comments
@paulhendricks fyi
In principle this is the sort of thing that a project like LocalCUDACluster could handle. However, I'm somewhat concerned that it seems very RAPIDS-specific, and also perhaps pretty unstable: someone using this project for PyTorch might not appreciate this change. My guess from looking at the code is that this would change in the next month, or would change based on which libraries someone wanted to use. My hope is that LocalCUDACluster will be general beyond just RAPIDS work and will avoid baking in code tailored to a specific workflow.

Instead, I think that we might finish up dask/distributed#2453 and use that, or perhaps use worker preload scripts (see http://distributed.dask.org/en/latest/setup.html#customizing-initialization). Short term, the preload scripts are probably the easiest approach. You would add that function to a small script and then call something like `dask-worker scheduler-address:8786 --preload /path/to/myscript.py`. You can also avoid having to pass this option every time by putting the script into your config at:

```yaml
distributed:
  worker:
    preload: /path/to/myscript.py
```
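As a concrete illustration, such a preload script might look like the sketch below. The dask_setup hook is the function distributed's preload mechanism calls on each worker at startup; the rmm.reinitialize arguments are assumptions about RMM's Python API, not something pinned down in this thread:

```python
# myscript.py -- a worker preload script.
import rmm  # assumes RMM's Python bindings are installed on the workers


def dask_setup(worker):
    # Called once per worker process at startup, before tasks are accepted.
    # Hypothetical pool settings; size these to match your workload.
    rmm.reinitialize(
        pool_allocator=True,      # use a suballocating memory pool
        initial_pool_size=2**30,  # start the pool at 1 GiB
    )
```

With the config entry above in place, any worker started under that configuration runs the script automatically, so no per-cluster setup code is needed.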
Understood about the goal of making this a general utility, and the startup scripts will work. However, it's worth noting that, while its adoption is still mostly within RAPIDS projects, RMM has a goal of making shared memory-pool management easier across the GPU ecosystem. It might be worth revisiting this once RMM is adopted more widely. @harrism fyi
I think that the preload script solution is probably the right level of customization for this problem. We can make a script with your startup commands, put that into configuration, and it will be run on any dask worker that people set up. That config file and script will be able to adapt much more nimbly than the dask-cuda project.
@kkraus14 what do you think is the right way, medium term, to solve the RMM initialization issue on the Python usability side?
+1