-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't initialize mp_context
on import
#6580
Don't initialize mp_context
on import
#6580
Conversation
Allows controlling the context type through dask.config programmatically, rather than needing to use environment variables, or carefully control import order.
Can one of the admins verify this patch? |
Thanks for your patch! Just skimming over the changes, this looks good to me. I am not sure if there was any deeper reason to having this setup at module import time. I guess having this cached should be sufficient. I noticed you're having issues with our code linting (pre-commit hooks). See https://docs.dask.org/en/stable/develop.html#code-formatting for some guidance on how to set this up
cc @jrbourbeau do you have any context about the |
e1270fc
to
34f1bf0
Compare
Sorry, I'd run |
add to allowlist |
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 6h 20m 21s ⏱️ - 46m 44s For more details on these failures and errors, see this check. Results for commit 34f1bf0. ± Comparison against base commit cb88e3b. |
I had another look at it. It may be a good idea to refactor all of this but the two config options have a slightly different meaning. I'm not sure if the preload is worth it in dask. This optimization was mostly added for test runtime as noted in this comment and most dask tests should be using a threaded backend. Even distributed is by now using spawn as a default method (#3374) |
Thank you @wence- |
Thanks! |
Allows selection of the method multiprocessing uses to start child processes. Additionally, in the forkserver case, ensure the fork server is up and running before any computation happens. Potentially fixes #930. Needs dask/distributed#6580. cc: @pentschev, @quasiben Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #933
pre-commit run --all-files
The
fork
andspawn
methods offered by multiprocessing interact badly with many infiniband (or other high performance) interconnects which don't support fork after the relevant low-level networking library has been initialised (or else crash in strange ways). Multiprocessing'sforkserver
method offers a way round this. As long as the forkserver is started (viamultiprocessing.forkserver.ensure_running()
) before networking library initialisation then things work (the child process that does the forking doesn't need the network library).While one can control the method that distributed uses by setting
DASK_DISTRIBUTED__WORKER__MULTIPROCESSING_METHOD=forkserver
in the environment, it would be nice to be able to control this programmatically as well (using the"distributed.worker.multiprocessing-method"
key). This PR enables that by deferring the context creation to function call time (at which point the distributed config can be inspected, and might already have been modified) rather than module import time. Since multiprocessing contexts are singleton objects, this should not be a performance hit.Query: dask also provides a
get_context
method indask.multiprocessing
that respects a different config option ("multiprocessing.context"
). The only difference is that the distributed version of the function adds a bunch of modules to the preload list for the forkserver case. Should I refactor to use thedask.multiprocessing
version and handle preload there?