MPI Pool #38
Conversation
Hi @joezuntz, First of all, thanks for this! A lot of people have been asking for something like this, and it's a very satisfying solution. Sorry for taking a million years, but I've finally started merging this and have added your changes. A few things:
Thanks again, Dan
Cool - good to hear. I'm using emcee as part of the nascent Dark Energy Survey parameter estimation code, and it's been great, thanks.
Since I think there is some kind of clash going on at some point here, I managed to solve it by having the master wait for all the tasks to be received before demanding results. This shouldn't slow anything down too much, I think. See patch (couldn't face setting up a whole new fork+branch+pull request):
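For concreteness, here is a minimal sketch of the pattern that patch describes, assuming an mpi4py setup where rank 0 dispatches tasks to the other ranks; the function name and round-robin scheme are illustrative, not taken from the actual patch:

```python
# Hypothetical sketch of the "wait for all tasks before collecting"
# fix described above -- not the actual patch. Assumes mpi4py; rank 0
# is the master, every other rank a worker that echoes the task's tag
# back with its result.
from mpi4py import MPI

def dispatch_and_collect(comm, tasks):
    nworkers = comm.Get_size() - 1
    # Post all the task sends without blocking.
    requests = [comm.isend(task, dest=i % nworkers + 1, tag=i)
                for i, task in enumerate(tasks)]

    # The fix: block until every task has been received by its worker
    # before asking any worker for a result.
    MPI.Request.Waitall(requests)

    # Only now demand the results, in task order.
    return [comm.recv(source=i % nworkers + 1, tag=i)
            for i in range(len(tasks))]
```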
I thought the pickle/unpickle infrastructure kept some caching, so that sending the same function repeatedly wouldn't be expensive. Or, as an alternative to changing the pool, couldn't you just use a global? --dstn
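If I read the suggestion right, the global alternative would look something like this sketch (illustrative names, not code from the thread): the objective function lives at module level on every process, so only the arguments are ever pickled and sent.

```python
# Illustrative sketch of the "global" alternative: every MPI process
# imports this module, so lnprob itself never has to be pickled and
# shipped -- only the walker positions do.
from mpi4py import MPI

def lnprob(theta):
    # Defined identically on all processes at import time.
    return -0.5 * sum(t * t for t in theta)

def worker_loop(comm=MPI.COMM_WORLD):
    # Workers receive positions, apply the module-level lnprob, and
    # send back results until a None sentinel arrives.
    while True:
        task = comm.recv(source=0)
        if task is None:
            break
        tag, theta = task
        comm.send((tag, lnprob(theta)), dest=0)
```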
Thanks! That fix worked. I'll finish up the documentation and then send you a link so that you can find whatever I mess up. I'd love to know the answer to @dstndstn's question, because (I think) my friend @jonathansick found that he could significantly speed up parallelization (using ZeroMQ). @dstndstn: I'm not sure I quite understand your second comment. Can you explain a bit more what this would look like? Thanks!
@dfm @dstndstn My "solution" to the function-pickling overhead problem was to set up a bunch of work servers so that the objective function is set up/initialized only once per node. This is really great for objective functions that carry a lot of data (say, a stellar population synthesis code). The worker cluster is also persistent between emcee runs.

My zmq package is at https://github.com/jonathansick/mapscale and there's half-decent documentation at http://mapscale.jonathansick.ca. My branch of emcee (https://github.com/jonathansick/emcee) contains some mods to make the mapper object work as a replacement for Python's multiprocessing pool.

That said, MapScale is a completely alpha thing right now: no unit tests, no inter-machine networking yet, no real robustness against network hiccups/job loss. The mpi4py integration from @joezuntz is certainly the way to go right now if it can cache the objective function. I'll want to check it out with (python-wrapped) FSPS pop synth problems, since that's a case where there's a huge overhead in initializing the objective function that should only be done once.
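Reduced to a sketch, the work-server idea looks like this (this is not MapScale's actual API; the setup function and port are placeholders): each worker pays its initialization cost once at startup and then serves likelihood requests indefinitely, across emcee runs.

```python
# Rough sketch of a persistent work server in the spirit of the
# MapScale design described above (not its actual API). Assumes pyzmq.
import zmq

def expensive_setup():
    # Placeholder for one-time initialization, e.g. loading the data
    # and spectral libraries for a pop synthesis likelihood.
    return lambda theta: -0.5 * sum(t * t for t in theta)

def serve(port=5555):
    lnprob = expensive_setup()  # paid once per node, not per call
    sock = zmq.Context().socket(zmq.REP)
    sock.bind("tcp://*:%d" % port)
    while True:
        theta = sock.recv_pyobj()       # one walker position in...
        sock.send_pyobj(lnprob(theta))  # ...one likelihood value out
```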
Thanks for your comments, @jonathansick. Perhaps it's worth taking this to a new thread... maybe when mapscale-emcee is ready for prime time. @joezuntz: I've pushed the documentation for your patch. Besides the fact that the source code link gets a 404 (this will be fixed when we push to master), I think that this looks about right. Thoughts? Thanks again for your contribution.
I should have seen this before, but it's pretty trivial to cache the applied function - you just have to have the master check whether the function has changed and only re-send it to the workers if it has. That way you don't lose any generality, but you get the efficiency of sending the function only once. Patch here:
The docs look good - not sure if it's worth mentioning that you need to close the pool at the end? (Actually you could put this in the master's destructor, though I think that would potentially be confusing.)
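As a sketch of the check being described (not the actual patch; the tag and names are made up), the master can remember the pickled form of the last function it broadcast and skip the re-send when nothing has changed:

```python
# Hypothetical sketch of the function-caching check -- not the actual
# patch. Assumes mpi4py; rank 0 is the master.
import pickle
from mpi4py import MPI

FUNCTION_TAG = 0  # illustrative tag reserved for function updates

class CachingMaster(object):
    def __init__(self, comm=MPI.COMM_WORLD):
        self.comm = comm
        self._sent = None  # pickled form of the last function broadcast

    def broadcast_function(self, function):
        pickled = pickle.dumps(function)
        if pickled != self._sent:
            # The function changed (or this is the first call): ship it
            # to every worker. Otherwise send nothing at all, so
            # repeated maps of the same function cost nothing extra.
            for worker in range(1, self.comm.Get_size()):
                self.comm.send(function, dest=worker, tag=FUNCTION_TAG)
            self._sent = pickled
```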
For problems where a single likelihood calculation is significantly slow, you very quickly end up needing a distributed-memory machine. This is also a really good way of using the parallelism that the emcee algorithm offers.
I've added two example files here containing an MPI pool, so you can scale the algorithm up to hundreds of cores without changing the emcee core at all. I also have another variant which is a little more efficient, but is pretty misleading and less future-proof should you change how pools are used in emcee. (It assumes that the function being mapped never changes, so it never has to be sent.)
The implementation uses the mpi4py package.
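In outline, such a pool can look like the following simplified sketch (in the spirit of the example files, not the files themselves): the master farms tasks out over MPI while the workers sit in a receive loop.

```python
# Simplified sketch of an MPI pool exposing the map() interface that
# emcee expects from a pool -- in the spirit of this PR's example
# files, not the actual contributed code. Rank 0 is the master.
from mpi4py import MPI

class MPIPool(object):
    def __init__(self, comm=None):
        self.comm = MPI.COMM_WORLD if comm is None else comm
        self.rank = self.comm.Get_rank()

    def is_master(self):
        return self.rank == 0

    def wait(self):
        # Worker loop: apply whatever function the master sends, until
        # a None sentinel from close() says to stop.
        while True:
            task = self.comm.recv(source=0)
            if task is None:
                break
            index, function, args = task
            self.comm.send((index, function(args)), dest=0)

    def map(self, function, tasks):
        # Master: round-robin the tasks over the workers with
        # non-blocking sends, complete them all before collecting
        # (the fix discussed above), then restore task order. Note
        # that the function is pickled with every task -- exactly the
        # overhead the caching variant avoids.
        nworkers = self.comm.Get_size() - 1
        requests = [self.comm.isend((i, function, args),
                                    dest=i % nworkers + 1)
                    for i, args in enumerate(tasks)]
        MPI.Request.Waitall(requests)
        results = [None] * len(tasks)
        for _ in tasks:
            index, value = self.comm.recv(source=MPI.ANY_SOURCE)
            results[index] = value
        return results

    def close(self):
        # Tell every worker to fall out of its wait() loop.
        for worker in range(1, self.comm.Get_size()):
            self.comm.send(None, dest=worker)
```

In a script launched with something like `mpirun -n 16 python script.py`, the worker ranks call `pool.wait()` and block there serving map requests, while the master constructs the sampler with `pool=pool` and calls `pool.close()` when sampling is done.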