Use MapScale for embarrassingly parallel python pipelines.
MapScale uses the ZeroMQ messaging framework, allows jobs to be distributed across a network, and allows persistent job servers to reduce setup latency.
The use pattern is similar to multiprocessing
's Pool.map().
MapScale lives at https://github.com/jonathansick/mapscale
On the web: http://mapscale.jonathansick.ca.
You can also get PDF and EPUB versions. Thank you Read the Docs!
To build the documentation on your computer:
cd *mapscale repo*
python setup.py docs
and documentation will be built in build/sphinx/html
.
Fork the MapScale repo and send pull requests!
- Add workers of the network, using TCP sockets in SSH tunnels. Currently, only local workers are supported.
- Make sure that workers have completed their startup sequence before we give them jobs. This will require an update to the socket architecture.
- Make MapScale robust to lost jobs and worker dropouts. This will be particularly useful once we add workers over the Internet.
- Documentation, we need more of that.