p_map() very slow compared to multiprocess.Pool.map() #40

Open
FlorinAndrei opened this issue Sep 6, 2021 · 4 comments

@FlorinAndrei

I'm trying to accelerate Pandas df.apply(), and also get a progress bar. The problem is, p_map is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().

This notebook is self-explanatory:

https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb

p_map() is orders of magnitude slower.

However, the same function is fast enough for another task: reading 25k files off the disk.

Windows 10, Python 3.8.8, Jupyter Notebook

@nuttyartist

From my testing, it seems like tqdm is the culprit. If I use tqdm with a regular multiprocessing.Pool(), it slows it down significantly. Has anyone else experienced this?

@AeroTH310

I have experienced this also. It appears that the pool is actually processing serially. In my system monitor I see as many processes started as the number of cores I set, but only one of them seems to be doing anything at any given time.
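One way to check the impression of serial execution is to have each task report its worker PID plus its start and end times, then count how many tasks were ever running at once. The sketch below uses only the standard library's multiprocessing.Pool, not p_tqdm or pathos, and `timed_task`, `max_concurrency`, and the sleep-based workload are hypothetical stand-ins for the real job.

```python
import os
import time
from multiprocessing import Pool


def timed_task(x):
    """Hypothetical workload: report which process ran it and when."""
    start = time.time()
    time.sleep(0.05)  # stand-in for real work
    return os.getpid(), start, time.time()


def max_concurrency(records):
    """Return the largest number of tasks whose time intervals overlap."""
    events = []
    for _pid, start, end in records:
        events.append((start, 1))   # task begins
        events.append((end, -1))    # task ends
    # Sort by time; at equal times an end (-1) sorts before a start (+1).
    events.sort()
    running = peak = 0
    for _t, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        records = pool.map(timed_task, range(16))
    print("distinct worker PIDs:", len({pid for pid, _, _ in records}))
    print("peak concurrent tasks:", max_concurrency(records))
```

A peak of 1 with several distinct PIDs would match the description above: workers exist, but they take turns. Swapping `pool.map` for the call under suspicion should show whether the behaviour changes.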

@BenjaminHoegh

It also seems to be very slow compared to joblib's Parallel.

@nphilou

nphilou commented Nov 13, 2024

Same issue here so far, but it seems that the problem is more on the pathos side.

Tests here: https://gist.github.com/nphilou/1296ebabc7b4de24f57de9452e25405e

Further testing on my side showed that pathos parallelization is very sensitive to the number of operations.

If my workload consists of numerous short tasks, using pathos may lead to significantly slower performance. However, if I run a smaller number of computationally intensive tasks, the execution time with pathos might be comparable to using multiprocessing.Pool.

These are very high-level observations; I didn't dig further into the low-level aspects.
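The pattern described above, where many short tasks run slowly while a few heavy tasks run fine, is what per-task dispatch overhead would produce. As a sketch only (it uses the standard library's multiprocessing.Pool rather than pathos, and it is not a confirmed diagnosis of p_tqdm), the snippet below times `imap` with its default `chunksize=1` against `imap` with a larger chunk size. `tiny_task`, `pick_chunksize`, and `timed_imap` are hypothetical names.

```python
import time
from multiprocessing import Pool


def tiny_task(x):
    """Hypothetical short task; cheap compared with the cost of dispatching it."""
    return x * x


def pick_chunksize(n_items, n_workers, chunks_per_worker=4):
    """Heuristic: aim for a few chunks per worker, never fewer than 1 item."""
    return max(1, n_items // (n_workers * chunks_per_worker))


def timed_imap(pool, items, chunksize):
    """Run tiny_task over items via imap; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = list(pool.imap(tiny_task, items, chunksize=chunksize))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    items = range(20_000)
    workers = 4
    with Pool(processes=workers) as pool:
        _, one_by_one = timed_imap(pool, items, chunksize=1)
        size = pick_chunksize(len(items), workers)
        _, batched = timed_imap(pool, items, chunksize=size)
    print(f"chunksize=1: {one_by_one:.2f}s")
    print(f"chunksize={size}: {batched:.2f}s")
```

If the batched run is much faster, the slowdown likely comes from sending items to workers one at a time rather than from the work itself. Because `imap` still yields results one by one, a progress bar wrapped around the iterator keeps working, though it would probably advance in bursts of about one chunk.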
