p_map() very slow compared to multiprocess.Pool.map() #40

FlorinAndrei · 2021-09-06T22:15:43Z

I'm trying to speed up pandas df.apply() and also get a progress bar. The problem is that p_map() is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().

This notebook is self-explanatory:

https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb

p_map() is orders of magnitude slower.

However, p_map() seems to work fine, and fast enough, for another task: reading 25k files off the disk.

Windows 10, Python 3.8.8, Jupyter Notebook
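If a wrapper dispatches items to workers one at a time, one possible workaround is to batch the work yourself: split the rows into chunks, map a helper over whole chunks, and flatten the results, so each trip between processes carries many short tasks. This is a minimal sketch under that assumption; `score_text`, `chunked`, `process_chunk`, and `batched_map` are hypothetical names, and `score_text` is a trivial stand-in for the VADER scoring step, not the real analyzer.

```python
from itertools import islice


def score_text(text):
    # Hypothetical stand-in for the per-row work (e.g. a sentiment score).
    return len(text.split())


def chunked(items, size):
    """Split an iterable into lists of at most `size` items, preserving order."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_chunk(batch):
    # Runs in a worker: handle a whole batch per task instead of one item.
    return [score_text(t) for t in batch]


def batched_map(map_func, items, size):
    """Apply process_chunk via any map-like callable, then flatten the results."""
    results = map_func(process_chunk, list(chunked(items, size)))
    return [r for batch in results for r in batch]
```

`batched_map(map, texts, 500)` runs serially, and `batched_map(pool.map, texts, 500)` uses a process pool. Passing `p_map` as `map_func` should also work, since it takes the function followed by the iterable, but the progress bar would then count chunks rather than rows. With the standard `multiprocessing.Pool` on Windows, functions defined in a notebook generally need to live in an importable module.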


nuttyartist · 2023-01-01T17:37:42Z

From my testing, it seems like tqdm is the culprit: wrapping a regular multiprocessing.Pool() with tqdm slows it down significantly. Has anyone else experienced this?
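Using tqdm with a pool usually means switching from `Pool.map`, which blocks until everything is done, to `Pool.imap`, which yields results as they arrive. One plausible explanation for the slowdown (not verified in this thread) is that switch rather than tqdm itself: `imap` defaults to `chunksize=1`, one task per item, whereas `Pool.map` batches items automatically. A minimal sketch of per-item progress over `imap` with an explicit `chunksize`; `map_with_progress` and its `report` callback are hypothetical stand-ins for a progress bar, and the demo uses `multiprocessing.pool.ThreadPool` only so it runs anywhere.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    # Trivial stand-in for a short task.
    return x * x


def map_with_progress(pool, func, items, chunksize=1, report=print, every=100):
    """Ordered pool.imap over `items`, calling report(done, total) periodically.

    imap yields results one by one even when chunksize > 1, so progress is
    still counted per item while tasks are shipped to workers in batches.
    """
    items = list(items)
    total = len(items)
    results = []
    for done, value in enumerate(pool.imap(func, items, chunksize=chunksize), start=1):
        results.append(value)
        if report is not None and (done % every == 0 or done == total):
            report(done, total)
    return results


if __name__ == "__main__":
    # ThreadPool keeps the demo runnable anywhere; for CPU-bound work use
    # multiprocessing.Pool under this same guard.
    with ThreadPool(4) as pool:
        squares = map_with_progress(pool, square, range(250), chunksize=25)
    print(squares[:5])
```

Results arrive in bursts of roughly `chunksize` items, so a progress bar wrapped around `imap` still advances per item, just less smoothly.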

AeroTH310 · 2023-04-05T22:59:16Z

I have experienced this too. It appears that the pool is actually processing serially: my system monitor shows as many processes starting as the number of cores I set, but only one of them seems to be doing anything at any given time.
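One way to check whether the pool is really running serially is to tag each result with the PID of the worker that produced it and count how many items each process handled. A minimal sketch; `tag_with_pid` and `pid_histogram` are hypothetical helpers, and `map_func` can be any map-like callable such as `pool.map`.

```python
import os
from collections import Counter


def tag_with_pid(x):
    # Worker: report which process handled this item alongside a trivial result.
    return os.getpid(), x * x


def pid_histogram(map_func, items):
    """Count how many items each worker process handled.

    `map_func` is any map-like callable, e.g. pool.map from multiprocessing.
    One PID doing nearly all the work would support the "effectively serial"
    reading; an even spread would point elsewhere.
    """
    tagged = map_func(tag_with_pid, items)
    return Counter(pid for pid, _ in tagged)
```

For example, call `pid_histogram(pool.map, range(10_000))` with a `multiprocessing.Pool(4)` inside an `if __name__ == "__main__":` block. An even spread across the worker PIDs would suggest that work is being distributed and that the bottleneck may be elsewhere, for example in per-task dispatch overhead.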

BenjaminHoegh · 2023-09-05T11:15:25Z

It also seems to be very slow compared to joblib's Parallel.

nphilou · 2024-11-13T14:52:46Z

Same issue here, but so far it seems the problem is more on the pathos side.

Tests here: https://gist.github.com/nphilou/1296ebabc7b4de24f57de9452e25405e

Further testing on my side showed that pathos parallelization is very sensitive to the number of tasks.

If my workload consists of many short tasks, using pathos can be significantly slower. However, if I run a smaller number of computationally intensive tasks, the execution time with pathos can be comparable to using multiprocessing.Pool.

These are very high-level observations; I didn't dig further into the low-level aspects.
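The many-short-tasks observation is consistent with per-task overhead: every task sent to a worker has to be pickled, queued, and answered with a result message, so that cost scales with the number of tasks rather than the amount of work. CPython's `multiprocessing.Pool.map` hides this by choosing a chunksize automatically, while `imap` and `imap_unordered` default to `chunksize=1`. The sketch below reproduces that default-chunksize arithmetic; whether p_map or pathos passes a chunksize at all is an assumption worth checking in their source, and both helper names are hypothetical.

```python
def pool_map_default_chunksize(n_items, n_workers):
    """Mirror the default chunksize CPython's Pool.map picks when none is given.

    Roughly four batches per worker: ceil(n_items / (n_workers * 4)).
    Pool.imap and Pool.imap_unordered use chunksize=1 unless told otherwise.
    """
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, n_workers * 4)
    return chunksize + 1 if extra else chunksize


def dispatch_count(n_items, chunksize):
    """Number of tasks sent to workers for n_items at a given chunksize (>= 1)."""
    # Integer ceiling division: a final partial batch still counts as a task.
    return (n_items + chunksize - 1) // chunksize
```

With 8 workers and 100,000 short tasks, `Pool.map` sends 32 batches of 3,125 items, whereas one-item-per-task dispatch sends 100,000 separate tasks.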
