p_map() very slow compared to multiprocess.Pool.map() #40
Comments
From my testing, it seems like tqdm is the culprit. Using tqdm with a regular `multiprocessing.Pool()` slows it down significantly too. Did anyone else experience this?
I have experienced this also. It appears that the pool is actually processing serially: I see many processes getting started in my system monitor, matching the number of cores I set, but only one of them seems to be doing anything at any given time.
It also seems very slow compared to joblib's `Parallel`.
Same issue here so far, though the problem seems to lie more with the parallelization itself. Tests here: https://gist.github.com/nphilou/1296ebabc7b4de24f57de9452e25405e

Further testing on my side showed that pathos parallelization is very sensitive to the number of operations: if my workload consists of numerous short tasks, using it is much slower. These are very high-level observations; I didn't dive into the low-level aspects.
I'm trying to accelerate Pandas `df.apply()`, and also get a progress bar. The problem is, `p_map()` is orders of magnitude slower than plain `multiprocess.Pool.map()` for a job where most of the processing is done by `nltk.sentiment.vader.SentimentIntensityAnalyzer()`.

This notebook is self-explanatory: https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb

`p_map()` is orders of magnitude slower. However, the same function seems to work fine, fast enough, for another task: reading 25k files off the disk.
Windows 10, Python 3.8.8, Jupyter Notebook