There should be an option for the caller to specify the chunk size used by the worker pool created by p_tqdm._parallel. Using the default can be quite inefficient, especially when the caller knows that each operation inside the map is usually quite fast.

Rationale: https://medium.com/@rvprasad/data-and-chunk-sizes-matter-when-using-multiprocessing-pool-map-in-python-5023c96875ef
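For illustration, here is a minimal, self-contained sketch of why the default hurts with fast operations, using plain multiprocessing (fast_op and the sizes are hypothetical; actual timings depend on your machine):

import multiprocessing
import time


def fast_op(x):
    # A cheap operation: with the default chunksize of 1, per-task
    # IPC overhead can dominate the actual work.
    return x * x


if __name__ == "__main__":
    data = range(100_000)
    with multiprocessing.Pool() as pool:
        start = time.perf_counter()
        list(pool.imap(fast_op, data))  # chunksize defaults to 1
        print("chunksize=1:   ", time.perf_counter() - start)

        start = time.perf_counter()
        list(pool.imap(fast_op, data, chunksize=1000))  # far fewer IPC round trips
        print("chunksize=1000:", time.perf_counter() - start)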
I needed something similar, but for the underlying multiprocessing.Pool instance: I had to pass maxtasksperchild=10 to the Pool constructor. You can do this by "monkeypatching" with functools.partial.
This worked for my needs:
import functools

import p_tqdm.p_tqdm as p_tqdm
from p_tqdm import p_map


def monkeypatch() -> None:
    # Replace the Pool class p_tqdm uses internally with a partial that
    # always passes maxtasksperchild=10 to the constructor.
    p_tqdm.Pool = functools.partial(p_tqdm.Pool, maxtasksperchild=10)


monkeypatch()

results = p_map(_scrape_data_async, data_to_process, num_cpus=15)  # this will use maxtasksperchild=10; can similarly provide other Pool kwargs
In your case you could probably use something like this (haven't tested):
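A possible sketch along the same lines (untested): it assumes that the Pool class imported by p_tqdm.p_tqdm forwards a chunksize keyword from its imap/uimap methods down to the underlying multiprocessing pool, and monkeypatch_chunksize, fast_op, and data are hypothetical names:

import functools

import p_tqdm.p_tqdm as p_tqdm
from p_tqdm import p_map


def monkeypatch_chunksize(chunksize: int) -> None:
    original_pool = p_tqdm.Pool

    def pool_with_chunksize(*args, **kwargs):
        # Build the pool as usual, then bind a default chunksize to the
        # imap/uimap methods that p_tqdm._parallel calls internally.
        pool = original_pool(*args, **kwargs)
        pool.imap = functools.partial(pool.imap, chunksize=chunksize)
        pool.uimap = functools.partial(pool.uimap, chunksize=chunksize)
        return pool

    p_tqdm.Pool = pool_with_chunksize


monkeypatch_chunksize(1000)
results = p_map(fast_op, data)  # workers now receive items in batches of 1000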