There should be an option for the caller to specify the chunk size used by the worker pool created by p_tqdm._parallel. Using the default can be quite inefficient, especially when the caller knows that each operation inside the map is usually quite fast.

Rationale: https://medium.com/@rvprasad/data-and-chunk-sizes-matter-when-using-multiprocessing-pool-map-in-python-5023c96875ef
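For illustration, here is a minimal, self-contained sketch of why the default hurts with fast operations, using plain multiprocessing (fast_op and the sizes are hypothetical; actual timings depend on your machine):

import multiprocessing
import time


def fast_op(x):
    # A cheap operation: with the default chunksize of 1, per-task
    # IPC overhead can dominate the actual work.
    return x * x


if __name__ == "__main__":
    data = range(100_000)
    with multiprocessing.Pool() as pool:
        start = time.perf_counter()
        list(pool.imap(fast_op, data))  # chunksize defaults to 1
        print("chunksize=1:   ", time.perf_counter() - start)

        start = time.perf_counter()
        list(pool.imap(fast_op, data, chunksize=1000))  # far fewer IPC round trips
        print("chunksize=1000:", time.perf_counter() - start)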
I needed something similar, but for the underlying multiprocessing.Pool instance: I had to pass maxtasksperchild=10 to the Pool constructor. You can do this by "monkeypatching" with functools.partial.
This worked for my needs:
import functools

import p_tqdm.p_tqdm as p_tqdm
from p_tqdm import p_map


def monkeypatch() -> None:
    # Replace the Pool class p_tqdm uses internally with a partial that
    # always passes maxtasksperchild=10 to the constructor.
    p_tqdm.Pool = functools.partial(p_tqdm.Pool, maxtasksperchild=10)


monkeypatch()

results = p_map(_scrape_data_async, data_to_process, num_cpus=15)  # this will use maxtasksperchild=10; can similarly provide other Pool kwargs
In your case you could probably use something like this (haven't tested):
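A possible sketch along the same lines (untested): it assumes that the Pool class imported by p_tqdm.p_tqdm forwards a chunksize keyword from its imap/uimap methods down to the underlying multiprocessing pool, and monkeypatch_chunksize, fast_op, and data are hypothetical names:

import functools

import p_tqdm.p_tqdm as p_tqdm
from p_tqdm import p_map


def monkeypatch_chunksize(chunksize: int) -> None:
    original_pool = p_tqdm.Pool

    def pool_with_chunksize(*args, **kwargs):
        # Build the pool as usual, then bind a default chunksize to the
        # imap/uimap methods that p_tqdm._parallel calls internally.
        pool = original_pool(*args, **kwargs)
        pool.imap = functools.partial(pool.imap, chunksize=chunksize)
        pool.uimap = functools.partial(pool.uimap, chunksize=chunksize)
        return pool

    p_tqdm.Pool = pool_with_chunksize


monkeypatch_chunksize(1000)
results = p_map(fast_op, data)  # workers now receive items in batches of 1000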