Release the GIL in core transformation code #386
Comments
I think before we do this, we will need to check in with the PROJ devs to see if this is safe. Maybe do some testing with it as well. Are you currently able to use dask with multithreading? Or is this blocking anything?
I think it would certainly be worthwhile to discuss with the PROJ devs whether a single PJ object can safely be used from multiple threads (to do parallel work in pyproj as you were trying), as I also mentioned on the PR. Eg
Threading-related issue: OSGeo/PROJ#1047
Looks like you need a new context per thread. This conflicts with issue #374.
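One way to read "a new context per thread" is to give every worker thread its own transformer object, created once when the thread starts. The sketch below does that with `threading.local` and the `initializer` argument of `ThreadPoolExecutor`. `FakeTransformer`, `init_worker`, and `transform_all` are hypothetical names used for illustration; the stand-in class is there only so the example runs without pyproj. Whether one Transformer per thread is enough to satisfy PROJ's threading requirements is an assumption here, not something confirmed in this thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread storage: each worker thread sees its own attributes.
_local = threading.local()


class FakeTransformer:
    """Hypothetical stand-in for a pyproj Transformer (assumption).

    With pyproj installed, init_worker could instead create
    Transformer.from_crs(4326, 3857).
    """

    def transform(self, x, y):
        return (x + 1.0, y + 1.0)


def init_worker():
    # Runs once in each worker thread, before it picks up any job.
    _local.transformer = FakeTransformer()


def transform_point(point):
    # Uses the transformer that belongs to the current thread.
    x, y = point
    return _local.transformer.transform(x, y)


def transform_all(points, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers,
                            initializer=init_worker) as executor:
        # list() consumes the results so worker exceptions surface here.
        return list(executor.map(transform_point, points))
```

Because the pool reuses its threads, the transformer is built at most `max_workers` times regardless of how many points are submitted.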
Based on current master, for reference when comparing against future changes:

    import concurrent.futures

    from pyproj import Transformer

    # Shared transformer, created once at import time.
    TRANSFORMER = Transformer.from_crs(4326, 3857)


    def transform_point(aa):
        # Reuses the shared module-level transformer.
        assert TRANSFORMER.transform((12, 11), (13, 12)) == ((1447153.3803125564, 1335833.8895192828), (1345708.4084091093, 1232106.80189676))


    def transform_point__recreate(aa):
        # Builds a new transformer on every call.
        assert Transformer.from_crs(4326, 3857).transform((12, 11), (13, 12)) == ((1447153.3803125564, 1335833.8895192828), (1345708.4084091093, 1232106.80189676))


    def transform_threaded_recreate():
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            # list() consumes the results so a failed assert in a worker is raised here.
            list(executor.map(transform_point__recreate, range(20)))


    def transform_threaded():
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            list(executor.map(transform_point, range(20)))


    def transform():
        [transform_point(ii) for ii in range(20)]


    def transform_recreate():
        [transform_point__recreate(ii) for ii in range(20)]

Based on this, I think you would need a way to avoid restarting threads and re-creating transformers each time a thread runs a job. Each thread would have to do a batch of transforms using a Transformer created at the start of the thread's job.
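The batching idea described above could look roughly like the following: the input is split into chunks, each chunk becomes one job, and the job creates its transformer once and reuses it for every point in the chunk. This is a sketch only. `FakeTransformer`, `chunks`, `transform_batch`, and `transform_in_batches` are hypothetical names, and the stand-in class replaces `Transformer.from_crs(4326, 3857)` so the example runs without pyproj.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeTransformer:
    """Hypothetical stand-in for a pyproj Transformer (assumption)."""

    def transform(self, x, y):
        return (x + 1.0, y + 1.0)


def chunks(seq, size):
    """Yield consecutive slices of seq with at most size items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def transform_batch(batch):
    # One transformer per job, reused for every point in the batch.
    transformer = FakeTransformer()
    return [transformer.transform(x, y) for x, y in batch]


def transform_in_batches(points, batch_size=1000, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        batches = executor.map(transform_batch, chunks(points, batch_size))
        # map() preserves input order, so flattening restores the original order.
        return [point for batch in batches for point in batch]
```

The cost of creating a transformer is then paid once per batch rather than once per point, so a larger `batch_size` amortizes it further at the price of coarser work distribution.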
I don't fully follow what you are saying here. The use case I have in mind is I did a quick experiment, taking the code base from a few weeks ago (before the global PROJ_CTX was introduced), and added a
Interesting, so a 2x speedup from not releasing the GIL? Does shapely release the GIL?
No, not with shapely. I should have been clearer, it was not a direct timing of to_crs (which indeed currently goes through shapely's transform), but just focusing on the pyproj transform part. So what I timed was running this function on 10 million points:
Which is certainly not the only part involved in
See discussion in #380 (comment).
By using nogil in Cython we can release the GIL in the hot spots of the transformation code, so that other applications can run pyproj code in parallel (e.g. using dask).