
Release the GIL in core transformation code #386

Closed
jorisvandenbossche opened this issue Aug 5, 2019 · 8 comments · Fixed by #437
Labels
proposal Idea for a new feature.

Comments

@jorisvandenbossche
Contributor

See discussion in #380 (comment).

By using `nogil` in Cython we can release the GIL in the hot spots of the transformation code, so that other applications can run pyproj code in parallel (e.g. using dask).
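A rough sketch of what that could look like in pyproj's Cython layer, wrapping the `proj_trans_generic` call mentioned later in this thread (the variable names and strides here are illustrative, not the actual pyproj internals):

```cython
# Sketch only: release the GIL around the C-level PROJ call so other
# Python threads can run while PROJ does the number crunching.
# `transf_pj`, `xx`, `yy`, `stride` and `npts` are illustrative names.
with nogil:
    proj_trans_generic(
        transf_pj, PJ_FWD,
        xx, stride, npts,   # x buffer, stride in bytes, count
        yy, stride, npts,   # y buffer
        NULL, 0, 0,         # no z values
        NULL, 0, 0,         # no t values
    )
```

No Python objects may be touched inside the `nogil` block, which is why only the raw C buffers appear there.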

@jorisvandenbossche jorisvandenbossche added the proposal Idea for a new feature. label Aug 5, 2019
@snowman2
Member

snowman2 commented Aug 5, 2019

I think before we do this, we will need to check in with the PROJ devs to see if this is safe. Maybe do some testing with it as well.

Are you currently able to use dask with multithreading? Or is this blocking anything?

@jorisvandenbossche
Contributor Author

I think it would certainly be interesting to discuss thread safety of single PJ objects with the PROJ devs (to do parallel work inside pyproj, as you were trying), as I also mentioned on the PR.
But releasing the GIL would already allow us to 1) run other GIL-releasing calculations in parallel, or 2) run transformations in parallel, provided each thread creates its own Transformer object.

E.g. `to_crs` in geopandas creates a Transformer object within that method, so you could have a case where this method is called on different chunks of the dataframe in parallel.

@snowman2
Member

snowman2 commented Aug 6, 2019

threading related issue: OSGeo/PROJ#1047

@snowman2
Member

snowman2 commented Aug 6, 2019

Looks like you need a new context per thread. This conflicts with issue #374.

@snowman2
Member

Benchmark based on current master, for reference against future changes:

import concurrent.futures
from pyproj import Transformer

TRANSFORMER = Transformer.from_crs(4326, 3857)

EXPECTED = (
    (1447153.3803125564, 1335833.8895192828),
    (1345708.4084091093, 1232106.80189676),
)

def transform_point(_):
    # reuse the shared module-level Transformer
    assert TRANSFORMER.transform((12, 11), (13, 12)) == EXPECTED

def transform_point__recreate(_):
    # recreate the Transformer on every call
    assert Transformer.from_crs(4326, 3857).transform((12, 11), (13, 12)) == EXPECTED

def transform_threaded_recreate():
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(transform_point__recreate, range(20))

def transform_threaded():
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(transform_point, range(20))

def transform():
    [transform_point(ii) for ii in range(20)]

def transform_recreate():
    [transform_point__recreate(ii) for ii in range(20)]

[screenshot: timing results for the four functions above]

Based on this, I think you would have to prevent re-starting threads and re-creating transformers each time a thread runs a job. Each thread would have to do a batch of transforms with a Transformer created at the start of the thread job.

@jorisvandenbossche
Contributor Author

> Based on this, I think you would have to prevent re-starting threads and re-creating transformers each time a thread runs a job. Each thread would have to do a batch of transforms with a Transformer created at the start of the thread job.

I don't fully follow what you are saying here.

The use case I have in mind is to_crs on a dask-based GeoDataFrame. You could transform each chunk in parallel, and indeed the function you apply in parallel needs to recreate the Transformer object (as otherwise it cannot be thread-safe).

I did a quick experiment: taking the code base from a few weeks ago (before the global PROJ_CTX was introduced), I added a `with nogil:` statement around the proj_trans_generic call in _Transformer._transform. With 4 cores, it gives me a 2x speed-up (so not really great).

@snowman2
Member

Interesting, so a 2x speedup from releasing the GIL? Does shapely release the GIL?

@jorisvandenbossche
Contributor Author

No, not with shapely. I should have been clearer: it was not a direct timing of to_crs (which indeed currently goes through shapely's transform), but just the pyproj transform part. So what I timed was running this function on 10 million points:

import pyproj

def _geopandas_to_crs(x, y, crs_in, crs_out):
    # specific optimized version for points
    proj_in = pyproj.Proj(crs_in, preserve_units=True)
    proj_out = pyproj.Proj(crs_out, preserve_units=True)
    transformer = pyproj.Transformer.from_proj(proj_in, proj_out, always_xy=True)
    x2, y2 = transformer.transform(x, y)
    return x2, y2

Which is certainly not the only part involved in to_crs. If we want to go this way for geopandas in general, we need functionality to first extract all coordinates of the geometries and then reconstruct the geometries afterwards. That has to happen anyway (currently shapely's transform does it), but it would be interesting to time whether doing all proj transformations in one go, by passing an array, is faster.
