Instantiating many proj objects much slower than 1.9.6 #661
Comments
Unfortunately, there isn't a way to do so that I am aware of. If you re-use them, you could use a dictionary to store the objects and use it to look them up. |
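A minimal sketch of the dictionary approach suggested above. The `ObjectCache` class and its `factory` argument are hypothetical names used for illustration, not part of pyproj; the factory is whatever callable builds the expensive object.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class ObjectCache(Generic[T]):
    """Build each object once and hand back the stored copy afterwards.

    Hypothetical helper for illustration; `factory` is any callable that
    turns a definition string (e.g. "EPSG:32633") into an object.
    """

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._objects: Dict[str, T] = {}

    def get(self, definition: str) -> T:
        # Only pay the construction cost on the first request for a key.
        try:
            return self._objects[definition]
        except KeyError:
            obj = self._factory(definition)
            self._objects[definition] = obj
            return obj

    def __len__(self) -> int:
        return len(self._objects)
```

With pyproj installed, something like `ObjectCache(pyproj.Proj).get("EPSG:32633")` should build the object on the first call and return the same instance on later calls. This only saves time when definitions repeat; it does not speed up the first instantiation of each one.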
@snowman2 Thanks for the quick reply. To clarify, the repeated transformations is only if the same CRS is transformed to/from, correct? It won't help when I need to instantiate 30k different CRSs? |
I shared that as another case where we have seen this problem. When you are creating them, it is slower with the new version of PROJ. However, if you are able to cache and re-use them, you will be able to shave off time upon re-use. |
@dmahr1 you could also try version 2.3.x and check the speed there as well as it had a different method for the context. |
@snowman2 Thank you for the suggestion! In 2.3.1, it took 24 seconds to run the gist, which is still about 10x slower than 1.9.6 but 7x faster than 2.6.1. Even so, I think I found an approach where I can upgrade all the way to 2.6.x. My plan is to instantiate all of the How complex would it be to add a |
Probably possible now with the changes in 3.0. Could probably use the PROJ pipeline string to do that.
How fast can they be un-pickled? |
I am wondering if it might be worthwhile to re-look at the implementation in 2.3 and see if there is a better way. It had 2 issues to overcome: 1. Building with Windows 2. Threading |
Looks like the pickling of the transformer may take more thinking:
>>> from pyproj.transformer import Transformer
>>> tr = Transformer.from_crs("epsg:4326", "+proj=aea +lat_0=50 +lon_0=-154 +lat_1=55 +lat_2=65 +x_0=0 +y_0=0 +datum=NAD27 +no_defs +type=crs +units=m", always_xy=True)
>>> tr.definition
'unavailable until proj_trans is called' |
@snowman2 In 2.6.1, instantiating and pickling about 7.5k CRSs took about 800 seconds. Unpickling them took 120 seconds. The "reverse lookup" search took 0.05 seconds 😂 I'll admit that this is a pretty esoteric use case of PROJ/PyProj, so I don't expect the library to be optimized around it. There are so many amazing improvements that you and other contributors have made in the last couple years with the new datum support and everything. It's just a bummer that there's been a bit of a performance cost. Perhaps I will just leave PyProj 1.x installed in a separate Python environment and shell out to it on-demand. Hacky and gross...but if it works? |
Yeah, sounds like pickling didn't help at all. Oh well. Having pyproj 1 in another environment doesn't sound like too bad of an idea at the moment for your application. If you turn it into a CLI/GUI application, it could load all of the transformers into dictionaries based on the input projection name and have it wait for user input. The first one would be slow as it needs to load, but the next ones would be faster since the program is always loaded. |
In #675 it seems like I have achieved speedups. This example is how I got the best speedup:
import datetime
import pyproj

test_codes = pyproj.get_codes("EPSG", pyproj.enums.PJType.PROJECTED_CRS, False)
start = datetime.datetime.now()
projs = []
for code in test_codes:
    try:
        projs.append(pyproj.Proj(f'EPSG:{code}'))
    except pyproj.exceptions.ProjError:
        # Skip codes that PROJ cannot instantiate.
        pass
print(f'Instantiating {len(projs)} projs took {(datetime.datetime.now() - start).total_seconds()} seconds')
In this example, I am assuming that since you were using the
Using pyproj 3.0.dev0 with PYPROJ_GLOBAL_CONTEXT=ON, the output was:
Using pyproj 2.6.1post1
Using the second gist you linked above, it still took ~100 seconds to initialize everything using the global context. I am guessing it is due to some of the EPSG codes causing errors that slowed things down. |
😲 @snowman2 That is amazing!! And yes, your assumption is correct, I am mostly focusing on projected CRSs rather than different GCSs. I am curious what you did to achieve this incredible speedup. I've never written any Cython, so I am guessing a bit here. But it looks like the global context is leaving open a persistent connection to the database via the PROJ C API? In other words, it was the setup/teardown of that connection that was causing all of the latency before? |
That was one of the settings that needed to be tweaked to get this to work. Also, adding the settings to the context beforehand and not updating them each time shaved off time. |
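A small sketch of switching on the global context from Python rather than from the shell. It assumes the `PYPROJ_GLOBAL_CONTEXT` variable mentioned above is read when pyproj is first imported, as appeared to be the case in the 3.0 development version discussed here; other versions may behave differently.

```python
import importlib.util
import os

# Assumption: the variable must be in the environment before pyproj is
# imported for the first time, so set it at the very top of the program.
os.environ["PYPROJ_GLOBAL_CONTEXT"] = "ON"

# Import pyproj only if it is installed, so this sketch runs either way.
pyproj_available = importlib.util.find_spec("pyproj") is not None
if pyproj_available:
    import pyproj  # noqa: F401  (imported after the variable is set)
```

Setting the variable in the shell before launching Python (for example in the service or job configuration) avoids any dependence on import order.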
@snowman2 Thanks again for those speedups. I finally got around to putting this tool into a cloud function and added it in the little form here: https://ihatecoordinatesystems.com/#correct-crs |
Nice! Thanks for sharing. This was recently added: https://pyproj4.github.io/pyproj/latest/api/database.html#pyproj-database-query-crs-info. Since you start with a lat/lon, this could help subset the number of results you get:
from pyproj.aoi import AreaOfInterest
from pyproj.enums import PJType
from pyproj.database import query_crs_info

crs_info_list = query_crs_info(
    auth_name="EPSG",
    pj_types=PJType.PROJECTED_CRS,
    area_of_interest=AreaOfInterest(
        west_lon_degree=-10,
        south_lat_degree=-10,
        east_lon_degree=10,
        north_lat_degree=10,
    ),
)
I thought it might be something worth trying out. |
@snowman2 I definitely thought about using area of interest to filter projections. And that's a really cool new helper function for querying them! But there's nothing stopping a novice GIS user from (wrongly) using a coordinate system for points outside of the area of interest, right? In that case I think it's better to be thorough and just check everything. Each request to the cloud function only takes about 500 ms :) |
I have a tool in pyproj 1.9.6 that I use for doing a "reverse lookup" of projections. Given an x/y in unknown CRS and a known longitude/latitude, this finds the projections which place that x/y closest to the known longitude/latitude. This is helpful when trying to track down an unknown coordinate system.
This requires instantiating thousands of proj/Proj objects, but it only takes a few seconds in pyproj 1.9.6. Recently I wanted to upgrade to more recent versions of PROJ and GDAL, but this tool is now taking a few minutes, about 50 times longer in pyproj 2.x:
I know that a lot changed in the underlying PROJ C++ library between pyproj 1.9.6 and 2.x. But is there any way to restore the fast instantiation of the proj/Proj objects? The projections don't have to be exact, just close enough for this reverse lookup tool. Also, I am willing to serialize/pickle the proj objects if that would help, though my understanding was that that didn't work with Python C extensions.
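The "reverse lookup" described in this issue can be sketched roughly as follows. This is not the author's tool: `reverse_lookup`, `haversine_km`, and the `candidates` mapping are hypothetical names, and each candidate is assumed to be a callable that inverse-projects an x/y pair to longitude/latitude in degrees.

```python
import math
from typing import Callable, Dict, List, Tuple

# Assumed shape of a candidate: projected (x, y) -> (lon, lat) in degrees.
InverseFn = Callable[[float, float], Tuple[float, float]]


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    radius_km = 6371.0088
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding pushing the value just above 1.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def reverse_lookup(
    x: float,
    y: float,
    known_lon: float,
    known_lat: float,
    candidates: Dict[str, InverseFn],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank candidates by how close they place (x, y) to the known point."""
    ranked = []
    for name, inverse in candidates.items():
        try:
            lon, lat = inverse(x, y)
        except Exception:
            continue  # a candidate that cannot handle this point is skipped
        if not (math.isfinite(lon) and math.isfinite(lat)):
            continue  # out-of-domain results often come back as inf/nan
        ranked.append((haversine_km(lon, lat, known_lon, known_lat), name))
    ranked.sort()
    return [(name, dist) for dist, name in ranked[:top_n]]
```

With pyproj, each candidate could plausibly wrap a `Proj` object called with `inverse=True`, so building the `candidates` mapping is where the instantiation cost discussed in this thread is paid.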