Limitation of bounding box approach in PROJ 6 (seen on Italian Monte Mario to WGS84 transformations) #1461
Comments
This one is my dream setup. It obviously adds complexity, but it would be interesting to get an estimate of how much the larger DB and the polygon lookups affect performance.
I can imagine this approach will require quite a bit of trial and error to dial in the exclusion/inclusion rules. There will likely also be areas that are impossible to handle (at least I can't imagine how right now), for instance the overlapping areas of use for Denmark and Sweden. The greater Copenhagen area and a large chunk of southern Sweden are contained in the neighbouring country's AoU, and which AoU should be used depends entirely on the input. The polygon approach can obviously handle this (assuming the polygons are correct).
Relatively easy with GDAL
This creates a 34 MB Spatialite file with compressed geometries, or a 67 MB GeoPackage with uncompressed geometries. (Spatialite compressed geometries use Float32 delta-encoding for the second and following vertices.)
Point-in-polygon tests would only be done once the BBOX criterion has passed. With the GEOS implementation (GEOSContains()), when testing whether a point is inside the France contour (4628 points for the unsimplified continental part), I get ~1.2 ms per test (the real cost might be a bit less, since each test includes an OGRGeometry -> GEOSGeometry conversion).

That said, we should reimplement our own point-in-polygon code, since we probably don't want to depend on GEOS, or find an existing one with an X/MIT-compatible license (that said, JTS, of which GEOS is a port, uses http://www.eclipse.org/org/documents/edl-v10.php, which is BSD). Boost Geometry (https://www.boost.org/doc/libs/1_65_1/libs/geometry/doc/html/index.html) could also be a potential candidate.

GEOS/JTS also has the concept of prepared geometries, which could be useful: when proj_create_crs_to_crs() determines that there are several candidate coordinate operations, it could create a prepared geometry for each area of use (to be tested: whether there is a penalty in doing so that would make it inappropriate for single-point transformations). Prepared geometries can speed up point-in-polygon testing significantly, but they would be even more complicated to reimplement from scratch.
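To make the prepared-geometry idea concrete, here is a minimal sketch using the GEOS C API. This is not the benchmark code referred to above; the tiny square polygon just stands in for a real area-of-use contour such as the France one.

```c
#include <stdio.h>
#include <stdarg.h>
#include <geos_c.h>

/* Simple message handler required by initGEOS(). */
static void geos_msg(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

int main(void)
{
    initGEOS(geos_msg, geos_msg);

    /* Illustrative stand-in for an area-of-use contour (real ones would be
     * loaded from the EPSG polygons, e.g. the 4628-point France contour). */
    GEOSGeometry *area = GEOSGeomFromWKT(
        "POLYGON((-5 41, 10 41, 10 51, -5 51, -5 41))");
    GEOSGeometry *pt = GEOSGeomFromWKT("POINT(2 46)");

    /* One-shot test: roughly what the GEOSContains() timing above measures. */
    printf("contains: %d\n", GEOSContains(area, pt));

    /* Prepared geometry: pay an indexing cost once per area of use, after
     * which each subsequent point-in-polygon test is much cheaper. */
    const GEOSPreparedGeometry *prep = GEOSPrepare(area);
    printf("prepared contains: %d\n", GEOSPreparedContains(prep, pt));

    GEOSPreparedGeom_destroy(prep);
    GEOSGeom_destroy(pt);
    GEOSGeom_destroy(area);
    finishGEOS();
    return 0;
}
```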
The dataset is also available as a shapefile, ~60 MB uncompressed. Here is the world, subdivided into nine regions defined by IOGP. Potentially these geographic coverages could be split into the regional datasets for proj-datumgrid, but this could be more trouble than it's worth. As for potential licensing, the metadata states:
(I'll assume the correct link is http://www.epsg.org/TermsOfUse.aspx)
Suggestion by Sandro Furieri: store the coordinate values as decimal degrees * 100, truncated to integer, so as to have a precision of 1e-2 deg, ~1 km. Each X/Y value could then be stored as a signed short, which, based on my above 9 MB GeoPackage that used -simplify 1e-2 and double precision, could bring the size of the contours down to roughly 9 / 4 ≈ 2.3 MB.
Seems like more complication to me, indeed.
Makes sense, but why do decimal scaling when going for a binary representation? Binary scaling would offer almost double the precision for the same storage (and computational) cost: scaling the numbers by 2^16/360 results in a unit ("binary degree" - "bigree", perhaps?) of 1/182 degree, i.e. 0.6 km worst case.
Caution: the range of longitudes in the EPSG polygons is [-270, 330]. 330 is reached for urn:ogc:def:area:EPSG::4523 (World centred on 150°E) and -270 for urn:ogc:def:area:EPSG::4520 (World centred on 90°W). That still fits in a 16-bit range with 1e-2 deg decimal precision, since 330 - (-270) = 600 < 65536/100, but offsetting would be needed because 330 * 100 > INT16_MAX.
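To illustrate the two options being discussed, a small sketch; the constants and function names are mine, assuming int16_t storage and the [-270, 330] longitude range mentioned above (the binary scaling here uses the actual 600-degree span rather than the 360 used earlier):

```c
#include <math.h>
#include <stdint.h>

/* Decimal scaling (1e-2 deg steps) with an offset to the midpoint of
 * [-270, 330], so that the scaled values fit in int16_t:
 * [-300, 300] * 100 = [-30000, 30000]. The point under test would have to
 * be offset the same way before comparing. */
static int16_t lon_to_int16_decimal(double lon_deg)
{
    return (int16_t)lround((lon_deg - 30.0) * 100.0);
}

/* Binary scaling: map the full 600-degree span onto the 65536 int16_t codes,
 * giving a step of 600/65536 deg, i.e. roughly 1 km at the equator. */
static int16_t lon_to_int16_binary(double lon_deg)
{
    double t = (lon_deg + 270.0) / 600.0;   /* 0 .. 1 over the EPSG range */
    return (int16_t)lround(t * 65535.0 - 32768.0);
}
```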
So you would need the binary scaling to reach 1 km resolution anyway. If offsetting is used, the point-in-polygon test should probably work on the offset values, by offsetting the point under test as well. I think it will become a mess no matter what, which leads me to the following alternative solution, which I suspect will be only slightly less compact, and faster to look up, given that we are satisfied with 1 km resolution:
(Steps 3 to 5 are really one: you RLE identical rows by giving them the same starting address in the index. Note that since "all-zero rows" will be the common case, these can be given start address 0 and handled up front to speed things up. Also note that identical rows don't even need to be consecutive to share the same representation.)

When instantiating a transformation needing polygon lookup, decode the row index only: I suppose that in most cases a row will consist of zeros, only disturbed by a few runs of ones, so decoding on the fly will be fast. An additional potential speed-up and compression could be obtained by adding a bounding box entry to the data structure, checking that first, and only storing the portion of the global grid inside the BBOX.

Note that the resolution of this representation is a parameter, so if at some point we are not satisfied with 1 km resolution, we could increase it by making the grid larger: the RLE will ensure that the growth in representation stays somewhat under control.

Example: a rasterized polygon would be RLEd row by row (actually, we don't even need to store the last run length of each row, since we know it lasts until the end of the row). Special-casing the all-zero rows to index 0, the row index then becomes a list of per-row start addresses, where the top and bottom zero rows would again be RLEd in the disk representation, but not(?) in the PJ (in many cases they may be compressed away entirely by the BBOX trick, although some zero rows may remain in the case of disjoint parts).

The rasterization of the polygons could be part of the datumgrid projects, hence making it up to these to decide which selection of resolutions to offer (as in the case of GSHHG offering several levels of resolution).

NOTE: While I evidently like this suggestion (otherwise I would have remained silent), I am not, for the moment, able to back it up with an actual development effort - although it would be fun to do...
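A minimal sketch of how the lookup could work, under my own assumptions about the encoding (per-row start addresses with 0 reserved for all-zero rows; each row stored as alternating run lengths starting with a run of zeros; the trailing run is kept explicit here for simplicity, even though it could be dropped as noted above):

```c
#include <stdint.h>

/* Hypothetical rasterized area-of-use:
 * - a width x height cell grid covering the area's bounding box,
 * - row_start[r]: offset into 'runs' for row r, 0 meaning "all-zero row",
 * - runs: concatenated run lengths per row, alternating 0/1 and starting
 *   with a (possibly zero-length) run of zeros; each row's runs sum to width.
 * Identical rows simply share the same start offset. */
typedef struct {
    int width, height;
    const uint16_t *row_start;
    const uint16_t *runs;       /* runs[0] is unused, so 0 can mean "no runs" */
} RasterArea;

/* Return 1 if grid cell (col, row) is inside the rasterized polygon. */
static int raster_area_contains(const RasterArea *a, int col, int row)
{
    if (row < 0 || row >= a->height || col < 0 || col >= a->width)
        return 0;
    const uint16_t start = a->row_start[row];
    if (start == 0)             /* all-zero row, special-cased up front */
        return 0;
    const uint16_t *run = &a->runs[start];
    int x = 0, inside = 0;      /* the first run is a run of zeros */
    for (;;) {
        x += *run++;            /* exclusive end of the current run */
        if (col < x)
            return inside;
        inside = !inside;       /* runs alternate 0 / 1 */
    }
}
```

The caller would first map the (possibly offset/quantized) longitude and latitude of the point under test to a (col, row) cell using the stored bounding box and grid resolution.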
cs2cs and proj_create_crs_to_crs()+proj_trans() use a bounding-box-based approach to determine which candidate coordinate operation to use. They use the list of coordinate operations returned by proj_create_operations() in its sorted order, where the most relevant operation comes first. proj_trans() iterates over that list until it finds a bounding box containing the point to transform.
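For reference, a minimal sketch of that API flow (not code from this issue; the Sardinian test point is just an illustration of where the bbox-only selection goes wrong):

```c
#include <stdio.h>
#include <proj.h>

int main(void)
{
    PJ_CONTEXT *ctx = proj_context_create();

    /* Late-binding transformation: candidate operations and their areas of
     * use are resolved internally, per point, by proj_trans(). */
    PJ *P = proj_create_crs_to_crs(ctx, "EPSG:4326", "EPSG:3003", NULL);
    if (!P)
        return 1;

    /* EPSG:4326 uses latitude, longitude axis order. A point near Cagliari,
     * Sardinia: with the bbox-only test described above, the mainland
     * operation whose bbox also covers Sardinia gets picked. */
    PJ_COORD c = proj_coord(39.22, 9.12, 0, 0);
    PJ_COORD out = proj_trans(P, PJ_FWD, c);
    printf("EPSG:3003 coordinates: %.3f %.3f\n", out.xy.x, out.xy.y);

    proj_destroy(P);
    proj_context_destroy(ctx);
    return 0;
}
```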
This is a limitation in cases where the bounding box of an area of use includes other ones. This is typically the case when looking at the output of `projinfo -s EPSG:4326 -t EPSG:3003 --spatial-test intersects`. It returns 3 transformations, for mainland Italy, Sardinia and Sicily.
One can notice that the Sardinia bbox is strictly contained within the mainland one, and that the Sicily bbox partly intersects it as well. So there is no chance that the Sardinia-specific transformations will be selected, and little chance for the Sicily ones.
A bullet-proof approach would be to use the precise contours of the areas of use, which are provided by EPSG as GML files, but that would make the database larger and would involve point-in-polygon testing. Another approach would be to use them temporarily and do offline testing, where one would record in the database that a given BBOX should exclude other BBOXes. In the above case, Italy mainland would exclude Sardinia and Sicily. Not sure if that would make things simpler implementation-wise.
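A small sketch of what the exclusion idea could look like; the types and names are hypothetical, not existing PROJ structures:

```c
/* Hypothetical: each candidate operation's area of use carries a list of
 * bounding boxes to exclude (e.g. Italy mainland excluding Sardinia and
 * Sicily), derived offline from the precise EPSG contours. */
typedef struct {
    double west, south, east, north;
} BBox;

static int bbox_contains(const BBox *b, double lon, double lat)
{
    return lon >= b->west && lon <= b->east &&
           lat >= b->south && lat <= b->north;
}

/* Return 1 if (lon, lat) falls in 'area' but in none of its exclusions,
 * in which case the candidate operation can be accepted. */
static int area_accepts(const BBox *area,
                        const BBox *excluded, int n_excluded,
                        double lon, double lat)
{
    if (!bbox_contains(area, lon, lat))
        return 0;
    for (int i = 0; i < n_excluded; i++) {
        if (bbox_contains(&excluded[i], lon, lat))
            return 0;   /* a more specific candidate should handle this point */
    }
    return 1;
}
```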