Speed improvements in WKB geometry import, and ogr2ogr #8560
Nice speedup! Could you give the before/after outputs of /usr/bin/time instead of the shell builtin time, so that the evolution of memory consumption is also shown?
My apologies, Even. I am rewording my comment, as it was not my intention to be pushy:
@tbonfort Very relevant question, which made me realize that the fix of fcbed4e was needed.

Without the optimization, using the GetNextFeature() + CreateFeature() looping strategy:

With the optimization, using the default 4-thread strategy in the GeoPackage driver to prefetch the input data (i.e. the GeoPackage driver not only acquires a single batch of features when requested, but also spawns threads to fetch the next ones):

With the optimization, prefetching limited to 1 thread:

The amount of memory is indeed larger with the use of the Arrow API, which is expected. By default we acquire batches of up to the number of features given by the -gt parameter, which defaults to 100,000. When limiting to 10,000:

$ /usr/bin/time -v ogr2ogr out.gpkg nz-building-outlines.gpkg -lco spatial_index=no -progress --config GDAL_NUM_THREADS 1 -gt 10000

I cannot really make sense of that last result. Why would decreasing the batch size increase the RAM usage?
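To illustrate why prefetching is expected to raise peak memory, here is a minimal C++ sketch of one-batch-ahead prefetching. It assumes a hypothetical `FetchFn` callback returning up to `count` features starting at `offset`, and it is not the GeoPackage driver's code (which, per the comment above, uses several threads). With a single background task, at most two batches are alive at once (the one being consumed and the one being fetched), so one would expect peak memory to scale with the batch size chosen by -gt.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical batch type: a batch is just a list of feature IDs here.
using Batch = std::vector<long>;

// Hypothetical fetch callback: returns up to `count` features starting at
// `offset`, and an empty batch once the source is exhausted.
using FetchFn = std::function<Batch(std::size_t offset, std::size_t count)>;

// Simplified one-batch-ahead prefetcher: while the caller processes batch N,
// a background task is already fetching batch N+1.
class PrefetchingReader {
public:
    PrefetchingReader(FetchFn fetch, std::size_t batchSize)
        : m_fetch(std::move(fetch)), m_batchSize(batchSize) {
        Launch();  // start fetching the first batch immediately
    }

    // Returns the next batch; an empty batch signals end of stream.
    Batch Next() {
        if (!m_pending.valid()) return {};
        Batch current = m_pending.get();  // waits for the background fetch
        if (!current.empty()) Launch();   // prefetch the following batch
        return current;
    }

private:
    void Launch() {
        const std::size_t offset = m_offset;
        m_offset += m_batchSize;
        m_pending = std::async(std::launch::async, m_fetch, offset, m_batchSize);
    }

    FetchFn m_fetch;
    std::size_t m_batchSize;
    std::size_t m_offset = 0;
    std::future<Batch> m_pending;
};
```

Link with -pthread (or your platform's equivalent). A real driver would also bound the number of in-flight batches; with N prefetching threads, up to N+1 batches may be held at once.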
Branch updated from d76590e to 314feb3.
…to run importFromWkb() on the same object while limiting the number of dynamic memory (re)allocations
- For lines, we keep track of the maximum capacity of the point array, with a new member variable. So doing setNumPoints(10), then setNumPoints(3), then setNumPoints(9) just allocates an array of size 10.
- For polygons, if importing a WKB for a single-ring polygon on top of an existing single-ring polygon, reuse the existing ring (and benefit from the previous optimization).
- For multipolygons, if importing a WKB for a single-part multipolygon on top of an existing single-part multipolygon, reuse the existing polygon (and benefit from the previous optimization).
- Similarly for multilinestrings.
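A minimal C++ sketch of the two reuse ideas in this commit message, using hypothetical `SimpleLineString` and `SimplePolygon` classes rather than GDAL's OGRLineString/OGRPolygon. The line string keeps a capacity separate from its point count, so the setNumPoints(10), setNumPoints(3), setNumPoints(9) sequence allocates once; the polygon keeps its single ring object across imports instead of recreating it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical point type; GDAL's own classes are not used here.
struct Point {
    double x = 0;
    double y = 0;
};

// Hypothetical line string tracking buffer capacity separately from the
// number of points in use, in the spirit of the new member variable.
class SimpleLineString {
public:
    void setNumPoints(std::size_t n) {
        if (n > m_capacity) {
            // Grow only when n exceeds every size requested so far,
            // copying the points that are already in use.
            std::unique_ptr<Point[]> grown(new Point[n]);
            for (std::size_t i = 0; i < m_numPoints; ++i) grown[i] = m_points[i];
            m_points = std::move(grown);
            m_capacity = n;
            ++m_allocations;
        }
        // Shrinking only lowers the logical size; the buffer is kept.
        m_numPoints = n;
    }

    std::size_t numPoints() const { return m_numPoints; }
    std::size_t capacity() const { return m_capacity; }
    std::size_t allocations() const { return m_allocations; }

private:
    std::unique_ptr<Point[]> m_points;
    std::size_t m_numPoints = 0;
    std::size_t m_capacity = 0;
    std::size_t m_allocations = 0;
};

// Hypothetical polygon holding its rings by pointer.
class SimplePolygon {
public:
    // Stand-in for importing a single-ring WKB polygon with n points: if the
    // polygon already has exactly one ring, that ring object and its buffer
    // are reused instead of being destroyed and recreated.
    void importSingleRing(std::size_t n) {
        if (m_rings.size() != 1) {
            m_rings.clear();
            m_rings.push_back(std::make_unique<SimpleLineString>());
            ++m_ringAllocations;
        }
        m_rings[0]->setNumPoints(n);
    }

    const SimpleLineString& ring() const { return *m_rings[0]; }
    std::size_t ringAllocations() const { return m_ringAllocations; }

private:
    std::vector<std::unique_ptr<SimpleLineString>> m_rings;
    std::size_t m_ringAllocations = 0;
};
```

Because the polygon reuses its ring, the ring's own capacity tracking also carries over from one import to the next, which is the "benefit from the previous optimization" mentioned above.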
…ect to save memory allocations
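A rough C++17 sketch of why reusing one geometry object across a batch saves allocations. Each feature's geometry is modelled as a hypothetical flat coordinate buffer, and a small counting allocator records how many heap allocations each strategy performs; this is an illustration, not the generic WriteArrowBatch() code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Minimal allocator that counts heap allocations (C++17 inline static member).
template <class T>
struct CountingAllocator {
    using value_type = T;
    static inline std::size_t allocations = 0;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocations;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>{}.deallocate(p, n); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Hypothetical geometry stand-in: a flat buffer of coordinates.
using CoordBuffer = std::vector<double, CountingAllocator<double>>;

// One new geometry object per feature.
std::size_t AllocationsWithFreshObjects(const std::vector<std::size_t>& featureSizes) {
    const std::size_t start = CountingAllocator<double>::allocations;
    for (std::size_t n : featureSizes) {
        CoordBuffer geom(n);  // allocated, then freed at the end of the iteration
        (void)geom;
    }
    return CountingAllocator<double>::allocations - start;
}

// One geometry object reused for every feature of the batch.
std::size_t AllocationsWithReusedObject(const std::vector<std::size_t>& featureSizes) {
    const std::size_t start = CountingAllocator<double>::allocations;
    CoordBuffer geom;
    for (std::size_t n : featureSizes) {
        geom.resize(n);  // reallocates only if n exceeds the current capacity
    }
    return CountingAllocator<double>::allocations - start;
}
```

With feature sizes {5, 3, 8, 2, 8}, the fresh-object loop allocates five times, while the reused buffer only allocates when a feature is larger than the current capacity.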
…FastGetArrowStream, even if the output driver doesn't declare OLCFastWriteArrowBatch

This helps in a notable way for Parquet -> GPKG or GPKG -> GPKG when disabling the creation of the spatial index, which is now the major time consumer.

Now (using Arrow API):
```
$ time ogr2ogr out.gpkg nz-building-outlines.gpkg -lco spatial_index=no -progress
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m12,868s
user 0m13,338s
sys 0m1,843s
```
Without use of Arrow API:
```
$ time ogr2ogr out.gpkg nz-building-outlines.gpkg -lco spatial_index=no -progress --config OGR2OGR_USE_ARROW_API NO
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m17,625s
user 0m15,917s
sys 0m1,704s
```
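A simplified C++ model of the decision change described in this commit message, with hypothetical names throughout (`ArrowDecisionInput`, `UseArrowBefore`, `UseArrowAfter`). The case-insensitive parsing of OGR2OGR_USE_ARROW_API, and the assumption that the option was honoured both before and after, are guesses; the real ogr2ogr code checks further preconditions that are not modelled here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical inputs to the decision.
struct ArrowDecisionInput {
    bool inputHasFastGetArrowStream = false;    // source layer declares OLCFastGetArrowStream
    bool outputHasFastWriteArrowBatch = false;  // target layer declares OLCFastWriteArrowBatch
    const char* useArrowApiOption = nullptr;    // value of OGR2OGR_USE_ARROW_API, if set
};

// Assumption: NO/FALSE/OFF/0, in any case, disable the Arrow path.
bool OptionIsFalse(const char* value) {
    if (value == nullptr) return false;
    std::string v(value);
    for (char& c : v) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return v == "NO" || v == "FALSE" || v == "OFF" || v == "0";
}

// Before: both the reader and the writer had to advertise fast Arrow support.
bool UseArrowBefore(const ArrowDecisionInput& in) {
    return in.inputHasFastGetArrowStream && in.outputHasFastWriteArrowBatch &&
           !OptionIsFalse(in.useArrowApiOption);
}

// After: a fast Arrow reader is enough; a writer without native support would
// presumably go through the generic WriteArrowBatch() implementation.
bool UseArrowAfter(const ArrowDecisionInput& in) {
    return in.inputHasFastGetArrowStream && !OptionIsFalse(in.useArrowApiOption);
}
```

Setting OGR2OGR_USE_ARROW_API to NO, as in the second timing run above, falls back to the GetNextFeature() + CreateFeature() loop in this model.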
Branch updated from 314feb3 to bfab3f6.
OGRLineString/Polygon/MultiPolygon/MultiLineString: make it possible to run importFromWkb() on the same object while limiting the number of dynamic memory (re)allocations
WriteArrowBatch() generic implementation: reuse existing geometry object to save memory allocations
ogr2ogr: use Arrow interface as soon as the input driver declares OLCFastGetArrowStream, even if the output driver doesn't declare OLCFastWriteArrowBatch