Extracting all raster tiles from a PMTiles file #514

Answered by bdon
diehl asked this question in Q&A

Having all the tiles on S3 as individual objects is one way, but you pay a latency overhead for requesting each one separately. If you had to write a custom program for parallel processing, you could do something like:

  1. Use deserialize_directory to get all tile IDs and offsets in memory.
  2. Split the tiles into N equal-sized parts.
  3. Have each worker process/machine request a stream of tile contents for its part. If clustered=true, the tiles are contiguous, which lets you batch many tiles into a single network I/O operation.
  4. Write each stream of tiles out to a separate S3 sink file, along with serialized information about their IDs and offsets (using pickle, etc.).
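
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not part of the pmtiles library: `TileEntry`, `split_evenly`, and `coalesce_ranges` are hypothetical names, and `TileEntry` only stands in for whatever entry objects you get back from deserialize_directory (assumed here to expose a tile ID, a byte offset, and a byte length). It also assumes any leaf directories have already been resolved into one flat list of tile entries.

```python
from typing import List, NamedTuple, Tuple


class TileEntry(NamedTuple):
    """Hypothetical stand-in for a directory entry (not the library's own type)."""
    tile_id: int
    offset: int  # byte offset of the tile within the tile data section
    length: int  # byte length of the tile


def split_evenly(entries: List[TileEntry], n: int) -> List[List[TileEntry]]:
    """Split entries into n consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(entries), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(entries[start:start + size])
        start += size
    return chunks


def coalesce_ranges(entries: List[TileEntry]) -> List[Tuple[int, int]]:
    """Merge adjacent or overlapping tiles into (offset, length) byte ranges.

    Each resulting range can be fetched with one ranged read instead of
    one request per tile. Entries that share an offset (deduplicated tiles)
    collapse into the same range.
    """
    spans: List[List[int]] = []  # each span is [start, end), end exclusive
    for e in sorted(entries, key=lambda e: e.offset):
        end = e.offset + e.length
        if spans and e.offset <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([e.offset, end])
    return [(start, end - start) for start, end in spans]


# Made-up example: three contiguous tiles followed by a gap and a fourth tile.
entries = [
    TileEntry(0, 0, 100),
    TileEntry(1, 100, 50),
    TileEntry(2, 150, 25),
    TileEntry(3, 300, 10),
]
chunks = split_evenly(entries, 2)                 # one chunk per worker
plans = [coalesce_ranges(chunk) for chunk in chunks]
```

Each worker would then turn its ranges into ranged reads against the archive, adding the tile data section's starting offset from the header to each range offset. In a clustered archive most of a worker's tiles should collapse into a handful of ranges; in an unclustered one this degrades toward one range per tile.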

You would then need a single process to merge N output files…

bdon Jan 13, 2025
Maintainer

bdon Jan 14, 2025
Maintainer

Answer selected by diehl