Streaming/Windowed GeoTiff Reading #1905

Merged
merged 90 commits into from
Mar 10, 2017

echeipesh
Contributor

@echeipesh echeipesh commented Dec 12, 2016

The main culprit here is StreamingSegmentBytes. To do its job, it needs to know ahead of time which segments you will read. It can then use the TiffTags to group them into contiguous chunks that can be fetched in on-disk order. This requires a slight inversion of control between segmentBytes and GeoTiffTile: we let the bytes decide the order in which we see the segments. Overall it looks like this:

cfor(0)(_ < segmentCount, _ + 1) { segmentIndex =>
   val segment = getSegment(segmentIndex).bytes
   val segmentSize = segment.size
   val bandSegmentCount = segmentSize / bandCount
   val bandSegment = Array.ofDim[Byte](bandSegmentCount)
...

is replaced by

getSegments(0 until segmentCount).foreach { case (segmentIndex, geoTiffSegment) =>
    val segment = geoTiffSegment.bytes
    val segmentSize = segment.size
    val bandSegmentCount = segmentSize / bandCount
    val bandSegment = Array.ofDim[Byte](bandSegmentCount)
...

Because getSegments returns an iterator, we can zip two of them together and use the result for combine as well:

getSegments(0 until segmentCount)
  .zip(otherGeoTiff.getSegments(0 until segmentCount))
  .foreach { case ((segmentIndex, segment), (otherIndex, otherSegment)) =>
    require(segmentIndex == otherIndex, s"Segment index mismatch: $segmentIndex != $otherIndex")
    val newBytes = segment.mapWithIndex { (i, z) =>
      f(z, otherSegment.getInt(i))
    }
    arr(segmentIndex) = compressor.compress(newBytes, segmentIndex)
  }
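
The grouping step described at the top (collecting the requested segments into contiguous chunks in on-disk order) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SegmentRange`, `chunk`, and the rule that only exactly adjacent segments are merged are all assumptions made for the example.

```scala
// Hypothetical sketch: these names are illustrative, not GeoTrellis API.
object SegmentChunking {
  /** A segment's place in the file: its index, byte offset, and byte length. */
  final case class SegmentRange(index: Int, offset: Long, length: Long) {
    def end: Long = offset + length
  }

  /** Group the requested segments into runs that are contiguous on disk,
    * ordered by offset, so each run could be fetched with one range read.
    */
  def chunk(requested: Seq[SegmentRange]): List[List[SegmentRange]] = {
    val sorted = requested.sortBy(_.offset)
    sorted.foldLeft(List.empty[List[SegmentRange]]) {
      // The newest run is at the head, stored newest-segment-first.
      case (current :: done, seg) if current.head.end == seg.offset =>
        (seg :: current) :: done   // segment extends the current run
      case (acc, seg) =>
        List(seg) :: acc           // gap on disk: start a new run
    }.map(_.reverse).reverse       // restore on-disk order
  }
}
```

A real implementation would likely also cap the chunk size or tolerate small gaps; the sketch leaves that out.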

Things that need to be done before this is ready:

  • Test on S3 and verify performance for both full and windowed reads
  • Test reading multiple crops from the same GeoTiff on S3 (verify that byteReader doesn't Reset)
  • Replace all usages of getSegment with getSegments in GeoTiffMultiBandTile (some are tricky)
  • Update the StreamingSegmentBytesSpec for current code (restored LazySegmentBytes)

jbouffard and others added 30 commits December 5, 2016 12:14
Signed-off-by: jbouffard <[email protected]>
…made in StreamingSegmentBytes

Signed-off-by: jbouffard <[email protected]>
Signed-off-by: Grigory Pomadchin <[email protected]>
@pomadchin
Member

Took all comments into account; StreamingSegmentBytesSpec still needs to be done.
@echeipesh @lossyrob

Signed-off-by: Grigory Pomadchin <[email protected]>
@@ -266,7 +275,7 @@ object GeoTiffReader {
}
}

- private def readGeoTiffInfo(byteReader: ByteReader, decompress: Boolean, streaming: Boolean): GeoTiffInfo = {
+ private def readGeoTiffInfo(byteReader: ByteReader, decompress: Boolean, streaming: Boolean, extent: Option[Extent]): GeoTiffInfo = {
Member

Where is the Option[Extent] used? Seems unused.

) extends SegmentBytes with LazyLogging {
import LazySegmentBytes.Segment

// TODO: verify this is correct
Member

Have we addressed this TODO?

val createSegment: Int => BitGeoTiffSegment = { i =>
val (segmentCols, segmentRows) = segmentLayout.getSegmentDimensions(i)
// val size = segmentCols * segmentRows
val decompressGeoTiffSegment = { (i: Int, bytes: Array[Byte]) =>
Contributor Author

@lossyrob
This breaks the "public" API by removing createSegment. Aside from having a bad name, the signature of createSegment forces it to use getSegment, which breaks streaming.

Question: should getSegment be added back for this PR with deprecation and a warning, or is this a forgivable sin?
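
For reference, the "add it back with deprecation" option could look something like the sketch below. `SegmentSource` is a hypothetical stand-in trait, the `getSegments` signature is simplified for the example, and the version string is a placeholder; none of this is the actual GeoTrellis definition.

```scala
object DeprecationSketch {
  // Hypothetical stand-in for the segment-reading trait; not the GeoTrellis definition.
  trait SegmentSource {
    /** Streaming-friendly accessor: yields (index, bytes) pairs for the requested indices. */
    def getSegments(indices: Seq[Int]): Iterator[(Int, Array[Byte])]

    /** Kept only for source compatibility; it delegates to getSegments,
      * so callers get a compiler warning but existing code keeps working.
      */
    @deprecated("Use getSegments; one-at-a-time reads defeat streaming", "hypothetical-version")
    def getSegment(i: Int): Array[Byte] = getSegments(Seq(i)).next()._2
  }
}
```

Callers of the old method still compile, with a deprecation warning pointing them at getSegments.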

* The base trait of SegmentBytes. It can be implemented either as
* an Array[Array[Byte]] or as a ByteBuffer that is lazily read in.
*/
trait SegmentBytes extends Seq[Array[Byte]] {
Contributor Author

This had to change from Traversable to Seq because the only way the former can tell its length is to iterate through the whole collection, which is obviously not great for streaming.
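
To illustrate the point, here is a minimal sketch of a Seq whose length comes from metadata alone, so no segment bytes are read to answer it. `LazySegments`, `offsets`, `byteCounts`, and `read` are hypothetical names standing in for the TIFF segment tags and a range read; this is not the SegmentBytes implementation in the PR.

```scala
object LazySeqSketch {
  // Hypothetical: `offsets`/`byteCounts` stand in for the TIFF segment tags,
  // and `read(offset, length)` for a range read against the byte source.
  final class LazySegments(
    offsets: Array[Long],
    byteCounts: Array[Long],
    read: (Long, Int) => Array[Byte]
  ) extends Seq[Array[Byte]] {
    // Known from the tags alone; no segment is fetched to answer this.
    def length: Int = offsets.length

    // Each access triggers exactly one read of that segment's byte range.
    def apply(i: Int): Array[Byte] = read(offsets(i), byteCounts(i).toInt)

    // Segments are read one at a time, only as the iterator is consumed.
    def iterator: Iterator[Array[Byte]] =
      Iterator.range(0, length).map(i => apply(i))
  }
}
```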

@echeipesh
Contributor Author

Here is a handy test I'm using:

import geotrellis.spark.io.s3._
import geotrellis.spark.io._
import geotrellis.spark._
import geotrellis.raster._
import geotrellis.raster.io._
import geotrellis.raster.io.geotiff._
import geotrellis.proj4._
import geotrellis.vector._

val client = S3Client.DEFAULT

import geotrellis.spark.io.s3.util._
val rr = S3RangeReader("geotrellis-test", "rf-test/356f564e3a0dc9d15553c17cf4583f21-6.tif", client)
val tiff = MultibandGeoTiff.streaming(rr)
val sube = tiff.extent.center.buffer(tiff.extent.width * 0.01).envelope
tiff.crop(sube)

@echeipesh echeipesh merged commit 796f651 into locationtech:master Mar 10, 2017
@lossyrob lossyrob modified the milestones: 1.1, 1.0.1 Mar 12, 2017
4 participants