-
Notifications
You must be signed in to change notification settings - Fork 364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming/Windowed GeoTiff Reading #1905
Conversation
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
…entBytes Signed-off-by: jbouffard <[email protected]>
…SegmentBytes Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
Signed-off-by: jbouffard <[email protected]>
…made in StreamingSegmentBytes Signed-off-by: jbouffard <[email protected]>
…streaming Signed-off-by: Grigory Pomadchin <[email protected]>
4b0a995
to
2e53e9c
Compare
Signed-off-by: Grigory Pomadchin <[email protected]>
Signed-off-by: Grigory Pomadchin <[email protected]>
e19c4fd
to
0d48955
Compare
Signed-off-by: Grigory Pomadchin <[email protected]>
Signed-off-by: Grigory Pomadchin <[email protected]>
Signed-off-by: Grigory Pomadchin <[email protected]>
Signed-off-by: Grigory Pomadchin <[email protected]>
…s function Signed-off-by: Grigory Pomadchin <[email protected]>
Signed-off-by: Grigory Pomadchin <[email protected]>
8acb1f6
to
ff0ec4b
Compare
Took into account all comments; |
Signed-off-by: Grigory Pomadchin <[email protected]>
ab15c26
to
e1258ec
Compare
@@ -266,7 +275,7 @@ object GeoTiffReader { | |||
} | |||
} | |||
|
|||
private def readGeoTiffInfo(byteReader: ByteReader, decompress: Boolean, streaming: Boolean): GeoTiffInfo = { | |||
private def readGeoTiffInfo(byteReader: ByteReader, decompress: Boolean, streaming: Boolean, extent: Option[Extent]): GeoTiffInfo = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the Option[Extent]
used? Seems unused.
) extends SegmentBytes with LazyLogging { | ||
import LazySegmentBytes.Segment | ||
|
||
// TODO: verify this is correct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we addressed this TODO?
val createSegment: Int => BitGeoTiffSegment = { i => | ||
val (segmentCols, segmentRows) = segmentLayout.getSegmentDimensions(i) | ||
// val size = segmentCols * segmentRows | ||
val decompressGeoTiffSegment = { (i: Int, bytes: Array[Byte]) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lossyrob
This breaks "public" API by removing createSegment
aside from having a bad name the signature of createSegment
forces it to use getSegment
, which breaks streaming.
Question: getSegment
be added back for this PR with deprecation
and a warning or is this a forgivable sin?
0bdcbb8
to
95c6a27
Compare
* The base trait of SegmentBytes. It can be implemented either as | ||
* an Array[Array[Byte]] or as a ByteBuffer that is lazily read in. | ||
*/ | ||
trait SegmentBytes extends Seq[Array[Byte]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This had to changes from Traversable
to Seq
because the only way the former can tell its length is to iterate through the collection, which is obviously not great for streaming.
Here is a handy test I'm using: import geotrellis.spark.io.s3._
import geotrellis.spark.io._
import geotrellis.spark._
import geotrellis.raster._
import geotrellis.raster.io._
import geotrellis.raster.io.geotiff._
import geotrellis.proj4._
import geotrellis.vector._
val client = S3Client.DEFAULT
import geotrellis.spark.io.s3.util._
val rr = S3RangeReader("geotrellis-test", "rf-test/356f564e3a0dc9d15553c17cf4583f21-6.tif", client)
val tiff = MultibandGeoTiff.streaming(rr)
val sube = tiff.extent.center.buffer(tiff.extent.width * 0.01).envelope
tiff.crop(sube) |
The main culprit here is
StreamingSegmentBytes
. In order to do its job it needs to know ahead of time which segments you will read. At that point it can use the TiffTags to group them into contiguous chunks which can be fetched in on-disk order. This requires slight inversion of control betweensegmentBytes
andGeoTiffTile
where we allow the bytes to decide the order in which we see the segment. Overall it looks like this:is replaced by
Because
getSegments
gives us an iterator we can zip them and use them for combine as well:Things that need done before this is done:
getSegment
withgetSegments
inGeoTiffMultiBandTile
(some a tricksy)StreamingSegmentBytesSpec
for current code (restoredLazySegmentBytes
)