Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Windows That Conform To GeoTiff Segments #2402

Merged
merged 14 commits into from
Oct 11, 2017

Conversation

jamesmcclain
Copy link
Member

@jamesmcclain jamesmcclain commented Oct 2, 2017

  • Ensure that S3 window size is being properly interpreted (is windowsSize a length or an area in this context?)
  • Ensure that handling of striped segments is viable (are tiles of one pixel in height okay?)

Connects #2173

@jamesmcclain jamesmcclain force-pushed the fix/confirming-windows branch 4 times, most recently from bc6626c to 860cf98 Compare October 4, 2017 12:37
@jamesmcclain jamesmcclain changed the title [WiP] Generate Windows That Conform GeoTiff Segments [WiP] Generate Windows That Conform To GeoTiff Segments Oct 4, 2017
@jamesmcclain jamesmcclain force-pushed the fix/confirming-windows branch 3 times, most recently from bb9d4d1 to bccb415 Compare October 5, 2017 02:32
@jamesmcclain jamesmcclain changed the title [WiP] Generate Windows That Conform To GeoTiff Segments Generate Windows That Conform To GeoTiff Segments Oct 5, 2017
@jamesmcclain jamesmcclain mentioned this pull request Oct 5, 2017
1 task
val segments: RDD[(String, Array[GridBounds])] =
sourceGeoTiffInfo.segmentsByPartitionBytes(partitionBytes, windowSize)
case (_, Some(partitionBytes)) => {
val maxSize = math.min(options.maxTileSize.getOrElse(1<<10), windowSize.getOrElse(1<<10)) // XXX is windowSize a length or an area?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

windowSize is length, implicitly for square windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why can't we write here just 1024 instead of 1 << 10?
  2. Don't think that getOrElse is a good idea here, as it can hide a possible user error. What logic do you want to follow here? Mb we can add these attributes into the match function above? match supports logical or too.
  3. windowSize is length

@echeipesh echeipesh requested a review from pomadchin October 6, 2017 16:22
@echeipesh
Copy link
Contributor

Ensure that handling of striped segments is viable (are tiles of one pixel in height okay?)

It seems strange but the thinking went is that if you don't have an opinion about the shape of tiles you want you're probably using this RDD as input to tiler. From the perspective of optimizing for IO reading the segments as skinny tiles is fine, from the perspective of tiler chunking a skinny tile into multiple tiles is not problematic either.

@jamesmcclain
Copy link
Member Author

It seems strange but the thinking went is that if you don't have an opinion about the shape of tiles you want you're probably using this RDD as input to tiler. From the perspective of optimizing for IO reading the segments as skinny tiles is fine, from the perspective of tiler chunking a skinny tile into multiple tiles is not problematic either.

Sounds capital.

Copy link
Member

@pomadchin pomadchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works fine! A couple of comments and most of them are related to code style / api questions.

val segments: RDD[(String, Array[GridBounds])] =
sourceGeoTiffInfo.segmentsByPartitionBytes(partitionBytes, windowSize)
case (_, Some(partitionBytes)) => {
val maxSize = math.min(options.maxTileSize.getOrElse(1<<10), windowSize.getOrElse(1<<10)) // XXX is windowSize a length or an area?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why can't we write here just 1024 instead of 1 << 10?
  2. Don't think that getOrElse is a good idea here, as it can hide a possible user error. What logic do you want to follow here? Mb we can add these attributes into the match function above? match supports logical or too.
  3. windowSize is length

val layout = sourceGeoTiffInfo.getGeoTiffInfo(s"s3://$bucket/$key").segmentLayout.tileLayout

RasterReader
.listWindows(cols, rows, options.maxTileSize.getOrElse(1<<10), layout.tileCols, layout.tileRows)
Copy link
Member

@pomadchin pomadchin Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same is here:

Looks not safe:

options.maxTileSize.getOrElse(1<<10)

Mb to throw an exception or to handle it in a way user will know that smth is used by default (at least to add some warning). In addition, somewhere above you already used getOrElse two times with 1024 default value, mb it makes sense to add these default values somewhere? As we can easily forget about these default values.

Finally the question, mb Options should contain 1024 value for maxTileSize and windowSize by default? (everything is defined in S3GeoTiffRDD object).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as changing the default maxTileSize is not construed to constitute an API change, I prefer that.

@@ -59,15 +60,70 @@ private [geotrellis] trait GeoTiffInfoReader extends LazyLogging {
}
}

def windowsByBytes(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function description is missing, would be good to add.
Btw have you compared it to segmentsByPartitionBytes? Should we remove segmentsByPartitionBytes at all?

(options.maxTileSize, options.partitionBytes) match {
case (_, Some(partitionBytes)) => {
val windows: RDD[(String, Array[GridBounds])] =
sourceGeoTiffInfo.windowsByBytes(partitionBytes, options.maxTileSize.getOrElse(1<<10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comments about maxTileSize and windowSize and 1 << 10 are valid for Hadoop too.

options: Options
)(implicit sc: SparkContext, rr: RasterReader[Options, (I, V)]): RDD[(K, V)] = {

val conf = new SerializableConfiguration(configuration(path, options))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why conf should be wrapped here? Looks like it's only used for input formats arguments => can be used a common configuration without extra wrapper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That variable makes its way into a sc.parallelize inside of HadoopGeoTiffInfoReader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, my bad.

)(implicit sc: SparkContext, rr: RasterReader[Options, (I, V)]): RDD[(K, V)] = {

val conf = new SerializableConfiguration(configuration(path, options))
val path2 = path.toString
Copy link
Member

@pomadchin pomadchin Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it's always a bit confusing when you see in the code base such variable names. In fact you can write just: // to avoid variable naming confusion

HadoopGeoTiffInfoReader(path.toString, conf, options.tiffExtensions)

val layout = info.getGeoTiffInfo(objectRequest.toString).segmentLayout.tileLayout

RasterReader
.listWindows(cols, rows, options.maxTileSize.getOrElse(1<<10), layout.tileCols, layout.tileRows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxTileSize comment here too

@jamesmcclain jamesmcclain force-pushed the fix/confirming-windows branch from 80cdadb to 395fe2d Compare October 11, 2017 16:24
@jamesmcclain
Copy link
Member Author

All comments have been addressed, I believe.

@jamesmcclain
Copy link
Member Author

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants