-
Notifications
You must be signed in to change notification settings - Fork 364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with resource management in Jpeg compressed GeoTiffs #3249
Fix issue with resource management in Jpeg compressed GeoTiffs #3249
Conversation
42876ec
to
36f13c6
Compare
|
||
val parList = (1 to 10000).toList.par | ||
// TODO: Replace with java.util.concurrent.ForkJoinPool once we drop 2.11 support. | ||
val forkJoinPool = new scala.concurrent.forkjoin.ForkJoinPool(50) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that using java.util.concurrent.ForkJoinPool
causes Scala 2.11 to throw a compiler error. SemVer!
The unit test does not seem to be failing on Circle CI. I'm think this may be due to the relative lack of CPU power on the CI machine to cause enough of the reads to happen in parallel so that it triggers the 'too many files open' failure. Looking for someone to verify that the test in 36f13c6 fails on their machine. |
The unit test downloads a larger file (44MB) and takes about 50 second to run on my machine. Do we think that it should be marked as I tried with a smaller GeoTiff that was already contained in the repository, but was unable to trigger the failure with that GeoTiff. |
36f13c6 fails on my machine 👍 |
Another reason why CircleCI is happy is that it has more file descriptors available, and the test should be more agressive to make it fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that this PR fixes a bug and after applying it on my machine the test passes.
It looks like CircleCI has more file descriptors opened so to catch it.
I think we can merge it as is (after fixing package name and a filename). It looks like it didn't drammatically increase the CI runtime.
Thanks Rob 🎉
@@ -0,0 +1,74 @@ | |||
import geotrellis.raster._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a package name for the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 I guess it's been that long since I've done Scala development....
import sys.process._ | ||
|
||
|
||
class JPEGLoadGeoTiffReaderSpec extends FunSpec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be aligned with the file name in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
override def afterAll = purge | ||
|
||
describe("Reading GeoTiffs with JPEG compression") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel about adding a comment somewhere with a link to this issue, so we'll know the context about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
This unit test downloads a JPEG-compressed GeoTiff from OpenAerialMap. It then performs many parallel reads in a loop, causing a "too many files opened" exception. Signed-off-by: Rob Emanuele <[email protected]>
Before this change, JpegDecompressor held onto an instance of a ImageReader, and used setInput to set the segment bytes for individual segment read. This method was not cleaning up any of the resources used, which caused issues where too many files were being opened up during parallelized reads. This change creates a new ImageReader per segment read, and cleans up any InputStream and ImageReader resources after each read. Signed-off-by: Rob Emanuele <[email protected]>
Signed-off-by: Rob Emanuele <[email protected]>
bff27a9
to
c5bca8a
Compare
@pomadchin pushed rebased changes to the unit test commit to address your comments. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks! 🚀 Merging once CircleCI is completely green.
Overview
This PR fixes an issue with resource handling in JPEG compressed GeoTiffs.
Reading JPEGs with ImageIO is known to cause issues. The way the JPEG decompression works does not close down resources sufficiently and can cause an error like this
The first commit in this PR adds a unit test downloads a JPEG-compressed GeoTiff from OpenAerialMap. It then performs many parallel reads in a loop, causing a "too many files opened" exception with the current code.
The second commit modifies the
JpegDecompression
to always create and close theImageReader
resources as needed. This may cause the code to run slower, though I will note that the unit test running against a Deflate version of the GeoTiff ran successfully in 50 seconds, and it running against a (pre-downloaded) JPEG compressed version with the fix ran in 55 seconds.Checklist
Module Hierarcy updated, if necessarydocs
guides update, if necessaryNew user API has useful Scaladoc stringsRelated to raster-foundry/raster-foundry#4658