-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When passed a filename, download a report in chunks #321
When passed a filename, download a report in chunks #321
Conversation
Rather than pulling an entire 6-8 GB report into a string in memory to immediately write it out to a file, use the built-in feature of Net::HTTP to retrieve the result in chunks and write them out as they are read from the remote host. This prevents receiving `NoMemoryError` issues when downloading huge reports.
Ruby 2.5 has been out for some time now, we should make sure tests work.
This adds three different tests for three different ways to call this method.
end | ||
|
||
it 'downloads report with file object' do | ||
tf = Tempfile.new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you have to specify a basename for backwards compatibility: https://blog.bigbinary.com/2017/04/04/ruby-2-4-has-default-basename-for-tempfile-create.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed that change as I noticed the build had failed. All ruby versions green now.
- Security Console | ||
body: | ||
encoding: ASCII-8BIT | ||
string: "<NexposeReport version=\"2.0\"></NexposeReport>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be some dummy data here to ensure there are enough chunks to iterate? Does vcr actually simulate that properly?
This might be getting into "are you just testing the standard library now" territory, and if the answer to that is yes it's not worth worrying about too much as we would catch problems here in our internal automated testing anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut is the latter; I also don't know if VCR would even emulate that the same way an actual Net::HTTP
call would work, or if it is always going to return the data in the yaml file here as a single chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just spent a few minutes diving into the VCR and WebMock source code. I didn't see any obvious way to get this to be a multipart or chunked response in the VCR cassette, so I think we're just going to have to rely on the standard library and/or internal testing here.
I should point out that this is very similar to #248, but I attempted to leave the non-filename path alone as much as possible, and not add any |
This looks good so once I get a chance to test this myself I'll probably merge and release. Thanks for doing this since I completely forgot about the past attempt. 😓 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worked as expected with a 400 MB PDF report.
Version 7.2.1 is published to RubyGems. |
Thank you, appreciate you getting this reviewed and merged so quickly! |
Just a note for anyone coming here later to find out if this change broke their scripts: It seems like tempfile handling changed between Ruby 2.1.x and 2.2.x versions. If you use a tempfile and do not keep the file reference open after the report is downloaded it seems to get garbage collected immediately after the report completes (likely due to the scope change on the file block) which deletes the temp file from disk. As far as I can tell, when this happens, it means the tempfile is being used incorrectly (e.g. passing the path around instead of the file object). To avoid this just make sure any tempfile usage keeps the file object in scope until it's no longer needed. For example, this won't work because the file won't exist when you try to open it: report_file = Tempfile.new('report').path
nsc.download(report.uri, report_file)
File.open(report_file, 'rb') do |f|
# do stuff with the file
end Correct usage: report_file = Tempfile.new('report')
nsc.download(report.uri, report_file) # you can also use report_file.path but it's redundant
File.open(report_file, 'rb') do |f|
# do stuff with the file
end |
Description
Rather than pulling an entire 6-8 GB report into a string in memory to
immediately write it out to a file, use the built-in feature of Net::HTTP to
retrieve the result in chunks and write them out as they are read from the
remote host. This prevents receiving
NoMemoryError
issues when downloadinghuge reports.
While here, also add Travis CI testing for Ruby 2.5.
Motivation and Context
We're hitting out of memory errors trying to download huge reports using this gem.
How Has This Been Tested?
We have a set of VCR-based tests that invoke this gem, and specifically the
download
method. I was able to use this version of thedownload
method and see our test suite pass successfully.Unfortunately, we don't have tests for this method in this gem itself. It probably makes sense to do some sort of VCR recording or cassette, which I could attempt if you think that makes sense.EDIT: tests now included.Types of changes
Checklist: