Improving memory usage for PDF files loaded by chunks #5332
Conversation
@@ -15,13 +15,22 @@
 * limitations under the License.
 */
/* globals assert, MissingDataException, isInt, NetworkManager, Promise,
           isEmptyObj, createPromiseCapability */
Please revert this white-space change.
I've added a few initial comments on the style of the code. When you have addressed those, please squash the commits into one, see https://github.com/mozilla/pdf.js/wiki/Squashing-Commits.
Force-pushed from fa040d3 to e464033
I think it would be a good idea to benchmark the changes in this PR, to make sure that we don't regress performance improvements from PRs such as #4966. See https://github.com/mozilla/pdf.js/wiki/Benchmarking-your-changes.
Edit: According to the above, this PR reduces memory consumption for large files. Does that apply to smaller files, of more "normal" size, too? Or is there no measurable difference for smaller files?
Yep, I agree about testing performance; I'll take a look. I think there is a lot of room for optimizations. Also note that at the moment there is no memory eviction, mostly because I can't decide what would be a good way to do that. I think eviction behaviour should be configurable, maybe in API.js - something like: evict LRU parts when total memory usage is over X bytes. I'd also move the chunk size in there. Or would pdfDataRangeTransport be a better place?
This PR significantly reduces memory usage for partially loaded files. Since there is no eviction once a file is 100% loaded, memory usage could even be somewhat larger when data is requested for non-contiguous chunks, but given how chunks are requested I don't think this is likely. As long as the file is not completely loaded, memory usage is reduced for pretty much every chunk that is not loaded, regardless of file size. It should be pretty easy to add eviction when memory reaches some threshold - say 10, 50, or 100 MB - but that might cause reloading of some data if the wrong data is evicted. Still, I'd assume this is preferable to a browser/tab crash.
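For illustration, the kind of LRU eviction described above might look roughly like the following sketch (names such as ChunkCache, maxBytes and chunkSize are hypothetical and not part of this PR):

// Hypothetical LRU chunk cache: once the cached chunks exceed a configurable
// byte limit, the least-recently-used chunk is dropped and would have to be
// re-requested if it is needed again.
class ChunkCache {
  constructor(maxBytes, chunkSize) {
    this.maxBytes = maxBytes;
    this.chunkSize = chunkSize;
    // Map iterates in insertion order, so the first key is the LRU entry.
    this.chunks = new Map(); // chunk index -> Uint8Array
  }

  getChunk(index) {
    var chunk = this.chunks.get(index);
    if (chunk) {
      // Re-insert to mark this chunk as most recently used.
      this.chunks.delete(index);
      this.chunks.set(index, chunk);
    }
    return chunk;
  }

  setChunk(index, data) {
    this.chunks.set(index, data);
    // Evict LRU chunks until we are back under the byte limit.
    while (this.chunks.size > 1 &&
           this.chunks.size * this.chunkSize > this.maxBytes) {
      this.chunks.delete(this.chunks.keys().next().value);
    }
  }
}

An evicted chunk would simply be fetched again through the normal range-request path, which matches the trade-off mentioned above: possible re-downloads instead of unbounded memory growth.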
Force-pushed from e464033 to 3c1a89d
Added performance optimizations; I see small performance gains over the current master (settings from the benchmarking wiki).
@bh213 I have added some review comments. The code already looks really good, so these are quite minor comments. It's also great that you have included tests; thanks for that!
Let's also /cc @fkaelberer, @CodingFabian and @nnethercote for an additional look at this.
I'll let others review the code, but I did some measurements. On one 19.9 MB file, it reduced peak RSS while loading (from disk) from ~570 MiB to ~550 MiB. A slight improvement, not bad. But on a 203 MB file (the largest I have on my machine) it reduced peak RSS while loading (from disk) from ~700 MiB to ~510 MiB. Nice! My ad hoc JS memory profiler confirms that it replaced a 203 MB allocation in chunked_stream.js with a 10.5 MB allocation. I can imagine it makes a huge difference on a 1.9 GB file.
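Roughly speaking, the saving comes from not preallocating a buffer for the whole file; only the chunks that have actually arrived are kept. A simplified sketch of the idea (this is not the actual ChunkedStream code; the names and numbers here are illustrative):

var chunkSize = 65536;                  // illustrative chunk size
var totalLength = 203 * 1024 * 1024;    // e.g. the 203 MB file mentioned above

// Dense storage: one buffer as large as the whole file, even if only a few
// chunks have been received. Peak memory is roughly the file size.
var dense = new Uint8Array(totalLength);

// Sparse storage: keep only the chunks that have been loaded. Peak memory is
// roughly (number of loaded chunks) * chunkSize.
var loadedChunks = new Map();           // chunk index -> Uint8Array

function onChunkLoaded(begin, data) {
  loadedChunks.set(Math.floor(begin / chunkSize), data);
}

function readByte(position) {
  var chunk = loadedChunks.get(Math.floor(position / chunkSize));
  if (!chunk) {
    // In PDF.js this is where missing data would trigger a range request.
    throw new Error('Chunk not loaded for position ' + position);
  }
  return chunk[position % chunkSize];
}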
Force-pushed from 3c1a89d to 81d1fd9
We just landed stream loading support, which conflicts with this PR (see #5263). Could you rebase this patch on top of the new master?
Force-pushed from b820a26 to 768b9ee
Review status: 3 of 8 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.

src/core/chunked_stream.js, line 461 [r1] (raw file)
src/core/chunked_stream.js, line 39 [r1] (raw file)
Force-pushed from 768b9ee to f78ef66
Rebased the code onto the latest master.
  }
};

beforeEach(function() {
Is it necessary to register this before every test; couldn't you do it once instead? E.g.

beforeAll(function (done) {
  jasmine.addCustomEqualityTester(arrayComparerThatFixesJasmineIssue786);
  done();
});
@Snuffleupagus - updated the PR with the test review changes.
Anything else I should take a look at?
/botio-linux preview
From: Bot.io (Linux) - Received. Command cmd_preview from @timvandermeij received. Current queue size: 0. Live output at: http://107.21.233.14:8877/960afd1d46ec148/output.txt
From: Bot.io (Linux) - Success. Full output at http://107.21.233.14:8877/960afd1d46ec148/output.txt. Total script time: 1.06 mins. Published.
/botio test
From: Bot.io (Linux) - Received. Command cmd_test from @timvandermeij received. Current queue size: 0. Live output at: http://107.21.233.14:8877/a37442f3e732033/output.txt
From: Bot.io (Windows) - Received. Command cmd_test from @timvandermeij received. Current queue size: 0. Live output at: http://107.22.172.223:8877/44525715f681ae9/output.txt
From: Bot.io (Windows) - Success. Full output at http://107.22.172.223:8877/44525715f681ae9/output.txt. Total script time: 20.78 mins.
From: Bot.io (Linux) - Success. Full output at http://107.21.233.14:8877/a37442f3e732033/output.txt. Total script time: 26.56 mins.
Anything else needed for this PR?
The server that delivers the PDF has to support byte serving (the Accept-Ranges HTTP header) for this PR to work, correct?
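For illustration, a byte-range request issued from JavaScript could look roughly like this (a sketch only; the exact headers and transport that PDF.js uses may differ):

// Request one chunk of a PDF via an HTTP range request. The server must reply
// with "206 Partial Content" (and usually advertises "Accept-Ranges: bytes");
// otherwise it sends the whole file and chunked loading is not possible.
function fetchChunk(url, begin, end) {
  return fetch(url, {
    headers: { 'Range': 'bytes=' + begin + '-' + (end - 1) },
  }).then(function (response) {
    if (response.status !== 206) {
      throw new Error('Server does not support byte serving: ' + response.status);
    }
    return response.arrayBuffer();
  }).then(function (buffer) {
    return new Uint8Array(buffer);
  });
}

// Example: fetch the first 64 KiB of a (hypothetical) document.
// fetchChunk('https://example.com/some.pdf', 0, 65536)
//   .then(function (chunk) { console.log('received', chunk.length, 'bytes'); });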
I don't think I'll be rebasing this PR yet again...
As described in #5329, this significantly reduces memory usage for large files loaded by chunking.