
Improving memory usage for PDF files loaded by chunks #5332

Closed · wants to merge 3 commits

Conversation

@bh213 (Author) commented Sep 23, 2014

As described in #5329, this significantly reduces memory usage for large files loaded by chunking.



@@ -15,13 +15,22 @@
* limitations under the License.
*/
/* globals assert, MissingDataException, isInt, NetworkManager, Promise,
- isEmptyObj, createPromiseCapability */
+ isEmptyObj, createPromiseCapability */
Collaborator:

Please revert this white-space change.

@Snuffleupagus (Collaborator)

I've added a few initial comments on the style of the code. When you have addressed those, please squash the commits into one; see https://github.com/mozilla/pdf.js/wiki/Squashing-Commits.

@bh213 force-pushed the chunkedstream_with_chunked_memory branch 2 times, most recently from fa040d3 to e464033 on September 24, 2014 at 10:56
@Snuffleupagus (Collaborator) commented Sep 24, 2014

I think that it would be a good idea to benchmark the changes in this PR, to make sure that we don't regress performance improvements from PRs such as #4966. See https://github.com/mozilla/pdf.js/wiki/Benchmarking-your-changes.

Edit: According to the above, this PR reduces memory consumption for large files. Does that apply to smaller files, of more "normal" size, too? Or is there no measurable difference for smaller files?

@bh213 (Author) commented Sep 24, 2014

Yep, I agree about testing performance; I'll take a look. I think there is a lot of room for optimization.

Also note that at the moment there is no memory eviction, mostly because I can't decide what a good way to do that would be. I think the eviction behaviour should be configurable, maybe in API.js: something like evicting the least-recently-used parts when total memory usage exceeds X bytes. I'd also move the chunk size setting in there. Or would pdfDataRangeTransport be a better place?
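
For illustration, a minimal sketch of the kind of LRU eviction being discussed; the LRUChunkCache name and the maxBytes/chunkSize parameters are hypothetical, not part of this PR:

function LRUChunkCache(maxBytes, chunkSize) {
  this.maxBytes = maxBytes;
  this.chunkSize = chunkSize;
  // A Map iterates in insertion order, which gives us LRU ordering for free.
  this.chunks = new Map(); // chunk index -> Uint8Array
}

LRUChunkCache.prototype.get = function (index) {
  var chunk = this.chunks.get(index);
  if (chunk !== undefined) {
    // Re-insert the entry so it becomes the most recently used one.
    this.chunks.delete(index);
    this.chunks.set(index, chunk);
  }
  return chunk;
};

LRUChunkCache.prototype.set = function (index, data) {
  this.chunks.set(index, data);
  // Evict least-recently-used chunks once the byte budget is exceeded.
  while (this.chunks.size * this.chunkSize > this.maxBytes) {
    var lruIndex = this.chunks.keys().next().value;
    this.chunks.delete(lruIndex);
  }
};

An evicted chunk would simply be re-requested over the network if it is needed again, trading some extra traffic for a bounded memory footprint.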

@bh213 (Author) commented Sep 25, 2014

This PR significantly reduces memory usage for partially loaded files. Since there is no eviction, memory usage could even be somewhat larger once a file is 100% loaded and data was requested in non-contiguous chunks, but given how chunks are requested I don't think this is likely. As long as the file is not completely loaded, memory usage is reduced for pretty much every chunk that is not loaded, regardless of file size.

It should be pretty easy to add eviction when memory reaches some threshold, say 10, 50, or 100 MB, but that might cause reloading of some data if the wrong data is evicted. Still, I'd assume this is preferable to a browser/tab crash.

@bh213 force-pushed the chunkedstream_with_chunked_memory branch from e464033 to 3c1a89d on September 25, 2014 at 14:41
@bh213 (Author) commented Sep 25, 2014

Added performance optimizations. I see small performance gains over current master (settings from the benchmarking wiki):

browser | stat         | Count | Baseline (ms) | Current (ms) | +/- | %      | Result (P<.05)
------- | ------------ | ----- | ------------- | ------------ | --- | ------ | --------------
firefox | Overall      |   100 |           141 |          135 |  -6 |  -4.07 |
firefox | Page Request |   100 |             5 |            5 |  -1 | -12.05 |
firefox | Rendering    |   100 |           135 |          130 |  -5 |  -3.76 |

@timvandermeij (Contributor)

@bh213 I have added some review comments. The code already looks really good, so these are quite minor comments. It's also great that you have included tests; thanks for that!

@timvandermeij (Contributor)

Let's also /cc @fkaelberer, @CodingFabian and @nnethercote for an additional look at this.

@nnethercote (Contributor)

I'll let others review the code, but I did some measurements.

On one 19.9 MB file, it reduced peak RSS while loading (from disk) from ~570 MiB to ~550 MiB. A slight improvement, not bad.

But on a 203 MB file (the largest I have on my machine) it reduced peak RSS while loading (from disk) from ~700 MiB to ~510 MiB. Nice! My ad hoc JS memory profiler confirms that it replaced a 203 MB allocation in chunked_stream.js with a 10.5 MB allocation.

I can imagine it makes a huge difference on a 1.9 GB file.

@bh213 force-pushed the chunkedstream_with_chunked_memory branch from 3c1a89d to 81d1fd9 on September 26, 2014 at 08:53
@yurydelendik (Contributor)

We just landed stream loading support, which conflicts with this PR (see #5263). Could you rebase this patch on top of the new master?

@bh213 force-pushed the chunkedstream_with_chunked_memory branch from b820a26 to 768b9ee on May 13, 2016 at 12:07
@bh213 (Author) commented May 19, 2016

Review status: 3 of 8 files reviewed at latest revision, 15 unresolved discussions.


src/core/chunked_stream.js, line 461 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

Logic looks somewhat similar to onReceiveProgressiveData; can we merge them into one function? Before, onReceiveData could not handle incomplete and "uneven" chunks; now it looks like it can.

onReceiveData depends on chunks being aligned, since the loaded state is stored in this.loadedChunks. Handling unaligned chunks would be a lot more work.
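
A hypothetical sketch of why alignment matters (not the PR's actual code): with aligned chunks, the loaded state reduces to simple per-chunk-index bookkeeping.

var CHUNK_SIZE = 65536; // illustrative value; the real chunk size is configurable

function markLoaded(loadedChunks, begin, data) {
  // Alignment lets a byte offset map directly to a chunk index.
  if (begin % CHUNK_SIZE !== 0) {
    throw new Error('Received data is not aligned to a chunk boundary.');
  }
  var firstChunk = begin / CHUNK_SIZE;
  var lastChunk = Math.floor((begin + data.length - 1) / CHUNK_SIZE);
  for (var chunk = firstChunk; chunk <= lastChunk; chunk++) {
    loadedChunks[chunk] = true;
  }
}

With unaligned data, a chunk could be only partially covered, so a single boolean per chunk would no longer be enough to describe the loaded state.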


@yurydelendik (Contributor)

src/core/chunked_stream.js, line 461 [r1] (raw file):

Previously, bh213 (Gorazd Breskvar) wrote…

onReceiveData depends on chunks being aligned, since the loaded state is stored in this.loadedChunks. Handling unaligned chunks would be a lot more work.

By reading the code, that's what you are trying to do; if not, then I don't understand the intent. The current implementation is able to handle incomplete chunks from progressive data and range-request chunks (which are always aligned), and both processes can happen simultaneously. Can the new implementation only handle either progressive data or chunked data at one time?


@bh213 (Author) commented May 20, 2016

Review status: 3 of 8 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


src/core/chunked_stream.js, line 461 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

By reading the code, that's what you are trying to do; if not, then I don't understand the intent. The current implementation is able to handle incomplete chunks from progressive data and range-request chunks (which are always aligned), and both processes can happen simultaneously. Can the new implementation only handle either progressive data or chunked data at one time?

I rechecked the code. What happens in onReceiveData (continuous mode) is that there is special handling for the initial chunk (this was probably in the original code too), so less than a full chunk can be accepted there; after the initial chunk it expects aligned, full-size chunk data only.

Both progressive and chunked data can be active at the same time; I added tests to make sure.
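
A hypothetical sketch of that special case, reusing CHUNK_SIZE from the earlier sketch (names assumed, not the PR's actual code):

function onReceiveDataContinuous(state, begin, data) {
  if (begin === 0 && data.length < CHUNK_SIZE) {
    // Special case: the initial chunk may be shorter than CHUNK_SIZE.
    state.initialData = data;
    return;
  }
  // After the initial chunk, only aligned, full-size chunk data is accepted.
  if (begin % CHUNK_SIZE !== 0 || data.length % CHUNK_SIZE !== 0) {
    throw new Error('Expected aligned, full-size chunk data.');
  }
  // ... copy the data into per-chunk storage and mark chunks as loaded ...
}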



@bh213 (Author) commented May 20, 2016

Review status: 3 of 8 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


src/core/chunked_stream.js, line 39 [r1] (raw file):

Previously, Snuffleupagus (Jonas Jenwald) wrote…

Nit: incorrect comment style, please use // instead. And please remember to always add a period at the end of sentences (applies elsewhere in this patch as well). E.g. like this:

// If the length of stream is less than ALLOCATE_NO_CHUNKS_SIZE then use
// ChunkedStreamContinuous which works with single block of memory.

Done.

src/core/chunked_stream.js, line 59 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

all three above are not used in the base and continuous stream (move into fragmented only?)

Done.

src/core/chunked_stream.js, line 95 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

add "abstract" method for prepareBuffer.

Done.

src/core/chunked_stream.js, line 139 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

We are using this.buffer below; can you instantiate it in the constructor above with e.g. a null value (so we know it will be there)?

Done.

src/core/chunked_stream.js, line 264 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

Add "abstract" createGetByteFast method. btw, you can call it from "subStream." and update "getByte" in this method (probably name would be ensureGetByteFast)

Done.
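
To illustrate the suggested pattern, a hypothetical ES5 sketch of an "abstract" method with a specialized override (not the PR's actual code):

function ChunkedStreamBase() {
  this.buffer = null;
}

ChunkedStreamBase.prototype.createGetByteFast = function () {
  // "Abstract": every concrete stream must supply its own fast byte accessor.
  throw new Error('createGetByteFast must be implemented by a subclass.');
};

function ChunkedStreamContinuous() {
  ChunkedStreamBase.call(this);
}
ChunkedStreamContinuous.prototype = Object.create(ChunkedStreamBase.prototype);

ChunkedStreamContinuous.prototype.createGetByteFast = function () {
  var self = this;
  // The continuous variant reads straight out of its single buffer.
  return function (pos) {
    return self.buffer[pos];
  };
};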

src/core/chunked_stream.js, line 280 [r1] (raw file):

Previously, Snuffleupagus (Jonas Jenwald) wrote…

Similar to https://github.com/mozilla/pdf.js/pull/5332/files#r60896625.

Done.

src/core/chunked_stream.js, line 297 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

nit: ChunkedStreamContinuous_createGetByteFast

Done.

src/core/chunked_stream.js, line 303 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

this function body can be placed inside createGetByteFast/ensureGetByteFast

Done.

src/core/chunked_stream.js, line 327 [r1] (raw file):

Previously, yurydelendik (Yury Delendik) wrote…

Not sure what this calculation does; should it be just position?

Done.

src/core/chunked_stream.js, line 469 [r1] (raw file):

Previously, Snuffleupagus (Jonas Jenwald) wrote…

Similar to https://github.com/mozilla/pdf.js/pull/5332/files#r60896625

Done.


@bh213 force-pushed the chunkedstream_with_chunked_memory branch from 768b9ee to f78ef66 on May 20, 2016 at 09:18
@bh213 (Author) commented May 20, 2016

Rebased the code onto the latest master.

}
};

beforeEach(function() {
Collaborator:
Is it necessary to register this before every test? Couldn't you do it once instead? E.g.:

beforeAll(function (done) {
  jasmine.addCustomEqualityTester(arrayComparerThatFixesJasmineIssue786);
  done();
});

@bh213 (Author) commented May 23, 2016

@Snuffleupagus - updated the PR with the test review changes.

@bh213 (Author) commented May 27, 2016

Anything else I should take a look at?

@timvandermeij (Contributor)

/botio-linux preview

@pdfjsbot

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/960afd1d46ec148/output.txt

@timvandermeij (Contributor)

/botio test

@pdfjsbot

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/a37442f3e732033/output.txt

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/44525715f681ae9/output.txt

@pdfjsbot

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/44525715f681ae9/output.txt

Total script time: 20.78 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/a37442f3e732033/output.txt

Total script time: 26.56 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@bh213 (Author) commented Jun 14, 2016

Anything else needed for this PR?

@frankmehlhose

The server that delivers the PDF has to support byte serving (the Accept-Ranges HTTP header) for this PR to work, correct?
We have a use case with PDF/A documents for digital archive systems.
In extreme cases we are facing documents with up to 2000 pages, where each page contains scanned image data and is about 1 MB in size.
In such cases, an eviction policy for the chunk cache might be a good idea, maybe with an LRU eviction algorithm and a configurable cache size.
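
For reference, a minimal sketch of probing a server for byte-serving support; the url variable is assumed and this is not part of the PR:

var xhr = new XMLHttpRequest();
xhr.open('HEAD', url);
xhr.onload = function () {
  // Servers that support byte serving advertise it via "Accept-Ranges: bytes"
  // and will honor Range headers on subsequent requests.
  var supportsRanges = xhr.getResponseHeader('Accept-Ranges') === 'bytes';
  console.log('Byte serving supported:', supportsRanges);
};
xhr.send();

Note that the viewer also needs to learn the total file size (e.g. from a Content-Length header) before it can issue range requests.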

@bh213 (Author) commented Jun 22, 2017

I don't think I'll be rebasing this PR yet again...

@bh213 closed this on Jun 22, 2017
@bh213 deleted the chunkedstream_with_chunked_memory branch on January 11, 2018 at 11:25
@bh213 restored the chunkedstream_with_chunked_memory branch on January 11, 2018 at 11:31
@bh213 deleted the chunkedstream_with_chunked_memory branch on January 11, 2018 at 11:31
@bh213 restored the chunkedstream_with_chunked_memory branch on February 15, 2021 at 13:07