Less copying in the JPX decoder #4538

fkaelberer · 2014-03-31T09:58:48Z

The goal of this PR is to reduce the copy operations during JPX decoding.

In the existing code, JPX image are decoded to individual UInt8Arrays for each tile and each color channel in JPX.js, and then joined to a single (rgb) data array in stream.js. This PR now creates a single rgb array for each tile, so merging those in streams.js is simplified, and can be even avoided if only one tile is present (which is the case for example in #2790 and #3181, so it should save time and memory).

I also refactored the shifting and clamping in the JPX code (so this PR supersedes #4531).
Before, there where two loops: one loop for shifting in the unsigned case (L1065), and one loop that shifted in the signed case and clamped to UInt8 (L1080).
I avoided those loops and instead used a shift-and-clamp function when the data is written in the first place.

This patch will not pass the equality test of S2.pdf, because there is a slightly different rounding behavior.
You can make it pass the test by changing line 1024 from
return (value <= -128) ? 0 : ((value >= 127) ? 255 : (value + 128) | 0);
return (value <= -128) ? 0 : ((value >= 127) ? 255 : new Float32Array([value + 128])[0] | 0);
The s2.pdf example contains values that are so close to an integer that truncating to 32 bit precision does make a difference, e.g. (new Float32Array([119.99999237060547 + 128])[0] | 0) != ((119.99999237060547 + 128) | 0).
In the old code, the truncation to 32 bit precision happened in line 1075, and so 119.99..+128 was rounded up, whereas now it is rounded down.

When I tried to figure out which of the two possibilities is the correct behavior, I noticed that the jj2000 java reference code produces neither of them: the floating point values differ up to +-4 (in the [-128, 128] range), so if a handful of 119.99-pixels is rounded up or down shouldn't matter here, given that all the other pixels might be off by up to 4. (EDIT: I opened issue #4540 for that).

fkaelberer · 2014-03-31T10:12:44Z

Sorry, I forgot to lint. Is my use of the switch statement discouraged?

Snuffleupagus · 2014-03-31T10:23:35Z

Is my use of the switch statement discouraged?

Just add /* falls through */ instead of break;, and you should be good to go!

fkaelberer · 2014-03-31T10:40:33Z

Ok, thanks.

p01 · 2014-03-31T12:05:18Z

I started to look at jpx.js too.
I agree, the interleaving of the tileComponents does not belong in stream.js.

timvandermeij · 2014-03-31T17:26:46Z

/botio test

pdfjsbot · 2014-03-31T17:26:46Z

From: Bot.io (Linux)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/b0e205621fc30c5/output.txt

pdfjsbot · 2014-03-31T17:26:46Z

From: Bot.io (Windows)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/b7d2be751042d18/output.txt

pdfjsbot · 2014-03-31T17:29:17Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/b7d2be751042d18/output.txt

Total script time: 2.51 mins

Font tests: FAILED
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/b7d2be751042d18/reftest-analyzer.html#web=eq.log

timvandermeij · 2014-03-31T17:31:59Z

/botio-windows test

pdfjsbot · 2014-03-31T17:32:02Z

From: Bot.io (Windows)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/346b31481483d1a/output.txt

pdfjsbot · 2014-03-31T17:52:24Z

From: Bot.io (Linux)

Failed

Full output at http://107.21.233.14:8877/b0e205621fc30c5/output.txt

Total script time: 25.63 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/b0e205621fc30c5/reftest-analyzer.html#web=eq.log

pdfjsbot · 2014-03-31T17:53:41Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/346b31481483d1a/output.txt

Total script time: 21.63 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/346b31481483d1a/reftest-analyzer.html#web=eq.log

fkaelberer · 2014-04-01T13:57:53Z

I added a fix for #4540.
In the example of #4540, the output of copyCoefficients() now matches the jj2000 coefficients up to some reasonable float precision.
Pixels still differ from Irfanview output, but never more than 1/255.

Snuffleupagus · 2014-04-01T14:16:20Z

/botio test

pdfjsbot · 2014-04-01T14:16:21Z

From: Bot.io (Linux)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/c5c61d2a3d64f4d/output.txt

pdfjsbot · 2014-04-01T14:16:21Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/6a5aa1af7e30013/output.txt

pdfjsbot · 2014-04-01T14:18:52Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/6a5aa1af7e30013/output.txt

Total script time: 2.51 mins

Font tests: FAILED
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/6a5aa1af7e30013/reftest-analyzer.html#web=eq.log

yurydelendik · 2014-04-01T14:25:48Z

/botio-windows test

pdfjsbot · 2014-04-01T14:25:50Z

From: Bot.io (Windows)

Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/de08d2879653fe8/output.txt

pdfjsbot · 2014-04-01T14:41:47Z

From: Bot.io (Linux)

Failed

Full output at http://107.21.233.14:8877/c5c61d2a3d64f4d/output.txt

Total script time: 25.44 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/c5c61d2a3d64f4d/reftest-analyzer.html#web=eq.log

pdfjsbot · 2014-04-01T14:48:06Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/de08d2879653fe8/output.txt

Total script time: 22.25 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/de08d2879653fe8/reftest-analyzer.html#web=eq.log

fkaelberer · 2014-04-02T10:47:55Z

Hmm... this patch is supposed to reduce mem consumption. However, I measured a significant increase of > 100MB. Does anybody have an idea why?

Difference of the snapshots:

147.78 MB (100.0%) -- explicit
├──145.16 MB (98.23%) -- workers
│  ├──145.16 MB (98.23%) -- workers(pdf.js)/worker(../build/pdf.worker.js, 0xNNN)
│  │  ├──210.01 MB (142.11%) -- zone(0x7fafa5608000)
│  │  │  ├──208.47 MB (141.07%) -- compartment(web-worker)
│  │  │  │  ├──206.25 MB (139.56%) -- objects
│  │  │  │  │  ├──205.59 MB (139.12%) -- malloc-heap
│  │  │  │  │  │  ├──205.41 MB (138.99%) ── elements/non-asm.js
│  │  │  │  │  │  └────0.18 MB (00.12%) ── slots [+]
│  │  │  │  │  └────0.66 MB (00.44%) ++ gc-heap
│  │  │  │  ├────1.76 MB (01.19%) ++ shapes
│  │  │  │  └────0.47 MB (00.32%) ++ (3 tiny)
│  │  │  └────1.54 MB (01.04%) ++ (8 tiny)
│  │  └──-64.85 MB (-43.88%) ++ (7 tiny)
│  └────0.00 MB (00.00%) ++ workers()
└────2.62 MB (01.77%) ++ (9 tiny)

EDIT: the measurement was made with the first page of #2790, which has only one tile (so that the copy process in stream.js is skipped), and uses multi-component transform.

fkaelberer · 2014-04-02T11:51:50Z

Let me be more precise on the measurements:
For the above mem snapshots I used aurora28.0a2 (2014-01-13).
The current nightly behaves completely different.

This time, I created a fresh profile and I got the following peak RSS using memusg. Each measurement was repeated 5-8 times. In both cases, the new patch delivered much more consistent results.

browser	master (0.8.1334)	lessCopyingInJPX, rebased
aurora28.0a2 (2014-01-13)	335 - 367MB	456 - 460MB
nightly31.0a1 (2014-04-01	423 - 499MB	469 - 475MB

Environment: Linux Mint Ubuntu-64Bit in a Windows VM.

fkaelberer · 2014-04-04T19:04:11Z

Hmmm... I continued to reduce memory allocations, and the peak memory keeps going up.
The last commit avoids the creation of a large arrays, but the memory of nightly keeps peaking at 460MB-490MB.

When the jpeg2000 algorithm decodes the image from coarse to fine, it used to first create
four coefficient arrays [a1, a2, a3, a4], ..., [d1, d2, d3, d4], which are then interleaved to [a1, b1, c1, d1, a2, ..., d4] in Transform.iterate(). This patch skips the creation of the four arrays and writes the coefficients directly to the interleaved array.
I thought this would matter, since the arrays are quite big: Without the patch, the following arrays hat to be stored in the final image composition phase: the coefficient arrays (1 Float32/pixel), interleaved coefficient array (1 Float32/pixel and component), output arrays (1 Byte/pixel and component). That's (4 + 5 * #comps Bytes per pixel, ie, 228 MBytes for my 12Mpixel test case. This patch gets rid of the "4" in the formula). The amount of total Float allocations is reduced from 330MB to 190MB).

While the numbers sound good in theory, I wonder why they don't help in practice. Maybe I should try another test file :-].

fkaelberer · 2014-04-05T16:54:42Z

Blocked by #4565.

fkaelberer · 2014-04-09T13:02:51Z

Waiting for #4568 to land.

p01 · 2014-04-10T11:32:24Z

Good to go! #4568 is in.

timvandermeij · 2014-04-10T19:20:31Z

@fkaelberer Could you rebase this PR?

fkaelberer · 2014-04-12T19:22:53Z

It is now rebased and merged with p01's changes.

I changed the code for shifting and clamping once more from
data[j] = val <= low ? 0 : val >= high ? 255 : (val >> shift) + 128 to
data[j] = val <= low ? 0 : val >= high ? 255 : (val + offset) >> shift
due to accuracy issues.

val is the result from the component transform and has +0.5 built-in, and it is supposed to be rounded down. However, val >> shift rounds val up (towards 0), if val is negative.
By adding the offset before the shift, val + offset is always rounded down.
This corrects the colors in this example, which has background color #10101.

pdfjsbot · 2014-04-14T20:53:56Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/4200d9dc57a21f9/output.txt

Total script time: 19.89 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/4200d9dc57a21f9/reftest-analyzer.html#web=eq.log

fkaelberer · 2014-04-15T10:26:26Z

Any ideas what's wrong?

Btw, the memory performance inconsistencies suddenly resolved themselves after a little refactoring. Peak RSS memory is now down from ~355MB to ~333MB using goodytwoshoes00newyiala.pdf, and performance is also better now (Maybe for the same reason as in #3554?).

(I could not test with the pdf of #2790 again, because the current Firefox nightly 2014-04-14 crashes when opening that pdf, with or without this patch.)

fkaelberer · 2014-04-15T11:04:37Z

Filed an upstream bug, https://bugzilla.mozilla.org/show_bug.cgi?id=996537

Snuffleupagus · 2014-04-15T11:05:15Z

Any ideas what's wrong?

Since it's a timeout, probably just some intermittent issue. I'll re-trigger the test.

(I could not test with the pdf of #2790 again, because the current Firefox nightly 2014-04-14 crashes when opening that pdf, with or without this patch.)

For me, the crash report indicates that this is https://bugzilla.mozilla.org/show_bug.cgi?id=995982.

/botio-windows test

pdfjsbot · 2014-04-15T11:05:17Z

From: Bot.io (Windows)

Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/fe6d6e7806b31dc/output.txt

pdfjsbot · 2014-04-15T11:25:03Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/fe6d6e7806b31dc/output.txt

Total script time: 19.74 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/fe6d6e7806b31dc/reftest-analyzer.html#web=eq.log

Snuffleupagus · 2014-04-15T11:27:19Z

Hmm, the Firefox tests still time-outs. Unfortunately I have no idea why.

yurydelendik · 2014-04-15T15:44:15Z

/botio-windows test

pdfjsbot · 2014-04-15T15:44:15Z

From: Bot.io (Windows)

Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/dcbf24f74d0fa37/output.txt

pdfjsbot · 2014-04-15T16:07:12Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/dcbf24f74d0fa37/output.txt

Total script time: 22.93 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/dcbf24f74d0fa37/reftest-analyzer.html#web=eq.log

yurydelendik · 2014-04-15T18:47:05Z

This time it was fine, probably intermittent, reviewing...

timvandermeij · 2014-04-15T18:57:03Z

@fkaelberer Also needs a rebase again.

yurydelendik · 2014-04-15T18:58:17Z

src/core/jpx.js

+    var height = subband.tby1 - subband.tby0;
+    var codeblocks = subband.codeblocks;
+    var right = subband.type == 'HL' || subband.type == 'HH' ? 1 : 0;
+    var bottom = subband.type == 'LH' || subband.type == 'HH' ? levelWidth : 0;


can we use === here?

yurydelendik · 2014-04-15T22:45:20Z

Logic looks good. Good to go with rebase and nits above addressed.

yurydelendik · 2014-04-16T12:35:51Z

/botio test

pdfjsbot · 2014-04-16T12:35:51Z

From: Bot.io (Linux)

Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/8d35cf2a1702c97/output.txt

pdfjsbot · 2014-04-16T12:35:51Z

From: Bot.io (Windows)

Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/6be4c9700e4b9f0/output.txt

pdfjsbot · 2014-04-16T12:58:12Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/6be4c9700e4b9f0/output.txt

Total script time: 22.33 mins

Font tests: FAILED
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/6be4c9700e4b9f0/reftest-analyzer.html#web=eq.log

pdfjsbot · 2014-04-16T13:01:48Z

From: Bot.io (Linux)

Failed

Full output at http://107.21.233.14:8877/8d35cf2a1702c97/output.txt

Total script time: 25.95 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/8d35cf2a1702c97/reftest-analyzer.html#web=eq.log

yurydelendik · 2014-04-16T13:23:25Z

/botio makeref

pdfjsbot · 2014-04-16T13:23:25Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/32c9240530c5d54/output.txt

pdfjsbot · 2014-04-16T13:23:25Z

From: Bot.io (Linux)

Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/e2126d58bcd9b07/output.txt

pdfjsbot · 2014-04-16T13:47:02Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/32c9240530c5d54/output.txt

Total script time: 23.58 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2014-04-16T13:49:21Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/e2126d58bcd9b07/output.txt

Total script time: 25.91 mins

Lint: Passed
Make references: Passed
Check references: Passed

Less copying in the JPX decoder

yurydelendik · 2014-04-16T13:53:09Z

Thank you for the patch.

fkaelberer mentioned this pull request Apr 5, 2014

Read color info from JPX stream + fix color problem #4540 #4565

Merged

yurydelendik reviewed Apr 15, 2014
View reviewed changes

Less copying in the JPX coder, merged and rebased

04602c8

yurydelendik added a commit that referenced this pull request Apr 16, 2014

Merge pull request #4538 from fkaelberer/lessCopyingInJPX

dea4bda

Less copying in the JPX decoder

yurydelendik merged commit dea4bda into mozilla:master Apr 16, 2014

fkaelberer deleted the lessCopyingInJPX branch April 16, 2014 13:55

Less copying in the JPX decoder #4538

Less copying in the JPX decoder #4538

Conversation

fkaelberer commented Mar 31, 2014

fkaelberer commented Mar 31, 2014

Snuffleupagus commented Mar 31, 2014

fkaelberer commented Mar 31, 2014

p01 commented Mar 31, 2014

timvandermeij commented Mar 31, 2014

pdfjsbot commented Mar 31, 2014

From: Bot.io (Linux)

Received

pdfjsbot commented Mar 31, 2014

From: Bot.io (Windows)

Received

pdfjsbot commented Mar 31, 2014

From: Bot.io (Windows)

Failed

timvandermeij commented Mar 31, 2014

pdfjsbot commented Mar 31, 2014

From: Bot.io (Windows)

Received

pdfjsbot commented Mar 31, 2014

From: Bot.io (Linux)

Failed

pdfjsbot commented Mar 31, 2014

From: Bot.io (Windows)

Failed

fkaelberer commented Apr 1, 2014

Snuffleupagus commented Apr 1, 2014

pdfjsbot commented Apr 1, 2014

From: Bot.io (Linux)

Received

pdfjsbot commented Apr 1, 2014

From: Bot.io (Windows)

Received

pdfjsbot commented Apr 1, 2014

From: Bot.io (Windows)

Failed

yurydelendik commented Apr 1, 2014

pdfjsbot commented Apr 1, 2014

From: Bot.io (Windows)

Received

pdfjsbot commented Apr 1, 2014

From: Bot.io (Linux)

Failed

pdfjsbot commented Apr 1, 2014

From: Bot.io (Windows)

Failed

fkaelberer commented Apr 2, 2014

fkaelberer commented Apr 2, 2014

fkaelberer commented Apr 4, 2014

fkaelberer commented Apr 5, 2014

fkaelberer commented Apr 9, 2014

p01 commented Apr 10, 2014

timvandermeij commented Apr 10, 2014

fkaelberer commented Apr 12, 2014

pdfjsbot commented Apr 14, 2014

From: Bot.io (Windows)

Failed

fkaelberer commented Apr 15, 2014

fkaelberer commented Apr 15, 2014

Snuffleupagus commented Apr 15, 2014

pdfjsbot commented Apr 15, 2014

From: Bot.io (Windows)

Received

pdfjsbot commented Apr 15, 2014

From: Bot.io (Windows)

Failed

Snuffleupagus commented Apr 15, 2014

yurydelendik commented Apr 15, 2014

pdfjsbot commented Apr 15, 2014

From: Bot.io (Windows)

Received

pdfjsbot commented Apr 15, 2014

From: Bot.io (Windows)

Failed

yurydelendik commented Apr 15, 2014

timvandermeij commented Apr 15, 2014

yurydelendik Apr 15, 2014