Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less copying in the JPX decoder #4538

Merged
merged 1 commit into from
Apr 16, 2014

Conversation

fkaelberer
Copy link
Contributor

The goal of this PR is to reduce the copy operations during JPX decoding.

In the existing code, JPX image are decoded to individual UInt8Arrays for each tile and each color channel in JPX.js, and then joined to a single (rgb) data array in stream.js. This PR now creates a single rgb array for each tile, so merging those in streams.js is simplified, and can be even avoided if only one tile is present (which is the case for example in #2790 and #3181, so it should save time and memory).

I also refactored the shifting and clamping in the JPX code (so this PR supersedes #4531).
Before, there where two loops: one loop for shifting in the unsigned case (L1065), and one loop that shifted in the signed case and clamped to UInt8 (L1080).
I avoided those loops and instead used a shift-and-clamp function when the data is written in the first place.

This patch will not pass the equality test of S2.pdf, because there is a slightly different rounding behavior.
You can make it pass the test by changing line 1024 from
return (value <= -128) ? 0 : ((value >= 127) ? 255 : (value + 128) | 0);
return (value <= -128) ? 0 : ((value >= 127) ? 255 : new Float32Array([value + 128])[0] | 0);
The s2.pdf example contains values that are so close to an integer that truncating to 32 bit precision does make a difference, e.g. (new Float32Array([119.99999237060547 + 128])[0] | 0) != ((119.99999237060547 + 128) | 0).
In the old code, the truncation to 32 bit precision happened in line 1075, and so 119.99..+128 was rounded up, whereas now it is rounded down.

When I tried to figure out which of the two possibilities is the correct behavior, I noticed that the jj2000 java reference code produces neither of them: the floating point values differ up to +-4 (in the [-128, 128] range), so if a handful of 119.99-pixels is rounded up or down shouldn't matter here, given that all the other pixels might be off by up to 4. (EDIT: I opened issue #4540 for that).

@fkaelberer
Copy link
Contributor Author

Sorry, I forgot to lint. Is my use of the switch statement discouraged?

@Snuffleupagus
Copy link
Collaborator

Is my use of the switch statement discouraged?

Just add /* falls through */ instead of break;, and you should be good to go!

@fkaelberer
Copy link
Contributor Author

Ok, thanks.

@p01
Copy link
Contributor

p01 commented Mar 31, 2014

I started to look at jpx.js too.
I agree, the interleaving of the tileComponents does not belong in stream.js.

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/b0e205621fc30c5/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/b7d2be751042d18/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/b7d2be751042d18/output.txt

Total script time: 2.51 mins

  • Font tests: FAILED
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/b7d2be751042d18/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/346b31481483d1a/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/b0e205621fc30c5/output.txt

Total script time: 25.63 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/b0e205621fc30c5/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/346b31481483d1a/output.txt

Total script time: 21.63 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/346b31481483d1a/reftest-analyzer.html#web=eq.log

@fkaelberer
Copy link
Contributor Author

I added a fix for #4540.
In the example of #4540, the output of copyCoefficients() now matches the jj2000 coefficients up to some reasonable float precision.
Pixels still differ from Irfanview output, but never more than 1/255.

@Snuffleupagus
Copy link
Collaborator

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Apr 1, 2014

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/c5c61d2a3d64f4d/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 1, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/6a5aa1af7e30013/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 1, 2014

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/6a5aa1af7e30013/output.txt

Total script time: 2.51 mins

  • Font tests: FAILED
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/6a5aa1af7e30013/reftest-analyzer.html#web=eq.log

@yurydelendik
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

pdfjsbot commented Apr 1, 2014

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/de08d2879653fe8/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 1, 2014

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/c5c61d2a3d64f4d/output.txt

Total script time: 25.44 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/c5c61d2a3d64f4d/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Apr 1, 2014

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/de08d2879653fe8/output.txt

Total script time: 22.25 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/de08d2879653fe8/reftest-analyzer.html#web=eq.log

@fkaelberer
Copy link
Contributor Author

Hmm... this patch is supposed to reduce mem consumption. However, I measured a significant increase of > 100MB. Does anybody have an idea why?

Difference of the snapshots:

147.78 MB (100.0%) -- explicit
├──145.16 MB (98.23%) -- workers
│  ├──145.16 MB (98.23%) -- workers(pdf.js)/worker(../build/pdf.worker.js, 0xNNN)
│  │  ├──210.01 MB (142.11%) -- zone(0x7fafa5608000)
│  │  │  ├──208.47 MB (141.07%) -- compartment(web-worker)
│  │  │  │  ├──206.25 MB (139.56%) -- objects
│  │  │  │  │  ├──205.59 MB (139.12%) -- malloc-heap
│  │  │  │  │  │  ├──205.41 MB (138.99%) ── elements/non-asm.js
│  │  │  │  │  │  └────0.18 MB (00.12%) ── slots [+]
│  │  │  │  │  └────0.66 MB (00.44%) ++ gc-heap
│  │  │  │  ├────1.76 MB (01.19%) ++ shapes
│  │  │  │  └────0.47 MB (00.32%) ++ (3 tiny)
│  │  │  └────1.54 MB (01.04%) ++ (8 tiny)
│  │  └──-64.85 MB (-43.88%) ++ (7 tiny)
│  └────0.00 MB (00.00%) ++ workers()
└────2.62 MB (01.77%) ++ (9 tiny)

EDIT: the measurement was made with the first page of #2790, which has only one tile (so that the copy process in stream.js is skipped), and uses multi-component transform.

@fkaelberer
Copy link
Contributor Author

Let me be more precise on the measurements:
For the above mem snapshots I used aurora28.0a2 (2014-01-13).
The current nightly behaves completely different.

This time, I created a fresh profile and I got the following peak RSS using memusg. Each measurement was repeated 5-8 times. In both cases, the new patch delivered much more consistent results.

browser master (0.8.1334) lessCopyingInJPX, rebased
aurora28.0a2 (2014-01-13) 335 - 367MB 456 - 460MB
nightly31.0a1 (2014-04-01 423 - 499MB 469 - 475MB

Environment: Linux Mint Ubuntu-64Bit in a Windows VM.

@fkaelberer
Copy link
Contributor Author

Hmmm... I continued to reduce memory allocations, and the peak memory keeps going up.
The last commit avoids the creation of a large arrays, but the memory of nightly keeps peaking at 460MB-490MB.

When the jpeg2000 algorithm decodes the image from coarse to fine, it used to first create
four coefficient arrays [a1, a2, a3, a4], ..., [d1, d2, d3, d4], which are then interleaved to [a1, b1, c1, d1, a2, ..., d4] in Transform.iterate(). This patch skips the creation of the four arrays and writes the coefficients directly to the interleaved array.
I thought this would matter, since the arrays are quite big: Without the patch, the following arrays hat to be stored in the final image composition phase: the coefficient arrays (1 Float32/pixel), interleaved coefficient array (1 Float32/pixel and component), output arrays (1 Byte/pixel and component). That's (4 + 5 * #comps Bytes per pixel, ie, 228 MBytes for my 12Mpixel test case. This patch gets rid of the "4" in the formula). The amount of total Float allocations is reduced from 330MB to 190MB).

While the numbers sound good in theory, I wonder why they don't help in practice. Maybe I should try another test file :-].

@fkaelberer
Copy link
Contributor Author

Blocked by #4565.

@fkaelberer
Copy link
Contributor Author

Waiting for #4568 to land.

@p01
Copy link
Contributor

p01 commented Apr 10, 2014

Good to go! #4568 is in.

@timvandermeij
Copy link
Contributor

@fkaelberer Could you rebase this PR?

@fkaelberer
Copy link
Contributor Author

It is now rebased and merged with p01's changes.

I changed the code for shifting and clamping once more from
data[j] = val <= low ? 0 : val >= high ? 255 : (val >> shift) + 128 to
data[j] = val <= low ? 0 : val >= high ? 255 : (val + offset) >> shift
due to accuracy issues.

val is the result from the component transform and has +0.5 built-in, and it is supposed to be rounded down. However, val >> shift rounds val up (towards 0), if val is negative.
By adding the offset before the shift, val + offset is always rounded down.
This corrects the colors in this example, which has background color #10101.

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/4200d9dc57a21f9/output.txt

Total script time: 19.89 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/4200d9dc57a21f9/reftest-analyzer.html#web=eq.log

@fkaelberer
Copy link
Contributor Author

Any ideas what's wrong?

Btw, the memory performance inconsistencies suddenly resolved themselves after a little refactoring. Peak RSS memory is now down from ~355MB to ~333MB using goodytwoshoes00newyiala.pdf, and performance is also better now (Maybe for the same reason as in #3554?).

(I could not test with the pdf of #2790 again, because the current Firefox nightly 2014-04-14 crashes when opening that pdf, with or without this patch.)

@fkaelberer
Copy link
Contributor Author

Filed an upstream bug, https://bugzilla.mozilla.org/show_bug.cgi?id=996537

@Snuffleupagus
Copy link
Collaborator

Any ideas what's wrong?

Since it's a timeout, probably just some intermittent issue. I'll re-trigger the test.

(I could not test with the pdf of #2790 again, because the current Firefox nightly 2014-04-14 crashes when opening that pdf, with or without this patch.)

For me, the crash report indicates that this is https://bugzilla.mozilla.org/show_bug.cgi?id=995982.

/botio-windows test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/fe6d6e7806b31dc/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/fe6d6e7806b31dc/output.txt

Total script time: 19.74 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/fe6d6e7806b31dc/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator

Hmm, the Firefox tests still time-outs. Unfortunately I have no idea why.

@yurydelendik
Copy link
Contributor

/botio-windows test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/dcbf24f74d0fa37/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/dcbf24f74d0fa37/output.txt

Total script time: 22.93 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/dcbf24f74d0fa37/reftest-analyzer.html#web=eq.log

@yurydelendik
Copy link
Contributor

This time it was fine, probably intermittent, reviewing...

@timvandermeij
Copy link
Contributor

@fkaelberer Also needs a rebase again.

var height = subband.tby1 - subband.tby0;
var codeblocks = subband.codeblocks;
var right = subband.type == 'HL' || subband.type == 'HH' ? 1 : 0;
var bottom = subband.type == 'LH' || subband.type == 'HH' ? levelWidth : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use === here?

@yurydelendik
Copy link
Contributor

Logic looks good. Good to go with rebase and nits above addressed.

@yurydelendik
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/8d35cf2a1702c97/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/6be4c9700e4b9f0/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/6be4c9700e4b9f0/output.txt

Total script time: 22.33 mins

  • Font tests: FAILED
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/6be4c9700e4b9f0/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/8d35cf2a1702c97/output.txt

Total script time: 25.95 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/8d35cf2a1702c97/reftest-analyzer.html#web=eq.log

@yurydelendik
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/32c9240530c5d54/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_makeref from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/e2126d58bcd9b07/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/32c9240530c5d54/output.txt

Total script time: 23.58 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/e2126d58bcd9b07/output.txt

Total script time: 25.91 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

yurydelendik added a commit that referenced this pull request Apr 16, 2014
@yurydelendik yurydelendik merged commit dea4bda into mozilla:master Apr 16, 2014
@yurydelendik
Copy link
Contributor

Thank you for the patch.

@fkaelberer fkaelberer deleted the lessCopyingInJPX branch April 16, 2014 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants