[api-minor] Expanding divs to improve selection #7539

jeremypress · 2016-08-12T18:56:12Z

Description

Continuing the work by @yurydelendik, I'm attempting to get his commit into shape to be included in the library. So far, I've converted his feature into an option that is set when defining the PDFViewerApplication, and have run some preliminary testing.

Performance Testing

So far I have tested 2 aspects of performance in Chrome. Memory/CPU usage, and the time taken to render text layers and set the text content, as previously logged by Yury. I chose 5 PDFs that vary in size, amount of text, and complexity to try to get a wide base of results. I've attached excel documents of my findings, and I'll summarize them here.

Here are the files I've used for testing so far:
pdfs.zip

CPU and Memory Testing

This was done with a modification of a script provided by Yury that logs CPU and Memory usage every 300 ms. Additionally, I wrote an apple script to simulate scrolling on the browser. First scrolling up, then down, then jumping to the last page and scrolling up. This allows me to replicate and time the results from the script. The scripts can be found here: https://github.com/jeremypress/browser-testing/tree/master. Below are some graphs comparing usage between the two branches. There seems to be a negligible difference in CPU and Memory, but further testing may determine otherwise.
chromeperformance.xlsx

File 1: 5.8mb, 200 pages

File 2: 4.2mb, 81 pages

File 3: 620kb, 3 pages

Render Times

Data on render times was logged using console.time statements written by Yury. Here is where the performance difference between the two branches can be noted. On text heavy files, render times can be as much as twice a slow, which would correlate to around a half second increase in total rendering time spent. Also, in some cases the fairexpand branch requires a small amount of extra divs to be rendered. However, this does not really impact first impression, as most of the rendering is done as a user moves to a new page. A sample of the data is provided below, and the full file is attached.
render times.xlsx

File 1: 5.8mb, 200 pages

File 2: 4.2mb, 81 pages

File 3: 620kb, 3 pages

Handling rtl and rotated/vertical text

Overall I've noticed that this enhancement handles rtl selection fine, and generally reduces flickering on rotated text. Although there are some instances of rotated text where fairexpand makes it harder or impossible to select, these are already problem areas on the master branch. Attached are some videos showing the differences, and screenshots of the text divs on a document with vertical text.

Videos: https://cloud.box.com/s/1twc6zhcf1lswo2pe1vqzmrrjb6btxwy

Screenshots:
master branch:

fairexpand branch:

# Concerns 1. Chrome is the only browser I've tested in so far, what other data is desired in order to determine if the commit is performant enough? 2. Is the option implemented correctly according to the design of the project? 3. I'm failing a unit test regarding rendering PDF's in parallel. However, this also happens on my master branch, so it could be due to my testing environment.

Feedback is greatly appreciated so I can make improvements going forward.

This change is

yurydelendik · 2016-08-16T20:00:08Z

Overall looks good. Can you squash all commits into one (using git rebase -i command)? Also remove all instrumentation console.time/timeEnd/log operations. You can create addition commit with console instrumentation, which you can remove (via git rebase -i) before landing.

Also in separate commit modify test driver to get the extension setting from the manifest and apply that for the "text" tests and introduce couple of tests with this setting.

Reviewed 6 of 7 files at r2.
Review status: all files reviewed at latest revision, 3 unresolved discussions.

web/app.js, line 364 [r2] (raw file):

        PDFJS.externalLinkTarget = value;
      }),

unneeded change

web/text_layer_builder.js, line 25 [r2] (raw file):

  return !NonWhitespaceRegexp.test(str);
}

We don't need the code above.

web/text_layer_builder.js, line 91 [r2] (raw file):

        this.textLayerRenderTask = null;
      }

Unneeded change

Comments from Reviewable

timvandermeij · 2016-08-17T11:55:18Z

web/text_layer_builder.js

      });
      this.textLayerRenderTask.promise.then(function () {
        this.textLayerDiv.appendChild(textLayerFrag);
        this._finishRendering();
        this.updateMatches();
      }.bind(this), function (reason) {
-        // canceled or failed to render text layer -- skipping errors
+          // canceled or failed to render text layer -- skipping errors


Nit: revert this change as everything should be indented with two spaces

timvandermeij · 2016-08-17T12:01:20Z

Is this disabled by default to first land the code for more testing and enabling it later? If so, when this lands we should remember to file a follow-up issue to enable this mode by default.

yurydelendik · 2016-08-17T14:31:53Z

TODOs:

decide if we want to enable this by default (with //TODO comments not-addressed)
remove/disable bindMouse-hack in the text_layer_builder.js depending on options
add "text" test(s) for enhanced mode to track regressions for it

Both TODO-comments (red-black tree and reference counting) decrease algorithmic complexity of the feature. However it looks like we are not dealing with large amount of text items on a "horizon" -- large amount of text items must be in single row or column in random order. I don't think it can be found in real world documents (?), so we can keep it as is for release.

yurydelendik · 2016-08-17T14:33:53Z

Reviewed 1 of 3 files at r3.
Review status: 4 of 6 files reviewed at latest revision, 5 unresolved discussions.

Comments from Reviewable

timvandermeij · 2016-08-17T19:03:44Z

test/driver.js

@@ -468,12 +472,13 @@ var Driver = (function DriverClosure() {
              var textLayerContext = textLayerCanvas.getContext('2d');
              textLayerContext.clearRect(0, 0,
                textLayerCanvas.width, textLayerCanvas.height);
+              var enhanceText = task.id.includes('enhance') ? true : false;


includes returns a boolean, so we can simplify this line to var enhanceText = task.id.includes('enhance');

However, usually we introduce a new key in the objects for this. Take for example b7217a2 where we introduces the annotations key to indicate that the annotation layer is tested. I think we should do the same here: instead of putting -enhance in the name, which is more error-prone if it is accidentally renamed, we should add a new key enhance_text_selection and set it to true.

It would probably be much better overall if a enhance flag was added (similar to the annotations flag in PR #6780).

Good, we had the same thought :)

I would personally prefer enhance_text_selection to make it clear what is being enhanced, much like the already used enhanceTextSelection variable in the code.

Wouldn't just enhance be enough though, since the type is already set to text?

Of couse, I forgot about that. Yes, enhance would be fine then.

timvandermeij · 2016-08-17T19:20:07Z

This patch is looking much better already. Please address the comments above and squash the commits into one commit.

jeremypress · 2016-08-17T22:54:48Z

Hey guys, thanks for all the feedback. I think I addressed everything, but let me know if I missed something.

Snuffleupagus · 2016-08-18T06:05:09Z

src/display/text_layer.js

   * @private
   */
-  function TextLayerRenderTask(textContent, container, viewport, textDivs) {
+  function TextLayerRenderTask(textContent, container, viewport, textDivs,
+                              enhanceTextSelection) {


Nit: please increase the indentation with one space on this line, so that the parameters line up; i.e.

function TextLayerRenderTask(textContent, container, viewport, textDivs, enhanceTextSelection) {

Snuffleupagus · 2016-08-18T06:18:16Z

[...] decide if we want to enable this by default (with //TODO comments not-addressed)

So that we don't forget, note that this PR still has the problem that was outlined in #6663 (comment):

This does have the side affect of always showing the text cursor no matter where you hover over the page (instead of just over text).

yurydelendik · 2016-08-18T13:17:52Z

I'm thinking we can expand on mousedown capture phase and return to previous state at mouseup:

check if we can split text_layer.js logic into enable/disable expansion.

If not, we might need to keep it off and expose via default_preferences.js

jeremypress · 2016-08-18T20:45:32Z

Here's a snippet of what the div expansion on click looks like. This should solve the cursor problem. I just need to cleanup the code (as well as make the other style fixes above) and turn this into an option so that we can still run the enhancement regression tests.

jeremypress · 2016-08-31T17:03:19Z

@timvandermeij I made the fixes, but left || false in the last two per @Snuffleupagus 's comment. Very excited for this to land!

timvandermeij · 2016-09-01T11:08:22Z

/botio-linux preview

pdfjsbot · 2016-09-01T11:08:23Z

From: Bot.io (Linux)

Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/000cd1fa280eb92/output.txt

pdfjsbot · 2016-09-01T11:09:25Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/000cd1fa280eb92/output.txt

Total script time: 1.03 mins

Published

timvandermeij · 2016-09-01T11:09:32Z

/botio test

pdfjsbot · 2016-09-01T11:09:32Z

From: Bot.io (Linux)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/22ec8b45bf31792/output.txt

pdfjsbot · 2016-09-01T11:09:32Z

From: Bot.io (Windows)

Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c0c37f385d52937/output.txt

pdfjsbot · 2016-09-01T11:33:38Z

From: Bot.io (Windows)

Failed

Full output at http://107.22.172.223:8877/c0c37f385d52937/output.txt

Total script time: 24.09 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/c0c37f385d52937/reftest-analyzer.html#web=eq.log

pdfjsbot · 2016-09-01T11:38:53Z

From: Bot.io (Linux)

Failed

Full output at http://107.21.233.14:8877/22ec8b45bf31792/output.txt

Total script time: 29.33 mins

Font tests: Passed
Unit tests: Passed
Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/22ec8b45bf31792/reftest-analyzer.html#web=eq.log

timvandermeij · 2016-09-01T14:37:57Z

/botio makeref

pdfjsbot · 2016-09-01T14:37:58Z

From: Bot.io (Linux)

Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/829ccd4ba1ccf88/output.txt

pdfjsbot · 2016-09-01T14:37:58Z

From: Bot.io (Windows)

Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/e3f167bb3810bcf/output.txt

pdfjsbot · 2016-09-01T15:02:07Z

From: Bot.io (Windows)

Success

Full output at http://107.22.172.223:8877/e3f167bb3810bcf/output.txt

Total script time: 24.15 mins

Lint: Passed
Make references: Passed
Check references: Passed

pdfjsbot · 2016-09-01T15:06:04Z

From: Bot.io (Linux)

Success

Full output at http://107.21.233.14:8877/829ccd4ba1ccf88/output.txt

Total script time: 28.10 mins

Lint: Passed
Make references: Passed
Check references: Passed

timvandermeij · 2016-09-01T15:44:25Z

Nice work! Thank you @yurydelendik for the initial patch and @jeremypress for finalizing the patch!

Minor code style improvements after #7539

speedplane · 2016-09-01T21:13:11Z

Definitely awesome work everyone! Can we get a sample build made so we can easily play around with it?

tonyjin · 2016-09-01T21:48:24Z

!!!! this is the best

timvandermeij · 2016-09-02T12:21:29Z

For now you have to set enhanceTextSelection to true to work with this mode. Now we have PR #7586 that adds it as a viewer preference so it can be toggled more easily.

make use of the newly introduced text div expansion, see mozilla/pdf.js#7539

timvandermeij added the text-selection label Aug 13, 2016

timvandermeij assigned yurydelendik Aug 13, 2016

jeremypress force-pushed the fairexpand branch 4 times, most recently from e0ac122 to 19d651e Compare August 17, 2016 00:18

timvandermeij reviewed Aug 17, 2016
View reviewed changes

jeremypress force-pushed the fairexpand branch from 19d651e to 2687e85 Compare August 17, 2016 17:00

timvandermeij reviewed Aug 17, 2016
View reviewed changes

jeremypress force-pushed the fairexpand branch 2 times, most recently from 22f8e3b to 295b609 Compare August 17, 2016 22:52

Snuffleupagus reviewed Aug 18, 2016
View reviewed changes

jeremypress force-pushed the fairexpand branch 2 times, most recently from 242d3eb to b3ce757 Compare August 19, 2016 00:08

added text enhancement regression tests

1ceeb4d

jeremypress force-pushed the fairexpand branch from eb53bd3 to 1ceeb4d Compare August 31, 2016 16:55

This was referenced Sep 1, 2016

Improved Text Selection By Padding Text Elements #5545

Closed

Improving the way how text selection works #4843

Closed

timvandermeij merged commit 6bb95e3 into mozilla:master Sep 1, 2016

timvandermeij mentioned this pull request Sep 1, 2016

Enable enhanced text selection by default #7584

Closed

5 tasks

timvandermeij added a commit to timvandermeij/pdf.js that referenced this pull request Sep 1, 2016

Minor code style improvements after mozilla#7539

7c961b6

timvandermeij added a commit that referenced this pull request Sep 1, 2016

Merge pull request #7585 from timvandermeij/text-selection-code-fixes

8522625

Minor code style improvements after #7539

andrenarchy added a commit to paperhive/frontend that referenced this pull request Sep 2, 2016

use enhancedTextSelection of pdfjs

5469f59

make use of the newly introduced text div expansion, see mozilla/pdf.js#7539

andrenarchy mentioned this pull request Sep 2, 2016

use enhancedTextSelection of pdfjs paperhive/frontend#534

Merged

darshitvvora pushed a commit to darshitvvora/pdf.js that referenced this pull request Sep 13, 2016

Minor code style improvements after mozilla#7539

80fbfbe

hdensity mentioned this pull request Mar 16, 2021

pdfDefaultOptions.textLayerMode is ignored/overridden stephanrauh/ngx-extended-pdf-viewer#694

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[api-minor] Expanding divs to improve selection #7539

[api-minor] Expanding divs to improve selection #7539

jeremypress commented Aug 12, 2016 •

edited by yurydelendik

Loading

yurydelendik commented Aug 16, 2016

timvandermeij Aug 17, 2016

timvandermeij commented Aug 17, 2016

yurydelendik commented Aug 17, 2016 •

edited

Loading

yurydelendik commented Aug 17, 2016

timvandermeij Aug 17, 2016 •

edited

Loading

Snuffleupagus Aug 17, 2016 •

edited

Loading

timvandermeij Aug 17, 2016

Snuffleupagus Aug 17, 2016

timvandermeij Aug 17, 2016

timvandermeij commented Aug 17, 2016

jeremypress commented Aug 17, 2016

Snuffleupagus Aug 18, 2016

Snuffleupagus commented Aug 18, 2016

yurydelendik commented Aug 18, 2016 •

edited

Loading

jeremypress commented Aug 18, 2016

jeremypress commented Aug 31, 2016

timvandermeij commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

timvandermeij commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

timvandermeij commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

timvandermeij commented Sep 1, 2016

speedplane commented Sep 1, 2016

tonyjin commented Sep 1, 2016

timvandermeij commented Sep 2, 2016

[api-minor] Expanding divs to improve selection #7539

[api-minor] Expanding divs to improve selection #7539

Conversation

jeremypress commented Aug 12, 2016 • edited by yurydelendik Loading

Description

Performance Testing

CPU and Memory Testing

Render Times

Handling rtl and rotated/vertical text

yurydelendik commented Aug 16, 2016

timvandermeij Aug 17, 2016

Choose a reason for hiding this comment

timvandermeij commented Aug 17, 2016

yurydelendik commented Aug 17, 2016 • edited Loading

yurydelendik commented Aug 17, 2016

timvandermeij Aug 17, 2016 • edited Loading

Choose a reason for hiding this comment

Snuffleupagus Aug 17, 2016 • edited Loading

Choose a reason for hiding this comment

timvandermeij Aug 17, 2016

Choose a reason for hiding this comment

Snuffleupagus Aug 17, 2016

Choose a reason for hiding this comment

timvandermeij Aug 17, 2016

Choose a reason for hiding this comment

timvandermeij commented Aug 17, 2016

jeremypress commented Aug 17, 2016

Snuffleupagus Aug 18, 2016

Choose a reason for hiding this comment

Snuffleupagus commented Aug 18, 2016

yurydelendik commented Aug 18, 2016 • edited Loading

jeremypress commented Aug 18, 2016

jeremypress commented Aug 31, 2016

timvandermeij commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)

Received

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)

Success

Published

timvandermeij commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)

Received

pdfjsbot commented Sep 1, 2016

From: Bot.io (Windows)

Received

pdfjsbot commented Sep 1, 2016

From: Bot.io (Windows)

Failed

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)

Failed

timvandermeij commented Sep 1, 2016

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)

Received

pdfjsbot commented Sep 1, 2016

From: Bot.io (Windows)

Received

pdfjsbot commented Sep 1, 2016

From: Bot.io (Windows)

Success

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)

Success

timvandermeij commented Sep 1, 2016

speedplane commented Sep 1, 2016

tonyjin commented Sep 1, 2016

timvandermeij commented Sep 2, 2016

jeremypress commented Aug 12, 2016 •

edited by yurydelendik

Loading

yurydelendik commented Aug 17, 2016 •

edited

Loading

timvandermeij Aug 17, 2016 •

edited

Loading

Snuffleupagus Aug 17, 2016 •

edited

Loading

yurydelendik commented Aug 18, 2016 •

edited

Loading