Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Expanding divs to improve selection #7539

Merged
merged 2 commits into from
Sep 1, 2016

Conversation

jeremypress
Copy link

@jeremypress jeremypress commented Aug 12, 2016

Description

Continuing the work by @yurydelendik, I'm attempting to get his commit into shape to be included in the library. So far, I've converted his feature into an option that is set when defining the PDFViewerApplication, and have run some preliminary testing.

Performance Testing

So far I have tested 2 aspects of performance in Chrome. Memory/CPU usage, and the time taken to render text layers and set the text content, as previously logged by Yury. I chose 5 PDFs that vary in size, amount of text, and complexity to try to get a wide base of results. I've attached excel documents of my findings, and I'll summarize them here.

Here are the files I've used for testing so far:
pdfs.zip

CPU and Memory Testing

This was done with a modification of a script provided by Yury that logs CPU and Memory usage every 300 ms. Additionally, I wrote an apple script to simulate scrolling on the browser. First scrolling up, then down, then jumping to the last page and scrolling up. This allows me to replicate and time the results from the script. The scripts can be found here: https://github.com/jeremypress/browser-testing/tree/master. Below are some graphs comparing usage between the two branches. There seems to be a negligible difference in CPU and Memory, but further testing may determine otherwise.
chromeperformance.xlsx

File 1: 5.8mb, 200 pages
blueprint-cpumem

File 2: 4.2mb, 81 pages
webdesigner-cpumem

File 3: 620kb, 3 pages
california-cpumem

Render Times

Data on render times was logged using console.time statements written by Yury. Here is where the performance difference between the two branches can be noted. On text heavy files, render times can be as much as twice a slow, which would correlate to around a half second increase in total rendering time spent. Also, in some cases the fairexpand branch requires a small amount of extra divs to be rendered. However, this does not really impact first impression, as most of the rendering is done as a user moves to a new page. A sample of the data is provided below, and the full file is attached.
render times.xlsx

File 1: 5.8mb, 200 pages
blueprint-render

File 2: 4.2mb, 81 pages
webdesigner-render

File 3: 620kb, 3 pages
california-render

Handling rtl and rotated/vertical text

Overall I've noticed that this enhancement handles rtl selection fine, and generally reduces flickering on rotated text. Although there are some instances of rotated text where fairexpand makes it harder or impossible to select, these are already problem areas on the master branch. Attached are some videos showing the differences, and screenshots of the text divs on a document with vertical text.

Videos: https://cloud.box.com/s/1twc6zhcf1lswo2pe1vqzmrrjb6btxwy

Screenshots:
master branch:
screen shot 2016-08-11 at 1 23 30 pm

screen shot 2016-08-11 at 1 26 10 pm

fairexpand branch:
screen shot 2016-08-11 at 1 24 34 pm

screen shot 2016-08-11 at 1 25 40 pm

# Concerns 1. Chrome is the only browser I've tested in so far, what other data is desired in order to determine if the commit is performant enough? 2. Is the option implemented correctly according to the design of the project? 3. I'm failing a unit test regarding rendering PDF's in parallel. However, this also happens on my master branch, so it could be due to my testing environment.

Feedback is greatly appreciated so I can make improvements going forward.


This change is Reviewable

@yurydelendik
Copy link
Contributor

Overall looks good. Can you squash all commits into one (using git rebase -i command)? Also remove all instrumentation console.time/timeEnd/log operations. You can create addition commit with console instrumentation, which you can remove (via git rebase -i) before landing.

Also in separate commit modify test driver to get the extension setting from the manifest and apply that for the "text" tests and introduce couple of tests with this setting.


Reviewed 6 of 7 files at r2.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


web/app.js, line 364 [r2] (raw file):

        PDFJS.externalLinkTarget = value;
      }),

unneeded change


web/text_layer_builder.js, line 25 [r2] (raw file):

  return !NonWhitespaceRegexp.test(str);
}

We don't need the code above.


web/text_layer_builder.js, line 91 [r2] (raw file):

        this.textLayerRenderTask = null;
      }

Unneeded change


Comments from Reviewable

@jeremypress jeremypress force-pushed the fairexpand branch 4 times, most recently from e0ac122 to 19d651e Compare August 17, 2016 00:18
});
this.textLayerRenderTask.promise.then(function () {
this.textLayerDiv.appendChild(textLayerFrag);
this._finishRendering();
this.updateMatches();
}.bind(this), function (reason) {
// canceled or failed to render text layer -- skipping errors
// canceled or failed to render text layer -- skipping errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: revert this change as everything should be indented with two spaces

@timvandermeij
Copy link
Contributor

Is this disabled by default to first land the code for more testing and enabling it later? If so, when this lands we should remember to file a follow-up issue to enable this mode by default.

@yurydelendik
Copy link
Contributor

yurydelendik commented Aug 17, 2016

TODOs:

  • decide if we want to enable this by default (with //TODO comments not-addressed)
  • remove/disable bindMouse-hack in the text_layer_builder.js depending on options
  • add "text" test(s) for enhanced mode to track regressions for it

Both TODO-comments (red-black tree and reference counting) decrease algorithmic complexity of the feature. However it looks like we are not dealing with large amount of text items on a "horizon" -- large amount of text items must be in single row or column in random order. I don't think it can be found in real world documents (?), so we can keep it as is for release.

@yurydelendik
Copy link
Contributor

Reviewed 1 of 3 files at r3.
Review status: 4 of 6 files reviewed at latest revision, 5 unresolved discussions.


Comments from Reviewable

@@ -468,12 +472,13 @@ var Driver = (function DriverClosure() {
var textLayerContext = textLayerCanvas.getContext('2d');
textLayerContext.clearRect(0, 0,
textLayerCanvas.width, textLayerCanvas.height);
var enhanceText = task.id.includes('enhance') ? true : false;
Copy link
Contributor

@timvandermeij timvandermeij Aug 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

includes returns a boolean, so we can simplify this line to var enhanceText = task.id.includes('enhance');

However, usually we introduce a new key in the objects for this. Take for example b7217a2 where we introduces the annotations key to indicate that the annotation layer is tested. I think we should do the same here: instead of putting -enhance in the name, which is more error-prone if it is accidentally renamed, we should add a new key enhance_text_selection and set it to true.

Copy link
Collaborator

@Snuffleupagus Snuffleupagus Aug 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be much better overall if a enhance flag was added (similar to the annotations flag in PR #6780).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, we had the same thought :)

I would personally prefer enhance_text_selection to make it clear what is being enhanced, much like the already used enhanceTextSelection variable in the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't just enhance be enough though, since the type is already set to text?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of couse, I forgot about that. Yes, enhance would be fine then.

@timvandermeij
Copy link
Contributor

This patch is looking much better already. Please address the comments above and squash the commits into one commit.

@jeremypress jeremypress force-pushed the fairexpand branch 2 times, most recently from 22f8e3b to 295b609 Compare August 17, 2016 22:52
@jeremypress
Copy link
Author

Hey guys, thanks for all the feedback. I think I addressed everything, but let me know if I missed something.

* @private
*/
function TextLayerRenderTask(textContent, container, viewport, textDivs) {
function TextLayerRenderTask(textContent, container, viewport, textDivs,
enhanceTextSelection) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please increase the indentation with one space on this line, so that the parameters line up; i.e.

  function TextLayerRenderTask(textContent, container, viewport, textDivs,
                               enhanceTextSelection) {

@Snuffleupagus
Copy link
Collaborator

[...] decide if we want to enable this by default (with //TODO comments not-addressed)

So that we don't forget, note that this PR still has the problem that was outlined in #6663 (comment):

This does have the side affect of always showing the text cursor no matter where you hover over the page (instead of just over text).

@yurydelendik
Copy link
Contributor

yurydelendik commented Aug 18, 2016

I'm thinking we can expand on mousedown capture phase and return to previous state at mouseup:

  • check if we can split text_layer.js logic into enable/disable expansion.

If not, we might need to keep it off and expose via default_preferences.js

@jeremypress
Copy link
Author

fair expand on click

Here's a snippet of what the div expansion on click looks like. This should solve the cursor problem. I just need to cleanup the code (as well as make the other style fixes above) and turn this into an option so that we can still run the enhancement regression tests.

@jeremypress jeremypress force-pushed the fairexpand branch 2 times, most recently from 242d3eb to b3ce757 Compare August 19, 2016 00:08
@jeremypress
Copy link
Author

@timvandermeij I made the fixes, but left || false in the last two per @Snuffleupagus 's comment. Very excited for this to land!

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/000cd1fa280eb92/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/000cd1fa280eb92/output.txt

Total script time: 1.03 mins

Published

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/22ec8b45bf31792/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/c0c37f385d52937/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/c0c37f385d52937/output.txt

Total script time: 24.09 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/c0c37f385d52937/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/22ec8b45bf31792/output.txt

Total script time: 29.33 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/22ec8b45bf31792/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/829ccd4ba1ccf88/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/e3f167bb3810bcf/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Windows)


Success

Full output at http://107.22.172.223:8877/e3f167bb3810bcf/output.txt

Total script time: 24.15 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

pdfjsbot commented Sep 1, 2016

From: Bot.io (Linux)


Success

Full output at http://107.21.233.14:8877/829ccd4ba1ccf88/output.txt

Total script time: 28.10 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij timvandermeij merged commit 6bb95e3 into mozilla:master Sep 1, 2016
@timvandermeij
Copy link
Contributor

Nice work! Thank you @yurydelendik for the initial patch and @jeremypress for finalizing the patch!

timvandermeij added a commit to timvandermeij/pdf.js that referenced this pull request Sep 1, 2016
timvandermeij added a commit that referenced this pull request Sep 1, 2016
@speedplane
Copy link

Definitely awesome work everyone! Can we get a sample build made so we can easily play around with it?

@tonyjin
Copy link

tonyjin commented Sep 1, 2016

!!!! this is the best

@timvandermeij
Copy link
Contributor

For now you have to set enhanceTextSelection to true to work with this mode. Now we have PR #7586 that adds it as a viewer preference so it can be toggled more easily.

andrenarchy added a commit to paperhive/frontend that referenced this pull request Sep 2, 2016
make use of the newly introduced text div expansion, see
mozilla/pdf.js#7539
darshitvvora pushed a commit to darshitvvora/pdf.js that referenced this pull request Sep 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants