-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api-minor] Expanding divs to improve selection #7539
Conversation
Overall looks good. Can you squash all commits into one (using Also in separate commit modify test driver to get the extension setting from the manifest and apply that for the "text" tests and introduce couple of tests with this setting. Reviewed 6 of 7 files at r2. web/app.js, line 364 [r2] (raw file):
unneeded change web/text_layer_builder.js, line 25 [r2] (raw file):
We don't need the code above. web/text_layer_builder.js, line 91 [r2] (raw file):
Unneeded change Comments from Reviewable |
e0ac122
to
19d651e
Compare
}); | ||
this.textLayerRenderTask.promise.then(function () { | ||
this.textLayerDiv.appendChild(textLayerFrag); | ||
this._finishRendering(); | ||
this.updateMatches(); | ||
}.bind(this), function (reason) { | ||
// canceled or failed to render text layer -- skipping errors | ||
// canceled or failed to render text layer -- skipping errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: revert this change as everything should be indented with two spaces
Is this disabled by default to first land the code for more testing and enabling it later? If so, when this lands we should remember to file a follow-up issue to enable this mode by default. |
TODOs:
Both TODO-comments (red-black tree and reference counting) decrease algorithmic complexity of the feature. However it looks like we are not dealing with large amount of text items on a "horizon" -- large amount of text items must be in single row or column in random order. I don't think it can be found in real world documents (?), so we can keep it as is for release. |
Reviewed 1 of 3 files at r3. Comments from Reviewable |
19d651e
to
2687e85
Compare
@@ -468,12 +472,13 @@ var Driver = (function DriverClosure() { | |||
var textLayerContext = textLayerCanvas.getContext('2d'); | |||
textLayerContext.clearRect(0, 0, | |||
textLayerCanvas.width, textLayerCanvas.height); | |||
var enhanceText = task.id.includes('enhance') ? true : false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
includes
returns a boolean, so we can simplify this line to var enhanceText = task.id.includes('enhance');
However, usually we introduce a new key in the objects for this. Take for example b7217a2 where we introduces the annotations
key to indicate that the annotation layer is tested. I think we should do the same here: instead of putting -enhance
in the name, which is more error-prone if it is accidentally renamed, we should add a new key enhance_text_selection
and set it to true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be much better overall if a enhance
flag was added (similar to the annotations
flag in PR #6780).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, we had the same thought :)
I would personally prefer enhance_text_selection
to make it clear what is being enhanced, much like the already used enhanceTextSelection
variable in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't just enhance
be enough though, since the type is already set to text
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of couse, I forgot about that. Yes, enhance
would be fine then.
This patch is looking much better already. Please address the comments above and squash the commits into one commit. |
22f8e3b
to
295b609
Compare
Hey guys, thanks for all the feedback. I think I addressed everything, but let me know if I missed something. |
* @private | ||
*/ | ||
function TextLayerRenderTask(textContent, container, viewport, textDivs) { | ||
function TextLayerRenderTask(textContent, container, viewport, textDivs, | ||
enhanceTextSelection) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please increase the indentation with one space on this line, so that the parameters line up; i.e.
function TextLayerRenderTask(textContent, container, viewport, textDivs,
enhanceTextSelection) {
So that we don't forget, note that this PR still has the problem that was outlined in #6663 (comment):
|
I'm thinking we can expand on mousedown capture phase and return to previous state at mouseup:
If not, we might need to keep it off and expose via default_preferences.js |
242d3eb
to
b3ce757
Compare
eb53bd3
to
1ceeb4d
Compare
@timvandermeij I made the fixes, but left |
/botio-linux preview |
From: Bot.io (Linux)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/000cd1fa280eb92/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/000cd1fa280eb92/output.txt Total script time: 1.03 mins Published |
/botio test |
From: Bot.io (Linux)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/22ec8b45bf31792/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://107.22.172.223:8877/c0c37f385d52937/output.txt |
From: Bot.io (Windows)FailedFull output at http://107.22.172.223:8877/c0c37f385d52937/output.txt Total script time: 24.09 mins
Image differences available at: http://107.22.172.223:8877/c0c37f385d52937/reftest-analyzer.html#web=eq.log |
From: Bot.io (Linux)FailedFull output at http://107.21.233.14:8877/22ec8b45bf31792/output.txt Total script time: 29.33 mins
Image differences available at: http://107.21.233.14:8877/22ec8b45bf31792/reftest-analyzer.html#web=eq.log |
/botio makeref |
From: Bot.io (Linux)ReceivedCommand cmd_makeref from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/829ccd4ba1ccf88/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @timvandermeij received. Current queue size: 0 Live output at: http://107.22.172.223:8877/e3f167bb3810bcf/output.txt |
From: Bot.io (Windows)SuccessFull output at http://107.22.172.223:8877/e3f167bb3810bcf/output.txt Total script time: 24.15 mins
|
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/829ccd4ba1ccf88/output.txt Total script time: 28.10 mins
|
Nice work! Thank you @yurydelendik for the initial patch and @jeremypress for finalizing the patch! |
Minor code style improvements after #7539
Definitely awesome work everyone! Can we get a sample build made so we can easily play around with it? |
!!!! this is the best |
For now you have to set |
make use of the newly introduced text div expansion, see mozilla/pdf.js#7539
Description
Continuing the work by @yurydelendik, I'm attempting to get his commit into shape to be included in the library. So far, I've converted his feature into an option that is set when defining the PDFViewerApplication, and have run some preliminary testing.
Performance Testing
So far I have tested 2 aspects of performance in Chrome. Memory/CPU usage, and the time taken to render text layers and set the text content, as previously logged by Yury. I chose 5 PDFs that vary in size, amount of text, and complexity to try to get a wide base of results. I've attached excel documents of my findings, and I'll summarize them here.
Here are the files I've used for testing so far:
pdfs.zip
CPU and Memory Testing
This was done with a modification of a script provided by Yury that logs CPU and Memory usage every 300 ms. Additionally, I wrote an apple script to simulate scrolling on the browser. First scrolling up, then down, then jumping to the last page and scrolling up. This allows me to replicate and time the results from the script. The scripts can be found here: https://github.com/jeremypress/browser-testing/tree/master. Below are some graphs comparing usage between the two branches. There seems to be a negligible difference in CPU and Memory, but further testing may determine otherwise.
chromeperformance.xlsx
File 1: 5.8mb, 200 pages
File 2: 4.2mb, 81 pages
File 3: 620kb, 3 pages
Render Times
Data on render times was logged using console.time statements written by Yury. Here is where the performance difference between the two branches can be noted. On text heavy files, render times can be as much as twice a slow, which would correlate to around a half second increase in total rendering time spent. Also, in some cases the fairexpand branch requires a small amount of extra divs to be rendered. However, this does not really impact first impression, as most of the rendering is done as a user moves to a new page. A sample of the data is provided below, and the full file is attached.
render times.xlsx
File 1: 5.8mb, 200 pages
File 2: 4.2mb, 81 pages
File 3: 620kb, 3 pages
Handling rtl and rotated/vertical text
Overall I've noticed that this enhancement handles rtl selection fine, and generally reduces flickering on rotated text. Although there are some instances of rotated text where fairexpand makes it harder or impossible to select, these are already problem areas on the master branch. Attached are some videos showing the differences, and screenshots of the text divs on a document with vertical text.
Videos: https://cloud.box.com/s/1twc6zhcf1lswo2pe1vqzmrrjb6btxwy
Screenshots:
master branch:
fairexpand branch:
# Concerns 1. Chrome is the only browser I've tested in so far, what other data is desired in order to determine if the commit is performant enough? 2. Is the option implemented correctly according to the design of the project? 3. I'm failing a unit test regarding rendering PDF's in parallel. However, this also happens on my master branch, so it could be due to my testing environment.
Feedback is greatly appreciated so I can make improvements going forward.
This change is