Slow performance: Chinese PDF #4580
Comments
There are also many console warnings:
"Warning: Could not find a preferred cmap table." pdf.worker.js:200
"Warning: Unsupported feature "font"" pdf.worker.js:200
"Warning: Unsupported feature "font"" pdf.js:200
"Warning: Error during font loading: cmapMappings is undefined" pdf.js:200
FYI: ~50% of the time is spent on the text layer, and the rest is bottlenecked by EDIT: fixed source code link.
@brendandahl: Could you have a look?
Brendan is hiking at the moment, so perhaps @yurydelendik can take a look at this?
@p01: Maybe you can have a look? It's probably not that complicated.
Oy!
The link in the first comment is dead. Does anybody have a copy of the document?
This change avoids the element stringification caused by for..in for the vast majority of CMaps. When loading the PDF from issue mozilla#4580, this change reduces peak RSS from ~650 to ~600 MiB, and improves overall speed by ~20%, from 902 ms to 713 ms. Other CMap-heavy documents will also see improvements.
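To illustrate the stringification mentioned above, here is a minimal sketch, not the actual PDF.js patch. It assumes a sparse array keyed by character code; `map`, `forEachEntry`, and the sample codes are hypothetical names and values chosen for illustration.

```javascript
// Hypothetical sparse mapping from character codes to CIDs.
const map = [];
map[0x4e2d] = 100;
map[0x6587] = 200;

// for..in visits array indices as strings, so every key is stringified.
const keysForIn = [];
for (const key in map) {
  keysForIn.push(key);
}

// A plain indexed loop keeps the keys as numbers and skips the holes.
function forEachEntry(arr, callback) {
  for (let i = 0, len = arr.length; i < len; i++) {
    if (arr[i] !== undefined) {
      callback(i, arr[i]);
    }
  }
}

const keysNumeric = [];
forEachEntry(map, (code) => keysNumeric.push(code));

console.log(typeof keysForIn[0]);   // "string"
console.log(typeof keysNumeric[0]); // "number"
```

The indexed loop walks every slot up to `length`, so it only pays off when the array is not extremely sparse; that may be why the commit message says "the vast majority of CMaps" rather than all of them.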
cid chars are 16-bit unsigned integers. Currently we convert them to single-char strings when inserting them into the CMap, and then convert them back to integers when extracting them from the CMap. This patch changes CMap so that cid chars stay in integer format throughout, saving both time and space. When loading the PDF from issue mozilla#4580, this change reduces peak RSS from ~600 to ~370 MiB. It also improves overall speed on that PDF by ~26%, going from 724 ms to 533 ms.
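The difference between the two representations can be sketched roughly as follows. This is an illustration under assumptions, not the PDF.js source: the class names `StringCidMap` and `IntCidMap` and their methods are hypothetical.

```javascript
// Hypothetical "before" shape: CIDs stored as single-char strings.
class StringCidMap {
  constructor() {
    this.map = [];
  }
  mapOne(code, cid) {
    // Integer to one-character string on insertion.
    this.map[code] = String.fromCharCode(cid);
  }
  lookup(code) {
    // One-character string back to integer on extraction.
    return this.map[code].charCodeAt(0);
  }
}

// Hypothetical "after" shape: CIDs stay integers throughout.
class IntCidMap {
  constructor() {
    this.map = [];
  }
  mapOne(code, cid) {
    this.map[code] = cid;
  }
  lookup(code) {
    return this.map[code];
  }
}

const before = new StringCidMap();
const after = new IntCidMap();
before.mapOne(0x4e2d, 1234);
after.mapOne(0x4e2d, 1234);
console.log(before.lookup(0x4e2d) === after.lookup(0x4e2d)); // true
```

Both versions return the same CID, but the integer version avoids allocating a string per entry and skips two conversions per round trip.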
IdentityCMap uses an array to represent a 16-bit unsigned identity function. This is very space-inefficient, and some files cause multiple IdentityCMaps to be instantiated (e.g. the one from mozilla#4580 has 74). This patch makes the representation implicit. When loading the PDF from issue mozilla#4580, this change reduces peak RSS from ~370 to ~280 MiB. It also improves overall speed on that PDF by ~30%, going from 522 ms to 366 ms.
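One way to picture an implicit identity map is sketched below. The classes `ArrayIdentityMap` and `ImplicitIdentityMap` and their method names are hypothetical and are not taken from the PDF.js code; the sketch only assumes that the map covers the 16-bit range 0..0xFFFF.

```javascript
// Hypothetical explicit form: 65,536 stored entries per instance.
class ArrayIdentityMap {
  constructor() {
    this.map = new Array(0x10000);
    for (let i = 0; i <= 0xffff; i++) {
      this.map[i] = i;
    }
  }
  lookup(code) {
    return this.map[code];
  }
}

// Hypothetical implicit form: no storage, the answer is computed.
class ImplicitIdentityMap {
  contains(code) {
    return Number.isInteger(code) && code >= 0 && code <= 0xffff;
  }
  lookup(code) {
    return this.contains(code) ? code : undefined;
  }
  forEach(callback) {
    for (let i = 0; i <= 0xffff; i++) {
      callback(i, i);
    }
  }
}

const explicitMap = new ArrayIdentityMap();
const implicitMap = new ImplicitIdentityMap();
console.log(explicitMap.lookup(0x4e2d) === implicitMap.lookup(0x4e2d)); // true
console.log(implicitMap.lookup(0x10000)); // undefined
```

With 74 instances, as reported for this document, the explicit form would hold 74 separate 65,536-element arrays, while the implicit form holds none.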
Closing as fixed by #6590.
PDF.js is unusably slow when viewing this pdf:
http://www.grapes-trams.org.cn/UploadFile/files/%E5%88%A9%E7%94%A8%E4%B8%80%E4%B8%AA%E6%B5%B7%E6%B0%94%E8%80%A6%E5%90%88%E6%A8%A1%E5%BC%8F%E5%AF%B9%E5%8F%B0%E9%A3%8EKrovanh%E7%9A%84%E6%A8%A1%E6%8B%9F.pdf