Poorly rendered image quality #7041
Comments
What exactly is wrong with the PDF from viewer.html?
Hi Rob, thanks for your reply. The problem is that the text quality (I'm not sure whether it's strictly font rendering) makes the words harder to read. Look at the word "Billing" at number 12, for example: the one on the right looks like it says "Bllllng". My users need to see at least 8 pages at the same time, as they go through thousands every day, which means they will view the PDFs at an even smaller zoom level. The one on the left is not at 100%; it's actually at less than 80%. Just look at the sizes: the left one shows numbers 1 to 28 on the same screen, while the right one shows only 25.
Could you paste a screenshot with exactly the same zoom level? The "text rendering" is not a text rendering issue: what you see is an image. The question is whether the quality deteriorated after scaling (and, of course, whether it can be improved).
Thanks for the extra info. I don't know what goes wrong, but maybe some of the other PDF.js devs know.
I think the loss of image quality happens in canvas.js#L1895, where the JPEG image of dimensions 2508 x 3525 is drawn onto a canvas that is much smaller than the image. Images look poor when they are scaled down by a large factor in a single step; this is fixed in canvas.js#L2081 (from #3312). The same fix was applied to thumbnails in #4924, so I guess it should be applied here for JPEG images as well.
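A minimal sketch of the step-wise idea, assuming each step halves the size until the remaining reduction is at most 2x. `planDownscaleSteps` is a hypothetical helper written for illustration; it is not the code at canvas.js#L2081, only the same general approach of never shrinking by more than 2x at once.

```javascript
// Hypothetical helper (not the actual pdf.js code): plan the intermediate sizes
// for shrinking an image in steps of at most 2x per axis, so that each step
// blends neighbouring pixels instead of skipping most of them.
function planDownscaleSteps(srcWidth, srcHeight, dstWidth, dstHeight) {
  const steps = [];
  let w = srcWidth;
  let h = srcHeight;
  // Halve while the remaining reduction exceeds 2x on either axis,
  // never going below the target size.
  while (w > 2 * dstWidth || h > 2 * dstHeight) {
    w = Math.max(dstWidth, Math.ceil(w / 2));
    h = Math.max(dstHeight, Math.ceil(h / 2));
    steps.push([w, h]);
  }
  // Final step: the remaining factor is now at most 2x on both axes.
  steps.push([dstWidth, dstHeight]);
  return steps;
}

// The 2508 x 3525 JPEG from this issue, drawn about 300 px wide.
console.log(planDownscaleSteps(2508, 3525, 300, 422));
// → [[1254, 1763], [627, 882], [314, 441], [300, 422]]
```

Each entry would correspond to one intermediate `drawImage` onto a scratch canvas of that size, with the last entry being the visible canvas.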
The PDFs I'm rendering mostly contain text. The same PDFs display perfectly in an iframe on IE + Adobe plugin, or IE + Foxit plugin, even though the iframe's dimensions were set to 360 x 400 (much smaller than the actual size). On Chrome, with both the default viewer and the PdfViewer extension, they look really distorted. Adobe and Foxit use their own engines to render the PDF, while most Chrome extensions translate the PDF contents into HTML elements. The problem is that Adobe and Foxit draw their whole interface on a separate layer on top of the HTML, which prevents my application from displaying context menus and other things over the PDFs. I'm still hoping I can use pdf.js for my app, but right now I'm forced to try a different solution.
@nicholassinggih The PDF you provided contains a (scanned?) JPEG image in the background. The text that you can select in the document is an invisible text overlay. I pushed a commit, fkaelberer@913d3bc, which downscales the JPEGs in multiple steps, each by a factor of at most 2x. As a result, readability improves a lot; see the images below. I did not open a pull request, though, because the code is unfinished and not well tested. Anyone, please feel free to take and improve the code.
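To illustrate why stepping helps with thin strokes such as the "i"s in "Billing", here is a small self-contained sketch on a grayscale buffer. It is not the code from fkaelberer@913d3bc; `halveBox` and `shrinkNearest` are hypothetical helpers. A single large-factor draw only looks at a few source pixels per output pixel, so the sketch approximates it with plain point sampling, and compares that with two successive 2x box-filter halvings.

```javascript
// Halve a row-major grayscale image by averaging each 2x2 block.
function halveBox(pixels, width, height) {
  const outW = Math.ceil(width / 2);
  const outH = Math.ceil(height / 2);
  const out = new Uint8ClampedArray(outW * outH);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      let sum = 0;
      let count = 0;
      // Average the (up to) four source pixels under this output pixel;
      // at odd edges the block is clipped to the image.
      for (let dy = 0; dy < 2; dy++) {
        for (let dx = 0; dx < 2; dx++) {
          const sx = 2 * x + dx;
          const sy = 2 * y + dy;
          if (sx < width && sy < height) {
            sum += pixels[sy * width + sx];
            count++;
          }
        }
      }
      out[y * outW + x] = Math.round(sum / count);
    }
  }
  return { pixels: out, width: outW, height: outH };
}

// Point sampling by an integer factor k: keep every k-th pixel, drop the rest.
function shrinkNearest(pixels, width, height, k) {
  const outW = Math.ceil(width / k);
  const outH = Math.ceil(height / k);
  const out = new Uint8ClampedArray(outW * outH);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      out[y * outW + x] = pixels[y * k * width + x * k];
    }
  }
  return { pixels: out, width: outW, height: outH };
}

// 8x8 white page with a one-pixel black vertical stroke in column 3.
const W = 8;
const H = 8;
const page = new Uint8ClampedArray(W * H).fill(255);
for (let y = 0; y < H; y++) page[y * W + 3] = 0;

const nearest = shrinkNearest(page, W, H, 4);
let stepped = halveBox(page, W, H);
stepped = halveBox(stepped.pixels, stepped.width, stepped.height);
console.log(Array.from(nearest.pixels)); // [255, 255, 255, 255]: stroke lost
console.log(Array.from(stepped.pixels)); // [192, 255, 192, 255]: stroke kept as gray
```

The one-step shrink misses the stroke entirely, while the stepped version keeps it as a gray column, which matches the "Bllllng" vs. "Billing" difference described above.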
Hi Felix, thank you very much for this. I've briefly tested your new code and it does give a much better result, just as shown in your screenshots. Bravo to you!! Forgive me if my next questions sound silly, but I want to check that I understand this correctly. About the text vs. JPEG image thing: are you saying that the PDF I provided actually only contains a JPEG of a scanned paper with text on it? So it doesn't actually contain text data, and it was rendered as an image, which in turn was scaled down by the browser? Should I close this thread, or let Yuri or Tim close it?
Yes. That's why I slapped the jpeg label on this ticket.
It does contain text data (with a transparent color, probably for text selection), but what you see (and what is printed) is the (scaled) image.
The (legitimate) issue hasn't been resolved yet, so I'd keep the issue open.
I see. They ran OCR on the PDF, hence the transparent/invisible text data. If they hadn't run OCR on it, I'm guessing it would just contain a JPEG image with no text. Thank you both for your help. Really appreciate it.
This is the first time I've seen scanned data packaged as JPEG (vs. JBIG2 or CCITT). We probably didn't see this issue earlier because it's a rare case. As mentioned in #7041 (comment) above, we already do this for most images except JPEG (e.g. https://github.com/mozilla/pdf.js/blob/master/src/display/canvas.js#L2081). The trade-off is higher memory and CPU usage. If we decide to move JPEG decoding to the worker side, we won't have to worry about code duplication.
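To put rough numbers on the memory side of that trade-off, the sketch below assumes decoded RGBA at 4 bytes per pixel and a fresh buffer for every halving step. pdf.js may reuse scratch canvases, so treat the result as an upper bound. `stepwiseMemoryBytes` is a hypothetical helper, not a pdf.js function.

```javascript
// Hypothetical estimate: bytes held by the decoded source image, every
// intermediate buffer from step-wise halving, and the final canvas,
// assuming RGBA (4 bytes per pixel) and no buffer reuse.
function stepwiseMemoryBytes(srcWidth, srcHeight, dstWidth, dstHeight) {
  let w = srcWidth;
  let h = srcHeight;
  let total = w * h * 4; // decoded source image
  while (w > 2 * dstWidth || h > 2 * dstHeight) {
    w = Math.max(dstWidth, Math.ceil(w / 2));
    h = Math.max(dstHeight, Math.ceil(h / 2));
    total += w * h * 4; // one intermediate buffer per halving step
  }
  return total + dstWidth * dstHeight * 4; // final visible canvas
}

// The 2508 x 3525 JPEG from this issue, drawn about 300 px wide.
console.log(stepwiseMemoryBytes(2508, 3525, 300, 422)); // → 47478360
```

Of those roughly 47.5 MB, the source image accounts for about 35.4 MB and the three intermediate buffers for about 11.6 MB. Because each halving step holds a quarter of the previous step's pixels, the intermediates stay at roughly a third of the source size however large the reduction is.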
It affects not only text but also images: http://www.ikea.com/at/de/assembly_instructions/rakke-kleiderschrank__AA-808506-1.pdf (notice the thumbnails).
Hi Felix or anyone, is there a chance you could implement a better downsampling algorithm in the paintJpegXObject method? Some of my users are still complaining about image quality. Even with the gradual scaling that Felix added, scanned (JPEG) images of handwritten documents are sometimes too faded or blurry to read.
Volume 1 of 4 43_Redacted.pdf
This is all I know about the user's machine:
I am currently trying to implement a Lanczos or sinc downscaling algorithm in paintJpegXObject, but I don't know how badly it will affect performance, and I'm not sure it will fix the problem. Really appreciate your help.
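For reference, a Lanczos resampler is separable: run a 1-D pass over every row, then over every column. The sketch below shows that 1-D pass with a kernel stretched by the scale factor, which is what makes it a proper downscaling filter rather than a point interpolator. `lanczos` and `lanczosResample1D` are hypothetical names; this is a plain-JavaScript illustration, not pdf.js code, and it does not wire into paintJpegXObject.

```javascript
// Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a, else 0,
// where sinc(x) = sin(pi x) / (pi x).
function lanczos(x, a) {
  if (x === 0) return 1;
  if (Math.abs(x) >= a) return 0;
  const px = Math.PI * x;
  return (a * Math.sin(px) * Math.sin(px / a)) / (px * px);
}

// Resample one row (or column) of samples to dstLen values. When shrinking,
// the kernel is stretched by the scale factor so each output sample is a
// weighted average of every source sample it covers, not just its neighbours.
function lanczosResample1D(src, dstLen, a = 3) {
  const scale = src.length / dstLen;
  const stretch = Math.max(1, scale); // widen the kernel only when downscaling
  const radius = a * stretch;
  const out = new Float64Array(dstLen);
  for (let i = 0; i < dstLen; i++) {
    // Centre of output sample i, expressed in source sample coordinates.
    const center = (i + 0.5) * scale - 0.5;
    const lo = Math.max(0, Math.ceil(center - radius));
    const hi = Math.min(src.length - 1, Math.floor(center + radius));
    let sum = 0;
    let weightSum = 0;
    for (let j = lo; j <= hi; j++) {
      const w = lanczos((j - center) / stretch, a);
      sum += w * src[j];
      weightSum += w;
    }
    // Normalise so flat regions keep their value; the result can overshoot
    // slightly (ringing), so clamp before writing back to 8-bit pixels.
    out[i] = sum / weightSum;
  }
  return out;
}

// Example: shrink a 12-sample white row with a one-sample dark stroke to 3 samples.
const row = new Array(12).fill(255);
row[5] = 0;
const shrunk = lanczosResample1D(row, 3);
console.log(Array.from(shrunk, (v) => Math.round(Math.min(255, Math.max(0, v)))));
```

On performance: the stretched kernel means that at an 8x reduction with `a = 3`, each output sample reads about 2 × 3 × 8 = 48 source samples per axis, so the cost grows with the reduction factor, unlike repeated 2x halving.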
@nicholassinggih If you find an algorithm that we can use for downsampling and it works, we will provide pointers on how to improve its performance (e.g. via asm.js).
@yurydelendik I've tried implementing bicubic interpolation to scale down the JPEGs instead. The result is not as good as @fkaelberer's solution, and the performance is also slower. This problem has proven to be the most difficult challenge in developing the current application. I'm now looking into simply creating a modified copy of the PDF file with all the pages and images resized to the desired dimensions. Since my users don't actually zoom in and out dynamically when viewing the pages, this could work.
`var options = options || {`
If you increase the scale, you can see improved clarity.
Hey guys, not sure whether this applies here, but I have also been getting blurry text from the rendered canvas, and I was wondering whether this is an HTML5 canvas issue rather than a pdf.js one, until I stumbled on an article about it (screenshot caption: "before accounting for").
@wotzhs Yes, the pdf.js demo viewer relies on devicePixelRatio to increase the canvas's pixel density, then sizes the canvas back down via CSS.
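The devicePixelRatio technique can be sketched as a small sizing helper: the canvas bitmap gets `ratio` times more pixels than its CSS box, and drawing is scaled by the same ratio. `hiDpiCanvasSize` is a hypothetical name. As far as I know, the examples bundled with recent pdf.js versions do the equivalent inline with `window.devicePixelRatio` and pass a matrix like the one below as the `transform` render parameter, but check the API of the version you actually use (older releases such as 1.x also take `page.getViewport(scale)` rather than `page.getViewport({ scale })`).

```javascript
// Hypothetical helper: HiDPI canvas sizing. The backing store gets
// devicePixelRatio times more pixels than the CSS box, and drawing is
// scaled by the same ratio so the page fills the larger bitmap.
function hiDpiCanvasSize(cssWidth, cssHeight, devicePixelRatio = 1) {
  const ratio = devicePixelRatio || 1;
  return {
    width: Math.floor(cssWidth * ratio), // canvas.width, in device pixels
    height: Math.floor(cssHeight * ratio), // canvas.height, in device pixels
    styleWidth: `${Math.floor(cssWidth)}px`, // canvas.style.width, in CSS pixels
    styleHeight: `${Math.floor(cssHeight)}px`, // canvas.style.height, in CSS pixels
    // Scale matrix for the drawing context; null when no scaling is needed.
    transform: ratio !== 1 ? [ratio, 0, 0, ratio, 0, 0] : null,
  };
}

// A US Letter page (612 x 792 CSS px at scale 1) on a 2x display.
console.log(hiDpiCanvasSize(612, 792, 2));

// In a browser with pdf.js, usage would look roughly like:
//   const size = hiDpiCanvasSize(viewport.width, viewport.height, window.devicePixelRatio);
//   canvas.width = size.width; canvas.height = size.height;
//   canvas.style.width = size.styleWidth; canvas.style.height = size.styleHeight;
//   page.render({ canvasContext: canvas.getContext("2d"), viewport, transform: size.transform });
```

Without this, a 2x display stretches a 1x bitmap over twice as many physical pixels, which produces the blurriness described above even when the PDF itself is sharp.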
Is this a duplicate of #2750?
@yurydelendik I am confused. Are you saying that the existing pdf.js should already handle devicePixelRatio as that article suggested? If so, do you happen to know when it started doing that? I am seeing blurriness with Wix, which seems to be using pdfjs-dist version 2.305 from Feb 1, 2018, and I'm wondering whether they need to upgrade or make use of devicePixelRatio themselves when using the library.
Hi all,
I have tried using the latest version, 1.3.91, for this. All I did was modify viewer.js to point to my own PDF. This is the result using the fresh out-of-the-box viewer.html:
The left one is the actual PDF, and the right one is rendered by viewer.html. Same thing in IE and Chrome. I did not try it in FF because my users use either IE or Chrome.
I've searched everywhere I could without finding a solution to this. Here's the PDF file from the picture.
.0002.pdf