Poorly rendered image quality #7041

nicholassinggih · 2016-02-29T02:34:44Z

Hi all,

I have tried using the latest 1.3.91 version for this. All I did was modify viewer.js to point to my own PDF. This is the result using the fresh, out-of-the-box viewer.html:

The left one is the actual PDF, and the right one is rendered by viewer.html. Same thing in IE and Chrome. I did not try it in FF because my users use either IE or Chrome.

I've tried searching everywhere I could and have been unsuccessful in finding a solution to this. Here's the PDF file from the picture.
.0002.pdf

Rob--W · 2016-02-29T15:03:22Z

What exactly is wrong with the PDF from viewer.html?
(the zoom levels are different by the way, the one at the left uses 100% zoom, the one on the right 80%).

nicholassinggih · 2016-02-29T19:30:44Z

Hi Rob,

Thanks for your reply.

The problem is the text quality (I'm not sure if it's strictly font rendering or not) makes it a bit harder to read the words on it. Check out the word Billing at number 12, for example. The one on the right looks like it says "Bllllng". My users need to be able to see at least 8 pages at the same time, as they go through thousands every day. This means they're going to view the pdfs with an even smaller zoom level.

The one on the left is not at 100%. It's actually at a smaller percentage than 80%. Just look at the sizes. The one on the left showed number 1-28 on the same screen, while the right one only showed 25.

Rob--W · 2016-02-29T19:53:27Z

Could you paste a screenshot with exactly the same zoom sizes?

The "text rendering" is not a text rendering issue. What you see is an image. The question is whether the quality deteriorated after scaling (and of course, whether it can be improved).

nicholassinggih · 2016-02-29T20:20:06Z

Here's the viewer at 100%. Notice that the first 'i' in the word Billing still looks like an 'l'. And there are white pixels here and there on all the letters.

And this next one is at 100% in Adobe Reader. The text looks smoother & cleaner.

Here's both of them at 50%:

Here's at 60% in the viewer.html and still 50% in Adobe Reader:

Even at 70% in the viewer, it's still easier to read in Adobe at 50%. Notice that in Adobe you can clearly discern that the page number is 2, while in the viewer it looks more like a '3'. The letter 'e' looks a bit like 'o', and 'i' looks like 'l'. The header looks thinner and distorted, too. And the footer is really hard to read.

Rob--W · 2016-02-29T21:52:25Z

Thanks for your extra info. I don't know what goes wrong, but maybe some of the other PDF.js devs know.

fkaelberer · 2016-03-03T22:13:23Z

I think the loss of image quality happens in canvas.js#L1895, where the JPEG image, with dimensions 2508 x 3525, is drawn onto a canvas that is much smaller than the image. Images look poor when they are scaled down by a large factor in a single step; this is already handled for other images in canvas.js#L2081 (from #3312). The same fix was applied to thumbnails in #4924. So I guess it should be fixed here for JPEG images as well.

nicholassinggih · 2016-03-04T17:28:31Z

The PDFs I'm rendering contain mostly text.

The same PDFs are displayed perfectly in an iframe on IE + Adobe plugin, or IE + Foxit plugin, even though the iframe's dimensions were set to 360 x 400 (much smaller than the actual size). On Chrome, using both the default viewer and the PdfViewer extension, they look really distorted.

Adobe and Foxit use their own engine to render the PDF, while most Chrome extensions translate the PDF contents into html elements. Problem is, Adobe & Foxit draw their whole interface on a different layer on top of the html, disabling my application from displaying context menu and other things over the PDFs.

I'm still hoping I can use pdf.js for my app. But right now, I'm forced to try a different solution.

fkaelberer · 2016-03-04T22:20:53Z

> The PDFs I'm rendering contain mostly text.

@nicholassinggih The pdf you provided contains a (scanned?) jpeg image in the background. The text that you can select in the document is an invisible text overlay.
The images are downscaled by the browser (not by pdf.js's javascript code), so image quality may vary with the browser or OS. Firefox and Chrome (maybe others too) do a terrible job of downscaling the big images to small canvases, thus causing the bad image quality.

I pushed a commit to fkaelberer@913d3bc, which downscales the jpegs in multiple steps, in each step with a factor of <= 2x. As a result, the readability is increased a lot, see images below.

I did not open a pull request, though, because the code is unfinished and not tested much. Anyone, please feel free to take and improve the code.

- Better readability / image quality
- Figure out if / how it works if the image does not fill the whole canvas
- Fix blurry issue1350
- Bonus: deduplicate the downscaling code
- Check whether the blurriness of some images can be reduced (the red font in issue2642 looks better without this patch)
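
The multi-step idea described in this comment (shrink by a factor of at most 2 per step) can be sketched as follows. This is only an illustration of the stepping logic, not the code from the commit: `downscaleSteps` is a hypothetical helper, and the 500 x 703 target size is an assumed example. The 2508 x 3525 source size is the image size mentioned earlier in the thread.

```javascript
// Hypothetical helper (not part of pdf.js): plan the sizes of intermediate
// canvases so that no single step shrinks the image by more than 2x.
function downscaleSteps(srcWidth, srcHeight, dstWidth, dstHeight) {
  if (!(srcWidth > 0 && srcHeight > 0 && dstWidth > 0 && dstHeight > 0)) {
    throw new RangeError("all dimensions must be positive");
  }
  const steps = [];
  let w = srcWidth;
  let h = srcHeight;
  // Keep halving while a direct jump to the target would exceed a 2x reduction.
  while (w > 2 * dstWidth || h > 2 * dstHeight) {
    w = Math.max(dstWidth, Math.ceil(w / 2));
    h = Math.max(dstHeight, Math.ceil(h / 2));
    steps.push({ width: w, height: h });
  }
  // The last step lands exactly on the target size.
  if (w !== dstWidth || h !== dstHeight) {
    steps.push({ width: dstWidth, height: dstHeight });
  }
  return steps;
}

// Source size from the thread; the target size is an assumption.
console.log(downscaleSteps(2508, 3525, 500, 703));
```

In a browser, each planned step would typically be drawn with `drawImage(previous, 0, 0, step.width, step.height)` onto an intermediate canvas, so the browser's own scaler never has to reduce by more than 2x at once.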

At 50%:

At 70%:

nicholassinggih · 2016-03-06T08:53:46Z

Hi Felix,

Thank you very much for this. I've briefly tested your new code and it does give a much better result, just as shown in your screenshots. Bravo to you!!

Forgive me if my next questions sound silly, but I just want to clarify whether I understand this correctly. About the text vs jpeg image thing, are you saying that the PDF that I provided actually contains only a jpeg of a scanned paper with text on it? Therefore, it doesn't actually contain text data and it was rendered as an image which in turn was scaled down by the browser?

Should I close this thread, or let either Yuri or Tim close this?

Rob--W · 2016-03-06T11:57:05Z

> About the text vs jpeg image thing, are you saying that the PDF that I provided actually contains only a jpeg of a scanned paper with text on it?

Yes. That's why I slapped the jpeg label on this ticket.

> Therefore, it doesn't actually contain text data and it was rendered as an image which in turn was scaled down by the browser?

It does contain text data (with a transparent color, probably for text selection), but what you see (and what is printed) is the (scaled) image.

> Should I close this thread, or let either Yuri or Tim close this?

The (legitimate) issue hasn't been resolved yet, so I'd keep the issue open.

nicholassinggih · 2016-03-06T20:20:20Z

> It does contain text data (with a transparent color, probably for text selection), but what you see (and what is printed) is the (scaled) image.

I see. They ran OCR on the PDF, hence the transparent/invisible text data. But if they hadn't run OCR on it, I'm guessing it would just contain a jpeg image with no text.

Thank you both for your help. Really appreciate it.

yurydelendik · 2016-03-08T19:26:24Z

That's the first time I've seen scanned data packaged as JPEG (vs JBIG2 or CCITT). We probably didn't see this issue earlier since it's probably a rare case. As mentioned in #7041 (comment) above, we already do this for most images except JPEG (e.g. https://github.com/mozilla/pdf.js/blob/master/src/display/canvas.js#L2081). The trade-off is that more memory and CPU are used. If we decide to move JPEG decoding to the worker side, then we won't have to worry about code duplication.
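
To put a rough number on the memory side of that trade-off, here is a small estimate. It assumes 4 bytes per pixel (RGBA) for a canvas backing store, which is an assumption about typical behavior; actual browser memory use may differ. `rgbaBytes` is a hypothetical helper, and the half-size intermediate is just one example of an extra canvas a stepwise downscale would need.

```javascript
// Hypothetical helper: approximate bytes for a width x height RGBA bitmap,
// assuming 4 bytes per pixel (real canvas implementations may differ).
function rgbaBytes(width, height) {
  return width * height * 4;
}

// Full-size image from the thread, plus one half-size intermediate canvas.
const fullSize = rgbaBytes(2508, 3525);
const halfSize = rgbaBytes(1254, 1763);
const toMiB = (bytes) => (bytes / (1024 * 1024)).toFixed(1);

console.log("full size:", toMiB(fullSize), "MiB");
console.log("half-size intermediate:", toMiB(halfSize), "MiB");
```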

fkaelberer · 2016-03-08T20:14:34Z

> That's the first time I've seen scanned data packaged as JPEG (vs JBIG2 or CCITT). We probably didn't see this issue earlier since it's probably a rare case.

It affects not only text, but also images.
From #2739:

http://www.ikea.com/at/de/assembly_instructions/rakke-kleiderschrank__AA-808506-1.pdf (notice the thumbnails)

nicholassinggih · 2016-12-12T22:42:47Z

Hi Felix or anyone,

Is there a chance that you could implement a better downsampling algorithm in the paintJpegXObject methods? Some of my users are still complaining about the quality of the images. Even with the gradual scaling that Felix added, scanned images (JPEG) of handwritten documents are sometimes too faded or blurry to read.

Volume 1 of 4 43_Redacted.pdf
Volume 1 of 4 6_Redacted.pdf

This is all I know about the user's machine:
Windows 7 SP1
Browser: Google Chrome 54.0.2840.71 m
I used Felix's version fkaelberer@913d3bc
Graphics card: GeForce GT610
Monitor's resolution 1680 x 1050

I am currently trying to implement a Lanczos or sinc downscaling algorithm in paintJpegXObject. But I don't know how badly it will affect the performance, and I'm not sure if it is going to fix the problem.

Really appreciate your help.
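
For readers unfamiliar with the Lanczos approach mentioned above, here is a minimal one-dimensional sketch. It is not code from pdf.js or from this thread: `lanczosKernel` and `resampleRow` are hypothetical names, the window size `a = 3` is an assumed default, and a real image resize would apply the same pass per color channel, first along rows and then along columns. Whether it would improve these particular scans is not established here.

```javascript
// Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, and 0 outside.
function lanczosKernel(x, a) {
  if (x === 0) return 1;
  if (Math.abs(x) >= a) return 0;
  const px = Math.PI * x;
  return (a * Math.sin(px) * Math.sin(px / a)) / (px * px);
}

// Resample one row of samples to dstLength values (hypothetical helper).
function resampleRow(src, dstLength, a = 3) {
  const ratio = src.length / dstLength;
  const stretch = Math.max(ratio, 1); // widen the kernel when downscaling
  const support = a * stretch;
  const dst = new Float64Array(dstLength);
  for (let i = 0; i < dstLength; i++) {
    // Position of output sample i in source coordinates (pixel centers).
    const center = (i + 0.5) * ratio - 0.5;
    const start = Math.max(0, Math.ceil(center - support));
    const end = Math.min(src.length - 1, Math.floor(center + support));
    let sum = 0;
    let weightSum = 0;
    for (let j = start; j <= end; j++) {
      const w = lanczosKernel((j - center) / stretch, a);
      sum += w * src[j];
      weightSum += w;
    }
    // Normalize so that flat regions keep their value.
    dst[i] = weightSum !== 0 ? sum / weightSum : 0;
  }
  return dst;
}

// Downscale a 16-sample row with a hard edge to 4 samples.
console.log(resampleRow([0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255], 4));
```

The nested loop visits roughly `2 * a * ratio` source samples per output sample, which gives a sense of why this is slower than letting the browser scale the image.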

yurydelendik · 2016-12-12T23:41:22Z

> But I don't know how badly it will affect the performance, and I'm not sure if it is going to fix the problem.

@nicholassinggih if you find an algorithm that we can use for downsampling and it works, we will provide pointers on how to improve its performance (e.g. via asm.js).

nicholassinggih · 2017-01-19T21:07:32Z

@yurydelendik I've tried implementing bicubic interpolation instead to scale down the jpegs. The result is not as good as @fkaelberer's solution, and the performance is also worse.

This problem proves to be the most difficult challenge in the process of developing the current application. I'm currently looking into the possibility of just creating a modified copy of the PDF file with all the pages and images resized to the desired dimensions. As my users don't actually dynamically zoom in and out when viewing the pages, this could work.

natarajnattu · 2017-12-21T07:10:27Z

var options = options || {
  scale: 1
};

Increase the scale and you should see improved clarity.

wotzhs · 2018-02-13T07:05:45Z

Hey guys, not sure if this applies here, but I have been getting blurry text from the rendered canvas as well, and I was wondering whether this is an HTML5 canvas issue rather than a pdf.js one, until I stumbled on:
https://www.html5rocks.com/en/tutorials/canvas/hidpi/

Before accounting for window.devicePixelRatio, the PDF looks like this:

After accounting for window.devicePixelRatio, the PDF looks like this:

yurydelendik · 2018-02-13T21:11:17Z

@wotzhs yes, the pdf.js demo viewer relies on devicePixelRatio to increase the pixel density per CSS pixel.
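
The general technique from the linked article can be sketched as follows: give the canvas a backing store that is `devicePixelRatio` times larger than its CSS size, so that each CSS pixel is covered by several device pixels. This is a generic illustration and not the demo viewer's actual code; `hiDpiCanvasSize` and `applyCanvasSize` are hypothetical helpers, and a plain object stands in for a real `<canvas>` element so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: compute backing-store and CSS sizes for a canvas.
function hiDpiCanvasSize(cssWidth, cssHeight, pixelRatio) {
  const ratio = pixelRatio > 0 ? pixelRatio : 1; // fall back to 1 if unknown
  return {
    width: Math.floor(cssWidth * ratio), // backing-store pixels
    height: Math.floor(cssHeight * ratio),
    styleWidth: cssWidth + "px", // size on the page, in CSS pixels
    styleHeight: cssHeight + "px",
    ratio,
  };
}

// Hypothetical helper: apply the computed sizes to a canvas-like object.
function applyCanvasSize(canvas, size) {
  canvas.width = size.width;
  canvas.height = size.height;
  canvas.style.width = size.styleWidth;
  canvas.style.height = size.styleHeight;
}

// In a browser, pass a real <canvas> and window.devicePixelRatio instead.
const fakeCanvas = { style: {} };
applyCanvasSize(fakeCanvas, hiDpiCanvasSize(800, 600, 2));
console.log(fakeCanvas.width, fakeCanvas.height, fakeCanvas.style.width);
```

After resizing, the drawing itself also has to be scaled by the same ratio, for example by calling `ctx.scale(ratio, ratio)` on the 2D context or by rendering the page at a correspondingly larger scale.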

532910 · 2018-04-12T23:00:48Z

Is this a duplicate of #2750?

nowherenearithaca · 2018-07-15T02:13:54Z

@yurydelendik I am confused. Are you saying that the existing pdfjs should handle the devicePixelRatio as that article suggested? If so, you don't happen to know when it started doing that, do you? I am seeing blurriness with wix where they seem to be using pdfjs-dist version 2.305 from Feb 1, 2018, and wondering if they need to either upgrade or make use of the devicePixelRatio stuff in their use of the library.

Rob--W added the information-requested label Feb 29, 2016

Rob--W added image-quality image-jpeg and removed information-requested labels Feb 29, 2016

yurydelendik mentioned this issue Mar 21, 2016

Resize large color image to improve rendering time #7095

Closed

yurydelendik mentioned this issue May 3, 2016

clipping paths / bitmaps rendered poorly on chrome #7286

Closed

fkaelberer mentioned this issue Jul 14, 2017

Performance optimizations #8650

Closed

mozilla deleted a comment Dec 31, 2017

Snuffleupagus mentioned this issue Apr 28, 2020

[api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js #11601

Merged

timvandermeij closed this as completed in #11601 May 23, 2020

Snuffleupagus mentioned this issue Apr 17, 2021

Blurry graph in pdf #9648

Closed
