Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding support to split the pdf pages only on "white" rows #41

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ilanp13
Copy link

@ilanp13 ilanp13 commented Oct 16, 2017

No description provided.

@ilanp13
Copy link
Author

ilanp13 commented Oct 16, 2017

hello,

i've added support to your library for splitting pages only on "white" canvas "rows", added comments on the code itself.

@ilanp13
Copy link
Author

ilanp13 commented Oct 16, 2017

i believe it will solve the issues:
#26
#13

Copy link
Owner

@eKoopmans eKoopmans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thank you for your submission @ilanp13! It's a creative solution, scanning for white lines. I like it - definitely some things to sort out, but I'd like to work your solution in.

pageStartPos += h;
// Add hyperlinks.
if (opt.enableLinks) {
var pageTop = page * pageSize.inner.height;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pageTop would also need to be updated to use pageStartPos (with unit conversion).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

src/html2pdf.js Outdated
var pixelData = ctx.getImageData(wPix, pageStartPos + h, 1, 1).data;
if (pixelData[0] < whiteColorLevel || pixelData[1] < whiteColorLevel || pixelData[2] < whiteColorLevel) {
bPixCount++;
if (bPixCount > 1) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for allowing up to 1 pixel to have 1 RGB value below the threshold?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason - even though my page is with white bg, the getImageData function always thinks that the last pixel on the row is black, so that one is special case for me.
maybe add parameter of how much "not white" pixel one allow to have on a row? (for example - if one have border in his page - he will allow 2 pixels)

src/html2pdf.js Outdated
var wPix = 0;
var bPixCount = 0;

while (wPix < w && whiteRow) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scan could be optimised, e.g. using getImageData on the entire row.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't have the time to optimize it - so if you have an idea of how to optimise it - please do!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've commited optimized solution for the white row checking

src/html2pdf.js Outdated
heightDrawBack++;
}
while (!whiteRow && heightDrawBack < maxPixelToGoUp) ;
heightDrawBack--;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, if no white row was found, it looks like heightDrawBack will be at maxPixelToGoUp. Doesn't it need to be reset to 0 if !whiteRow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean, that if it didn't find white row up to heightDrawBack, it might as well cut the page on the first row?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if it still hasn't found a white row then it might as well use as much of the page as possible.

if (page) pdf.addPage();
var imgData = pageCanvas.toDataURL('image/' + opt.image.type, opt.image.quality);
pdf.addImage(imgData, opt.image.type, opt.margin[1], opt.margin[0],
pageSize.inner.width, pageHeight);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth altering pageHeight so that only the top (non-white) portion of the image is copied to the PDF, to reduce file size (same reason the very last page is trimmed).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

src/html2pdf.js Outdated
* if the row isn't "white" - will go one pixel up and test that row, up to max. number of maxPixelToGoUp rows up
*
*/
var whiteColorLevel = 150;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user's background color isn't white?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add an option for background color value, and then if it is set - then the check will be for complete row of this value, instead of white (or rather, higher than whiteColorLevel value)?

src/html2pdf.js Outdated
var pdf = new jsPDF(opt.jsPDF);

/**
* added support for page split only on white "rows".
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be an "option" (configurable in html2pdf settings) rather than hard-coded to always run. That way background color, max pixels, etc. could be configured as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea - also maybe add the option to decide what level is for "white" (the 150 value i put in)

clientRect.left -= containerRect.left;
clientRect.top -= containerRect.top;
opt.links.push({ el: link, clientRect: clientRect });
var html2pdf = (function (html2canvas, jsPDF) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid changing the indenting, it confuses the "diff" engine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, i'll try to not change the indenting next time

@ilanp13
Copy link
Author

ilanp13 commented Oct 18, 2017

It'll will take some time until i'll be able to work on this changes, will update when will be done.

@eKoopmans
Copy link
Owner

Sure no worries, thanks again for the contribution!

@eKoopmans
Copy link
Owner

Hey @ilanp13, sorry I left you hanging. The changes look good! I just did a big restructuring to the code with v0.8.0, I'll take a look at working this in soon. I'm thinking I'll pull the page-break behaviour out into a separate module. Thanks again for the contribution!

@ScottStevenson
Copy link

Hey folks. Is there any update on this PR? Would really love to use this feature. IMO it makes the difference between a professional looking output and not.

@ngchen1
Copy link

ngchen1 commented May 18, 2018

Hi @ilanp13 , Thank you for the code. It's working in some scenarios for me. But I am trying to understand what the variable whiteColorLevel does here. I have a Uint8ClampedArray and in the pixel row, the first pixel RGBA values are 208,208,208,255, which is close to grey. But the method still returns true. What exactly is whiteColorLevel. Can you please explain me?
capture1
capture

@eKoopmans
Copy link
Owner

Hello! There are now 4 PRs for pagebreaks, and I know it's an important feature to many. I am looking into combining the PRs now and working them in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants