-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding support to split the pdf pages only on "white" rows #41
base: main
Are you sure you want to change the base?
Conversation
hello, i've added support to your library for splitting pages only on "white" canvas "rows", added comments on the code itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thank you for your submission @ilanp13! It's a creative solution, scanning for white lines. I like it - definitely some things to sort out, but I'd like to work your solution in.
pageStartPos += h; | ||
// Add hyperlinks. | ||
if (opt.enableLinks) { | ||
var pageTop = page * pageSize.inner.height; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pageTop
would also need to be updated to use pageStartPos
(with unit conversion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
src/html2pdf.js
Outdated
var pixelData = ctx.getImageData(wPix, pageStartPos + h, 1, 1).data; | ||
if (pixelData[0] < whiteColorLevel || pixelData[1] < whiteColorLevel || pixelData[2] < whiteColorLevel) { | ||
bPixCount++; | ||
if (bPixCount > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for allowing up to 1 pixel to have 1 RGB value below the threshold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for some reason - even though my page is with white bg, the getImageData function always thinks that the last pixel on the row is black, so that one is special case for me.
maybe add parameter of how much "not white" pixel one allow to have on a row? (for example - if one have border in his page - he will allow 2 pixels)
src/html2pdf.js
Outdated
var wPix = 0; | ||
var bPixCount = 0; | ||
|
||
while (wPix < w && whiteRow) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scan could be optimised, e.g. using getImageData
on the entire row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't have the time to optimize it - so if you have an idea of how to optimise it - please do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've commited optimized solution for the white row checking
src/html2pdf.js
Outdated
heightDrawBack++; | ||
} | ||
while (!whiteRow && heightDrawBack < maxPixelToGoUp) ; | ||
heightDrawBack--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, if no white row was found, it looks like heightDrawBack
will be at maxPixelToGoUp
. Doesn't it need to be reset to 0 if !whiteRow
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean, that if it didn't find white row up to heightDrawBack, it might as well cut the page on the first row?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if it still hasn't found a white row then it might as well use as much of the page as possible.
if (page) pdf.addPage(); | ||
var imgData = pageCanvas.toDataURL('image/' + opt.image.type, opt.image.quality); | ||
pdf.addImage(imgData, opt.image.type, opt.margin[1], opt.margin[0], | ||
pageSize.inner.width, pageHeight); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be worth altering pageHeight
so that only the top (non-white) portion of the image is copied to the PDF, to reduce file size (same reason the very last page is trimmed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
src/html2pdf.js
Outdated
* if the row isn't "white" - will go one pixel up and test that row, up to max. number of maxPixelToGoUp rows up | ||
* | ||
*/ | ||
var whiteColorLevel = 150; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the user's background color isn't white?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add an option for background color value, and then if it is set - then the check will be for complete row of this value, instead of white (or rather, higher than whiteColorLevel value)?
src/html2pdf.js
Outdated
var pdf = new jsPDF(opt.jsPDF); | ||
|
||
/** | ||
* added support for page split only on white "rows". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be an "option" (configurable in html2pdf settings) rather than hard-coded to always run. That way background color, max pixels, etc. could be configured as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea - also maybe add the option to decide what level is for "white" (the 150 value i put in)
clientRect.left -= containerRect.left; | ||
clientRect.top -= containerRect.top; | ||
opt.links.push({ el: link, clientRect: clientRect }); | ||
var html2pdf = (function (html2canvas, jsPDF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to avoid changing the indenting, it confuses the "diff" engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, i'll try to not change the indenting next time
It'll will take some time until i'll be able to work on this changes, will update when will be done. |
Sure no worries, thanks again for the contribution! |
Hey @ilanp13, sorry I left you hanging. The changes look good! I just did a big restructuring to the code with v0.8.0, I'll take a look at working this in soon. I'm thinking I'll pull the page-break behaviour out into a separate module. Thanks again for the contribution! |
Hey folks. Is there any update on this PR? Would really love to use this feature. IMO it makes the difference between a professional looking output and not. |
Hi @ilanp13 , Thank you for the code. It's working in some scenarios for me. But I am trying to understand what the variable whiteColorLevel does here. I have a Uint8ClampedArray and in the pixel row, the first pixel RGBA values are 208,208,208,255, which is close to grey. But the method still returns true. What exactly is whiteColorLevel. Can you please explain me? |
Hello! There are now 4 PRs for pagebreaks, and I know it's an important feature to many. I am looking into combining the PRs now and working them in. |
No description provided.