Potential memory leak / infinite loop scenario #1174

Closed
ajakubo1 opened this issue Jul 27, 2020 · 5 comments

Comments

@ajakubo1

I have a problem that is quite similar to #923. There is an HTML document that drives CPU usage to 100% and makes memory usage keep rising when I call the .write_pdf function on it.

I spent the last few days creating a quasi-minimal HTML document with cleaned data that reproduces the problem, and this is it:

leak.txt

Some of the CSS rules in there might not be needed to reproduce the problem. I hope you enjoy the images I've included :). What matters for reproducing the issue is that the images have certain sizes (I'm not sure whether it depends only on their width, or on both width and height).

The problem occurs in an environment with the latest WeasyPrint installed:

pip freeze
cairocffi==1.1.0
CairoSVG==2.4.2
cffi==1.14.1
cssselect2==0.3.0
defusedxml==0.6.0
html5lib==1.1
Pillow==7.2.0
pkg-resources==0.0.0
pycparser==2.20
Pyphen==0.9.5
six==1.15.0
tinycss2==1.0.2
WeasyPrint==51
webencodings==0.5.1

The Python version I tested on is 3.6.9. I also tested on Python 3.8.4 with the same results (the production environment runs on that one).

The OS is Ubuntu 18.04:

 uname -a
Linux ubuntu-bionic 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

For now I have "fixed" the problem by setting the width of the table element to 95% instead of 100%.

Steps to reproduce the problem in a Python console:

from tempfile import NamedTemporaryFile

from weasyprint import HTML

# Paste the content of the attached leak.txt here.
content = """<content from the .txt file attached to this bug report>"""
html = HTML(string=content, encoding="utf-8")
temp_file = NamedTemporaryFile()
# This call never returns: CPU goes to 100% and memory keeps growing.
html.write_pdf(target=temp_file)

Maybe there is a way to raise an error when an infinite loop is detected (e.g. by limiting the number of calls that can be made to a certain function), or some other fail-safe mechanism that stops memory usage from rising? I understand that this is quite a specific (and pretty weird) edge case, but I'm a bit worried that there are more of those around.
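
Until such a fail-safe exists in the library, one possible caller-side guard is to run the rendering in a child process with a timeout and an optional memory cap. The sketch below is only an illustration under assumptions: `run_guarded` and `_child` are hypothetical helper names, not part of WeasyPrint, and the approach relies on the POSIX-only `fork` start method and the `resource` module.

```python
import multiprocessing as mp
import resource


def _child(func, args, max_bytes, conn):
    """Run func in the child process and send back its result or error."""
    if max_bytes is not None:
        # Cap the address space so a runaway job fails with MemoryError
        # instead of exhausting the machine.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_guarded(func, args=(), timeout_s=60.0, max_bytes=None):
    """Call func(*args) in a forked process; raise TimeoutError if it hangs."""
    ctx = mp.get_context("fork")  # POSIX only; func need not be picklable
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(func, args, max_bytes, child_conn))
    proc.start()
    child_conn.close()  # the parent keeps only the reading end
    try:
        if not parent_conn.poll(timeout_s):
            raise TimeoutError(f"no result after {timeout_s} seconds")
        try:
            status, payload = parent_conn.recv()
        except EOFError:
            # The child exited without sending anything (e.g. it was killed).
            raise RuntimeError("child process died without a result")
    finally:
        if proc.is_alive():
            proc.terminate()
        proc.join()
        parent_conn.close()
    if status == "error":
        raise RuntimeError(payload)
    return payload
```

With WeasyPrint this could be used as `pdf_bytes = run_guarded(lambda: HTML(string=content).write_pdf(), timeout_s=120)`, since `write_pdf()` returns the PDF as bytes when no target is given. The timeout value is arbitrary and would need tuning for real documents.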

@Tontyna
Contributor

Tontyna commented Jul 28, 2020

This bug is kind of fixed by da146c6 in the master branch.

Instead of an infinite loop WeasyPrint now generates two pages for the table: First page with the table header only, on the second page there is the header and the images.

The reason: the row with the images, together with the header above it, is taller than the page, and WeasyPrint is unable to paginate/split rows that cross the page margin (see #36).

When a row doesn't fit on the current page, WeasyPrint pushes the row onto the next page. But the header is already on the page just generated.

Before da146c6, WeasyPrint never stopped pushing the oversized row forward to the next page...
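
The loop described above can be illustrated with a toy model. This is not WeasyPrint's layout code and does not claim to reproduce what da146c6 changes; `paginate`, its parameters and the "force the row after a header-only page" rule are all assumptions made up for the illustration. Each page starts with a repeated header, and a row is pushed to the next page whenever it does not fit.

```python
def paginate(header_h, row_heights, page_h, force_after_empty_page,
             max_pages=20):
    """Toy paginator: every page repeats the header, rows are never split."""
    pages = []
    pending = list(row_heights)
    while pending:
        if len(pages) >= max_pages:
            # Stand-in for the endless loop: pages keep being generated
            # but no row is ever placed.
            raise RuntimeError("no progress: row is pushed forward forever")
        page = ["header"]
        used = header_h
        while pending:
            fits = used + pending[0] <= page_h
            # Toy rule: if the previous page already held only the header,
            # place the row anyway and let it overflow.
            stuck = (force_after_empty_page and len(page) == 1
                     and bool(pages) and pages[-1] == ["header"])
            if not (fits or stuck):
                break
            used += pending[0]
            page.append(f"row({pending.pop(0)})")
        pages.append(page)
    return pages


# Header of height 20 plus a row of height 100 on a page of height 100:
# the row can never fit below the header.
print(paginate(20, [100], 100, force_after_empty_page=True))
```

With `force_after_empty_page=True` the model produces a header-only page followed by a page holding the header and the overflowing row, which mirrors the two-page result described for the master branch. With `False` it runs into the `max_pages` guard, the toy equivalent of the endless loop.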

@ajakubo1
Author

Thanks! I will try to reproduce it on master and get back to you! It sounds about right. Such a crazy error.

When do you think the next version will be published?

@Tontyna
Contributor

Tontyna commented Jul 28, 2020

Even with the master branch and no infinite loop, you won't be happy with your table and its <thead>: it is sheer luck when not-so-simple tables render as expected.

@ajakubo1
Author

I can confirm that the version currently on master does not have this problem.

Well, it might not look pretty, but at least it doesn't kill the machine. I'd consider releasing what you currently have (if master is stable).

@liZe
Member

liZe commented Jul 29, 2020

Thanks to both of you!

@liZe liZe closed this as completed Jul 29, 2020

3 participants