WeasyPrint performance #223

Closed
sparrovv opened this issue Oct 16, 2014 · 2 comments

Comments

@sparrovv

Sorry if this is not the place to ask this question, but I've noticed that WeasyPrint is really slow on my test server:

Amazon Linux AMI release 2014.03 
(m1.small with 1 vCPU)
Python 2.6.9 (unknown, Sep 13 2014, 00:25:11)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2

When I generate a simple PDF it takes ~ 9s

time (echo '<h1>fooobar</h1>' | weasyprint -f pdf - - > foo.pdf)

real    0m8.989s
user    0m1.992s
sys     0m2.125s

Is there something I can do to improve performance without upgrading Python or moving to a larger instance?

@SimonSapin
Member

(By the way, your command can be simplified to time (echo '<h1>fooobar</h1>' | weasyprint - foo.pdf); the file type is inferred from the output filename's extension.)

WeasyPrint is not known to be fast (compared, say, to WebKit), but this is unusually slow indeed. What kind of storage hardware does this server have? Do you also see this problem on your local machine?

How much of it is start-up time? Compare with time weasyprint --help. At startup, WeasyPrint not only imports a lot of Python modules but also parses its user-agent stylesheet, and sets up CFFI which involves parsing C headers.

The difference between "real" (wall-clock) and "user" (CPU in userspace) time indicates that a lot of it might be I/O. Compare run times of both commands with cold and warm filesystem cache by running them repeatedly. (See this answer about dropping the cache.) On my laptop (relatively fast, with an SSD), this command takes ~1.5s after dropping the cache and ~0.9s for repeated runs.
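The repeated-run comparison can also be scripted. This is a minimal sketch, assuming Python 3 is available for the measurement script (the server in question runs Python 2.6, where subprocess.run does not exist); time_command is a made-up helper name, not part of WeasyPrint.

```python
import subprocess
import time


def time_command(argv, runs=5, stdin_data=None):
    """Run argv `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded: only the elapsed time matters here.
        subprocess.run(
            argv,
            input=stdin_data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations
```

For example, time_command(["weasyprint", "--help"]) measures start-up alone, and time_command(["weasyprint", "-", "foo.pdf"], stdin_data=b"<h1>fooobar</h1>") measures a full render. If the cache was dropped just before, the first entry is the cold number and the rest are warm.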

You can also run a profiler and see where the time is actually spent:

echo '<h1>fooobar</h1>' | python -m cProfile -o ./profile $(which weasyprint) - foo.pdf
gprof2dot -f pstats ./profile | xdot -

The second command visualizes the profile graphically. You’ll probably need to install gprof2dot and xdot. I suppose you’ll also want to run that on your local machine rather than the server, after transferring the profile file. Note that profiling itself takes time, so running with cProfile will likely be slower than without. If you upload the profile somewhere I can have a look at it.
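If installing gprof2dot and xdot is inconvenient, the profile file can also be read as plain text with the standard library's pstats module. This is a minimal sketch, assuming the ./profile file written by the cProfile command above; print_hotspots is a made-up name for illustration.

```python
import pstats


def print_hotspots(profile_path, limit=20):
    """Print the `limit` entries with the highest cumulative time."""
    stats = pstats.Stats(profile_path)
    # strip_dirs() shortens file paths; "cumulative" sorts by time spent in a
    # function including everything it calls.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stats
```

Calling print_hotspots("./profile") lists the most expensive call paths first, which is usually enough to tell import and start-up cost apart from actual rendering.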

Finally, if startup time is an issue for you in a real application, consider writing some kind of server in Python and using the Python API to have long-running WeasyPrint processes rather than spawning a new process for every document.
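A long-running process along those lines could look roughly like the sketch below. It assumes Python 3 and the documented weasyprint.HTML(string=...).write_pdf() call, which returns the PDF as bytes when no target is given. The handler, the function names, the address and the port are all made up for illustration, and there is no error handling or concurrency.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(render_pdf):
    """Build a handler class that answers a POSTed HTML body with a PDF."""

    class PDFHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            html = self.rfile.read(length).decode("utf-8")
            pdf = render_pdf(html)
            self.send_response(200)
            self.send_header("Content-Type", "application/pdf")
            self.send_header("Content-Length", str(len(pdf)))
            self.end_headers()
            self.wfile.write(pdf)

    return PDFHandler


def render_with_weasyprint(html):
    # After the first import this is a cheap lookup of the loaded module.
    from weasyprint import HTML
    return HTML(string=html).write_pdf()


def serve(host="127.0.0.1", port=8000):
    # Import once before accepting requests, so the start-up cost is paid a
    # single time instead of once per document.
    import weasyprint  # noqa: F401
    server = HTTPServer((host, port), make_handler(render_with_weasyprint))
    server.serve_forever()
```

After calling serve(), each document is one POST of the HTML to the server. The renderer is passed into make_handler so the HTTP part can be exercised without WeasyPrint installed.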

@sparrovv
Author

Thanks @SimonSapin for your comprehensive response.

I'm not having that problem on my local machine; it only happens on a very basic EC2 instance.

As you mentioned, the start-up time seems to be the problem. Executing time weasyprint --help takes ~3.5s. Fortunately my production machines are much bigger and have SSDs, so generating PDFs takes less than 2s there.

Also having a long-running WeasyPrint process seems to be a good idea, so I'll give it a shot.
