I can attach files via command line api but not the python api? #644
python3 and 0.42.3 BTW :) |
I can see this in the generated pdf, which I think might be coming from the link tag(s) in the html?
using this code...
I get no errors and a pdf of the html, but without the attachments included. When I use -a I get a pdf with the attachments. I'm generating hundreds of pdfs at a time, and to get around the startup time of weasyprint I have made a zerorpc server that keeps running and gets called to make the pdfs. It runs many times faster this way, but I'm having to use pdfunite to get around this attachments problem and then pdfinfo to count the pages. I'm guessing all of that should be possible with the weasyprint python API? I'm probably just doing something daft... |
No problem here. Neither with the current master branch nor with a version from last year. Since the attachment is in the PDF and since your PDF reader can display the attached PDF when attached via Could you provide a PDF where you cannot see the attached PDF? |
Hmmm, I was hoping I was being obviously stupid. So the fact it works with -a and not with python for me is an odd edge case then? Not got time right now, but I'll provide a couple of pdfs, one made on the command line with -a and one with python, using the exact same sources and see if that helps to isolate what's going on. |
@tjrhodes the problem may come from the relative paths you're using. When you're using '-a', the path is relative to your current folder, but when you're using a You should try to use absolute path in your Python example and see if it works. |
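To rule out the relative-path explanation, one option is to resolve every attachment path to an absolute `file://` URI before handing it to WeasyPrint. This is only a minimal sketch using the standard library; the helper name `to_absolute_uri` and the example path are hypothetical and do not come from the thread.

```python
from pathlib import Path


def to_absolute_uri(relative_path, base_dir=None):
    """Resolve a possibly relative path and return it as a file:// URI.

    base_dir defaults to the current working directory, which is what a
    path given on the command line would be relative to.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    # resolve() makes the path absolute; the file does not need to exist.
    return (base / relative_path).resolve().as_uri()


# Hypothetical attachment path, for illustration only.
print(to_absolute_uri("attachments/report.pdf"))
```

If the attachment shows up with the absolute URI but not with the relative path, the base URL used for resolution is the likely culprit.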
@liZe: I suspected the same, but @tjrhodes said:
Also, if WeasyPrint cannot find an attachment it raises a warning and doesn't put an |
Hey, ok here you go, I'm now consistently getting it to fail silently through the command line and the python API. Pretty sure -a was working for me before though. I've been working around it with pdfunite for a while so maybe I got confused there. These pdfs were produced from html without the tag. https://cloud.tjrhodes.com/index.php/s/0YBWa0GXVupFeiK In there you have the attachment, the html and the two pdfs which seem identical. So I guess the problem now is that I get no attachments either way! No warnings and <<Embedded Files present in the pdfs. Not urgent as I've got pdfunite and pdfinfo to fall back on, but your tool is awesome and I was looking to do everything with one command instead of 3 different commands. Plus using weasyprint with zerorpc to get the speed benefits of the long running process is like 10x faster than doing what I'm doing now. |
As you say: The PDFs are identical.
FlateDecoding the embedded stream reproduces mail_lands.pdf, no error, no damage. Looking at mail_lands.pdf with an editor, all I see at first glance is: It's PDF-1.4, WeasyPrint produces PDF-1.3, maybe that's the point? Embedding 1.4 in 1.3 upsets the PDF readers? |
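The two checks described above (inflating the embedded stream and reading the version from the file header) can be sketched with the standard library alone. This is an illustration, not the procedure actually used in the thread: it assumes the raw stream bytes have already been extracted from the PDF, and the sample data is a hypothetical stand-in for mail_lands.pdf.

```python
import re
import zlib


def pdf_header_version(data):
    """Return the version string from a '%PDF-x.y' header, or None."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    return match.group(1).decode("ascii") if match else None


def inflate(stream_bytes):
    """Undo a FlateDecode (zlib/deflate) filter on raw stream bytes."""
    return zlib.decompress(stream_bytes)


# Hypothetical stand-in for an attached file and its embedded stream.
original = b"%PDF-1.4\n% placeholder body\n"
embedded = zlib.compress(original)

decoded = inflate(embedded)
print(decoded == original)          # round trip is lossless
print(pdf_header_version(decoded))  # version declared by the attachment
```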
Right, interesting, that pdf and lots of others I'm attaching to the weasy generated ones with pdfunite, are created from libreoffice --convert-to on the command line. So I guess I need to look for a way to control the version there. I don't have adobe reader, gnome document viewer shows me the weasy generated content and nothing else. Anyway, thanks for the hint I didn't know where to look for an answer but the 1.4 > 1.3 thing looks very promising, and thanks for the great tool, the results from html are fantastic. |
What I don't get is why a viewer capable of handling PDF 1.4 isn't able to unpack the attached PDF and detect that it should simply switch its parsing engine from 1.3 to 1.4... but that's what at least 3 viewers seemingly fail to do: Adobe Reader, Sumatra PDF viewer, gnome document viewer. Would you mind changing the title to something like "problems when embedding PDF 1.4 files"? |
Version 1.3 is set by pdfrw, but Cairo creates PDF files with version 1.5. How does pdfrw transform 1.5 documents into 1.3? I don't know, and I don't want to. After many, many bugs (#644, #639, #565 and equivalent issues), I think that we should not use pdfrw anymore. Cairo now provides an API to add metadata (including bookmarks I think), so there's not much more to handle by editing the PDF file (at least bleeding areas). I'm sad, because pdfrw is really useful and its devs are really nice. But the work needed on CairoSVG and on WeasyPrint is probably less than the work needed to understand and fix these issues using pdfrw. |
Wanted to be sure that the mixed PDF versions are the source of evil. No problem. No error opening the attached file. Conclusion: It's not a PDF version conflict. Something must be wrong with the embedded encoded stream. And indeed: The so-called FlateDecoded stream in my working PDF looks completely different from the streams contained in @tjrhodes' PDFs. Working embedded stream:
Failing stream looks like a Pythonic binary string to me:
Was unable to find the place where pdfrw writes the attachment's stream into the PDF and check whether it converts it into the wrong string type. The stream is actually produced by The workaround consequently looks like this:
    if isinstance(pdf_file_object.stream, bytes):
        pdf_file_object.stream = str(pdf_file_object.stream)
Will create a PR asap. |
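A note on the string types involved, as a hedged sketch rather than the actual patch: on Python 3, calling `str()` on a `bytes` object returns its repr (the `b'...'` form, which is what the failing stream looks like), while `decode("latin-1")` maps every byte to exactly one character and can be reversed losslessly. The assumption here is that the PDF writer expects such a Latin-1 text string; the thread does not show what the eventual fix does, and the helper name `bytes_to_text_stream` and the payload are hypothetical.

```python
import zlib


def bytes_to_text_stream(data):
    """Map each byte to one character (Latin-1), reversibly."""
    return data.decode("latin-1")


payload = b"hypothetical attachment payload"
compressed = zlib.compress(payload)

as_repr = str(compressed)                     # repr text such as "b'x\\x9c...'"
as_latin1 = bytes_to_text_stream(compressed)  # same length as the byte string

# The repr is longer than the data and starts with the letter "b".
print(len(compressed), len(as_latin1), len(as_repr))

# Only the Latin-1 form round-trips back to the original bytes.
print(zlib.decompress(as_latin1.encode("latin-1")) == payload)
```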
@tjrhodes -- since I can't reproduce the bug: Would you please test whether the PR fixes it? Edit: No need to test, see the following comments. |
@Tontyna thanks a lot for your investigation! You're right, it's an encoding problem when using Python 3. It's not pdfrw's fault … but we wouldn't have this issue without pdfrw. This issue is actually a duplicate of #558, fixed in ce84073. I've backported ce84073 into the |
I'll release 0.42.4 during the summer 🌞. |
Hi,
Using -a on the command line I can attach pdfs fine, when I try
<link rel="attachment" href="path/to/pdf.pdf">
tags in the html I get no attachments, using relative or absolute paths makes no difference.
I've also tried passing an array of attachments in the python like so...
doc.write_pdf(outputPath,1,attachments)
where doc is returned from an html.render() call. No dice that way either. I think I'm missing something basic, but I've looked through the docs and the issues on here and I can't see why it fails.
Any ideas?