I can attach files via command line api but not the python api? #644

tjrhodes · 2018-06-16T15:59:24Z

Hi,

Using -a on the command line I can attach pdfs fine, but when I try

<link rel="attachment" href="path/to/pdf.pdf">

tags in the html I get no attachments, using relative or absolute paths makes no difference.

I've also tried passing an array of attachments in the python like so...

doc.write_pdf(outputPath,1,attachments)

where doc is returned from an html.render() call. No dice that way either. I think I'm missing something basic, but I've looked through the docs and the issues on here and I can't see why it fails.

Any ideas?


tjrhodes · 2018-06-16T16:24:32Z

python3 and 0.42.3 BTW :)

tjrhodes · 2018-06-17T09:21:20Z

I can see this in the generated pdf, which I think might be coming from the link tag(s) in the html?

 obj
<</Names
  <</EmbeddedFiles
  <</Names
  [(attachment)
  <</Desc <> /EF <</F 3 0 R>> /F <> /Type /Filespec /UF
  (maidiremail_maggio_2018.pdf)>>]>>>>
  /Outlines <</Count 1 /First 4 0 R /Last 4 0 R /Type /Outlines>> /Pages
  5 0 R /Type /Catalog>>
endobj
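
A quick way to check whether a generated PDF declares any embedded files at all is to search its raw bytes for the entries visible in the object above. This is only a rough sketch: the helper name `embedded_file_names` is made up for illustration, the sample bytes are abridged from the dump above, and a plain byte search only works when the catalog is stored uncompressed and the names are simple literal strings.

```python
import re


def embedded_file_names(pdf_bytes: bytes) -> list:
    """Return the /UF file names in a PDF that declares /EmbeddedFiles.

    Hypothetical helper: a crude byte search, not a PDF parser.
    """
    if b"/EmbeddedFiles" not in pdf_bytes:
        return []
    # Matches "/UF (name)"; escaped parentheses inside names are not handled.
    names = re.findall(rb"/UF\s*\(([^)]*)\)", pdf_bytes)
    return [name.decode("latin-1") for name in names]


# Abridged from the catalog object shown above.
sample = (
    b"1 0 obj\n<</Names <</EmbeddedFiles <</Names [(attachment) "
    b"<</Desc <> /EF <</F 3 0 R>> /F <> /Type /Filespec /UF\n"
    b"(maidiremail_maggio_2018.pdf)>>]>>>> /Type /Catalog>>\nendobj\n"
)
print(embedded_file_names(sample))
```

A non-empty result only shows that the attachment is declared; it says nothing about whether the embedded stream itself is readable.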

using this code...

        self.attachmentPath = "../work/17_6_2018-10_35_02-Test/attachments/163a32b533fd93b7/maidiremail_maggio_2018.pdf"
        self.outputPath = "test.pdf"
        self.htmlPath = "../work/17_6_2018-10_35_02-Test/SEMILIBERTA'_GUBBIOTTI_PAOLO_17_6_2018.html"
        fontConfig = weasyprint_fonts_FontConfiguration()
        cssPath = "../css/mail.css"
        css = weasyprint_CSS(cssPath,None,None,None,None,None,None,None,None,None,fontConfig)
        html = weasyprint_HTML(self.htmlPath)
        attachment = weasyprint_Attachment(filename=self.attachmentPath)
        html.write_pdf(self.outputPath, [css], 1, [attachment])

I get no errors, a pdf of the html but without the attachments included. When I use -a I get a pdf with the attachments.

I'm generating hundreds of pdfs at a time, and to get around the startup time of weasyprint I have made a zerorpc server to keep running which then gets called to make the pdfs, it runs many times faster this way but I'm having to use pdfunite to get around this attachments problem and then pdfinfo to count the pages. I'm guessing all of that should be possible with the weasyprint python API? Just I'm probably doing something daft...

Tontyna · 2018-06-17T22:22:21Z

No problem here. Neither with the current master branch nor with a version from last year.

Since the attachment is in the PDF and since your PDF reader can display the attached PDF when attached via -a I have no idea what causes your trouble.

Could you provide a PDF where you cannot see the attached PDF?

tjrhodes · 2018-06-19T18:39:29Z

Hmmm, I was hoping I was being obviously stupid. So the fact it works with -a and not with python for me is an odd edge case then? Not got time right now, but I'll provide a couple of pdfs, one made on the command line with -a and one with python, using the exact same sources and see if that helps to isolate what's going on.

liZe · 2018-06-23T07:37:19Z

@tjrhodes the problem may come from the relative paths you're using. When you're using '-a', the path is relative to your current folder, but when you're using a <link> tag, it's relative to the folder the HTML file is in.

You should try to use absolute path in your Python example and see if it works.
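
To make the difference concrete, here is a small sketch of the two resolution rules described above. It models them with plain filesystem paths from the standard library rather than calling WeasyPrint, and all paths and function names are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_cli_style(cwd: str, attachment: str) -> str:
    """Relative paths given to -a are taken relative to the current folder."""
    return str(PurePosixPath(cwd) / attachment)


def resolve_link_style(html_file: str, href: str) -> str:
    """Relative hrefs in a <link> tag are taken relative to the HTML file's folder."""
    return str(PurePosixPath(html_file).parent / href)


# Hypothetical layout, for illustration only.
cwd = "/home/user/project"
html_file = "/home/user/project/work/mail.html"
relative = "attachments/doc.pdf"

print(resolve_cli_style(cwd, relative))         # /home/user/project/attachments/doc.pdf
print(resolve_link_style(html_file, relative))  # /home/user/project/work/attachments/doc.pdf

# An absolute path resolves to the same place under both rules.
absolute = "/home/user/project/attachments/doc.pdf"
print(resolve_cli_style(cwd, absolute) == resolve_link_style(html_file, absolute))
```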

Tontyna · 2018-06-23T09:05:16Z

@liZe: I suspected the same, but @tjrhodes said:

I get no attachments, using relative or absolute paths makes no difference.

Also, if WeasyPrint cannot find an attachment it raises a warning and doesn't put an <</EmbeddedFiles object into the PDF.
I'm really keen to see a PDF with attachments that are not attached.

tjrhodes · 2018-06-23T09:33:56Z

Hey, ok here you go, I'm now consistently getting it to fail silently through the command line and the python API. Pretty sure -a was working for me before though. I've been working around it with pdfunite for a while so maybe I got confused there. These pdfs were produced from html without the <link> tag.

https://cloud.tjrhodes.com/index.php/s/0YBWa0GXVupFeiK
password: weasy

In there you have the attachment, the html and the two pdfs which seem identical. So I guess the problem now is that I get no attachments either way! No warnings, and <</EmbeddedFiles is present in the pdfs.

Not urgent as I've got pdfunite and pdfinfo to fall back on, but your tool is awesome and I was looking to do everything with one command instead of 3 different commands. Plus using weasyprint with zerorpc to get the speed benefits of the long running process is like 10x faster than doing what I'm doing now.

Tontyna · 2018-06-23T11:11:15Z

As you say: The PDFs are identical.
The attached mail_lands.pdf is present in both. But when I try to click-open them I get an ERROR. Which one depends on the PDF reader I'm using. My Adobe Reader says something along the lines of

Couldn't open "mail_lands.pdf" because either the file type isn't supported or the file is damaged (e.g. the file has been sent as an email attachment and wasn't decoded correctly)

FlateDecoding the embedded stream reproduces mail_lands.pdf, no error, no damage.
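
For reference, FlateDecode is zlib/deflate compression, so an embedded stream's bytes can be inflated with the standard `zlib` module once they have been extracted. A minimal round-trip sketch; the data is a stand-in, not the real mail_lands.pdf, and `flate_decode` is an illustrative name:

```python
import zlib


def flate_decode(stream: bytes) -> bytes:
    """Inflate the raw bytes of a /FlateDecode stream."""
    return zlib.decompress(stream)


# Stand-in for the attached file's contents.
original = b"%PDF-1.4\n% stand-in for the attached file\n"
embedded = zlib.compress(original)

print(flate_decode(embedded) == original)
```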

Looking at mail_lands.pdf with an editor, all I see at first glance is: It's PDF-1.4, WeasyPrint produces PDF-1.3, maybe that's the point? Embedding 1.4 in 1.3 upsets the PDF readers?
Just a guess.
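
The version can also be read programmatically from the header line instead of opening the file in an editor. A small sketch, assuming the `%PDF-x.y` header sits at the very start of the data; `pdf_version` is a hypothetical helper name:

```python
import re


def pdf_version(data: bytes):
    """Return (major, minor) from a leading %PDF-x.y header, or None."""
    match = re.match(rb"%PDF-(\d)\.(\d)", data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(pdf_version(b"%PDF-1.4\n..."))
print(pdf_version(b"%PDF-1.3\n..."))
print(pdf_version(b"not a pdf"))
```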

tjrhodes · 2018-06-23T12:18:32Z

Right, interesting, that pdf and lots of others I'm attaching to the weasy generated ones with pdfunite, are created from libreoffice --convert-to on the command line. So I guess I need to look for a way to control the version there. I don't have adobe reader, gnome document viewer shows me the weasy generated content and nothing else.

Anyway, thanks for the hint, I didn't know where to look for an answer, but the 1.4 > 1.3 thing looks very promising. And thanks for the great tool, the results from html are fantastic.

Tontyna · 2018-06-23T13:22:31Z

What I don't get is why a viewer capable of handling PDF 1.4 isn't able to unpack the attached PDF and detect that it should simply switch its parsing engine from 1.3 to 1.4... but that's what at least 3 viewers seemingly fail to do: Adobe Reader, Sumatra PDF viewer, gnome document viewer.

Would you mind changing the title to something like "problems when embedding PDF 1.4 files"?

liZe · 2018-06-25T16:53:11Z

Version 1.3 is set by pdfrw, but Cairo creates PDF files with version 1.5. How does pdfrw transform 1.5 documents into 1.3? I don't know, and I don't want to.

After many, many bugs (#644, #639, #565 and equivalent issues), I think that we should not use pdfrw anymore. Cairo now provides an API to add metadata (including bookmarks I think), there's not much more to handle by editing the PDF file (at least bleeding areas).

I'm sad, because pdfrw is really useful and its devs are really nice. But the work needed on CairoSVG and on WeasyPrint is probably less than the work needed to understand and fix these issues using pdfrw.

Tontyna · 2018-06-25T21:31:50Z

Wanted to be sure that the mixed PDF versions are the source of evil.
Attached the 1.4 PDF mail_lands.pdf to the SEMILIBERTA'_GUBBIOTTI_PAOLO_17_6_2018.html, provided by @tjrhodes, rendered that with WeasyPrint to a PDF 1.3, opened it in my PDF viewer.
And guess what?

No problem. No error opening the attached file.

Conclusion: It's not a PDF version conflict. Something must be wrong with the embedded encoded stream.

And indeed: The so called FlateDecoded stream in my working PDF looks completely different than the streams contained in @tjrhodes' PDFs.

Working embedded stream:

stream
xœt·cl.^Ðî]w×¶mÛ¶ív×¶mÜµí]Û¶mÛ
[...]
endstream

Failing stream looks like a Pythonic binary string to me:

stream
b'x\x9ct\xb7cl.^\xd0\xee]w\xd7\xb6m\xdb\[...]'
endstream
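
The difference between the two streams can also be detected mechanically: a zlib stream starts with a two-byte header (compression method 8 in the low nibble of the first byte, and the two bytes read as a big-endian number divisible by 31, per RFC 1950), whereas the failing stream starts with the ASCII characters `b'`. A rough sketch; `has_zlib_header` is an illustrative name and the data is a stand-in:

```python
import zlib


def has_zlib_header(data: bytes) -> bool:
    """Check the two-byte zlib header described in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Low nibble of CMF is the method (8 = deflate); CMF*256 + FLG is a multiple of 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


good = zlib.compress(b"stand-in attachment contents")
# What ends up in the failing PDFs: the text of the bytes literal, not the bytes.
bad = repr(good).encode("latin-1")

print(has_zlib_header(good))
print(has_zlib_header(bad))
```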

Was unable to find the place where pdfrw writes the attachment's stream into the PDF and check whether it converts it into the wrong string type.
Instead I switched focus to zlib, and now I'm quite sure that it's not pdfrw but zlib that creates bytes instead of str.

The stream is actually produced by zlib.compressobj().compress().decode()
My zlib.ZLIB_VERSION is 1.2.11 and my zlib.compressobj().compress().decode() returns a str.
When I forcibly convert that str to bytes I can reproduce @tjrhodes' bug: the embedded stream is a Pythonic binary string and the attached file is not accessible.

The workaround consequently looks like this:

        if isinstance(pdf_file_object.stream, bytes):
            pdf_file_object.stream = str(pdf_file_object.stream)

Will create a PR asap.
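
One thing to keep in mind when turning bytes into str on Python 3: calling str() on a bytes object returns its repr (the `b'...'` text seen in the failing stream), while an explicit decode maps the bytes to characters. The sketch below uses only the standard library and stand-in data; the choice of latin-1 (one character per byte) is an assumption made for illustration and is not necessarily what the PR or the eventual fix uses.

```python
import zlib

data = b"%PDF-1.4 stand-in attachment contents"
compressed = zlib.compress(data)

# On Python 3, str() on bytes gives the literal's text, e.g. "b'x\\x9c...'".
as_repr = str(compressed)

# An explicit decode keeps a one-to-one mapping between bytes and characters.
as_text = compressed.decode("latin-1")


def inflates(text: str) -> bool:
    """Return True if the text, written back out as latin-1 bytes, can be inflated."""
    try:
        return zlib.decompress(text.encode("latin-1")) == data
    except zlib.error:
        return False


print(inflates(as_repr))
print(inflates(as_text))
```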

Tontyna · 2018-06-25T22:11:17Z

@tjrhodes -- since I can't reproduce the bug: Would you please test whether the PR fixes it?

Edit: No need to test, see the following comments.

liZe · 2018-06-26T09:19:19Z

@Tontyna thanks a lot for your investigation! You're right, it's an encoding problem when using Python 3. It's not pdfrw's fault … but we wouldn't have this issue without pdfrw.

This issue is actually a duplicate of #558, fixed in ce84073.

I've backported ce84073 into the 0.x branch (and fixed Python 2 support, as ce84073 was a Python3-only fix).

liZe · 2018-06-26T09:21:39Z

I'll release 0.42.4 during the summer 🌞.

Tontyna mentioned this issue Jun 25, 2018

Prevent embedded file streams from being bytes #650

Closed

liZe closed this as completed Jun 26, 2018
