Image missing at start of document #65

Closed
Smylers opened this issue Mar 26, 2013 · 1 comment
Comments

@Smylers
Contributor

Smylers commented Mar 26, 2013

This document starts with an image: http://www.stripey.com/demo/weasyprint/missing_image.html

But WeasyPrint doesn't show it: http://www.stripey.com/demo/weasyprint/missing_image.pdf

It can be made to appear by doing any of these:

  • putting some text, such as ‘X’, before the image
  • explicitly including the optional <body> tag
  • wrapping the <img> in a block element, such as <div>

But not by:

  • wrapping the <img> in an inline element, such as <span>
  • putting an ‘X’ before it and wrapping both the ‘X’ and the <img> in a <span>
@SimonSapin
Member

This is a bug in libxml2’s HTML parser (the one used by lxml.html), which does not yet conform to HTML5 parsing. AFAIK this was just unspecified in HTML4.

>>> import lxml.html
>>> print(lxml.html.tostring(lxml.html.parse('http://www.stripey.com/demo/weasyprint/missing_image.html')))
<!DOCTYPE html>
<html><head><title>Missing Image</title><img src="200px-Donkey_cartoon_04.svg.png" alt="[an arbitrary image]"></head><body><p>There should be
<a href="http://commons.wikimedia.org/wiki/File:Donkey_cartoon_04.svg">a cartoon
donkey</a> above this paragraph.
</p></body></html>

As you can see, the parser adds the implied <head> and <body> elements (as expected) but in some cases considers the image to be part of the former instead of the latter.

See #12 about using the html5lib parser instead, which does not have this issue but is tricky to use at this point because of broken namespace handling in cssselect.

>>> import lxml.html.html5parser
>>> print(lxml.html.tostring(lxml.html.html5parser.parse('http://www.stripey.com/demo/weasyprint/missing_image.html')))
<!DOCTYPE html>
<html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:head><html:title>Missing Image</html:title>

</html:head><html:body><html:img src="200px-Donkey_cartoon_04.svg.png" alt="[an arbitrary image]"></html:img>

<html:p>There should be
<html:a href="http://commons.wikimedia.org/wiki/File:Donkey_cartoon_04.svg">a cartoon
donkey</html:a> above this paragraph.
</html:p></html:body></html:html>

If using html5lib is impractical, I recommend adding an explicit <body> tag. Alternatively, consider asking the libxml2 maintainers to fix this in their parser.
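
For anyone who wants to find documents that rely on an implied <body>, and so might hit this bug, here is a minimal sketch using only Python's standard library. The names `ExplicitBodyChecker` and `has_explicit_body` are hypothetical and not part of WeasyPrint or lxml. It relies on `html.parser` reporting only the tags that are literally in the source, without inserting implied elements.

```python
from html.parser import HTMLParser


class ExplicitBodyChecker(HTMLParser):
    """Hypothetical helper: records whether the source has a literal <body> tag."""

    def __init__(self):
        super().__init__()
        self.has_body = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and never synthesizes implied
        # elements, so this only fires for a <body> written in the source.
        if tag == "body":
            self.has_body = True


def has_explicit_body(html_text):
    """Return True if html_text contains an explicit <body> start tag."""
    checker = ExplicitBodyChecker()
    checker.feed(html_text)
    checker.close()
    return checker.has_body
```

A document for which `has_explicit_body` returns False is a candidate for the workaround above. Whether the image actually ends up in <head> still depends on the parser and on what comes first in the document.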

Closing. #12 is the one to follow for html5lib support.
