Support short meta charset tag #15

tzi · 2012-11-06T01:49:56Z

Hi!

Unless I am mistaken, WeasyPrint does not support the short meta charset tag:

<meta charset="utf-8">

But it works fine with the complete one:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Most HTML5 websites use the short one, because it is simpler and, I think, sufficient according to the specification.

Thanks for sharing weasyprint,
Thomas.

SimonSapin · 2012-11-06T13:02:04Z

Hi,

Thank you for this report. WeasyPrint just uses libxml2 (through lxml) to parse HTML, so the handling of <meta> elements to detect the character encoding happens there. The good news is that this bug is fixed in version 2.8.0 of libxml2:

http://xmlsoft.org/news.html

Add HTML parser support for HTML5 meta charset encoding declaration (Denis Pauk)

If you can upgrade libxml2 on your system, it should just work.
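
To see whether an upgrade is needed, you can compare the libxml2 version that lxml is running against with 2.8.0. This is only a rough sketch: the helper name `libxml2_handles_short_meta` is made up for illustration, and it assumes lxml is importable and exposes the runtime version as the tuple `etree.LIBXML_VERSION`.

```python
def libxml2_handles_short_meta(version):
    """Return True if the libxml2 version tuple is 2.8.0 or later.

    Hypothetical helper: 2.8.0 is the release whose changelog mentions
    support for the HTML5 <meta charset> declaration.
    """
    return tuple(version) >= (2, 8, 0)


if __name__ == "__main__":
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed, cannot check libxml2")
    else:
        # LIBXML_VERSION is the version loaded at runtime, e.g. (2, 7, 8).
        version = etree.LIBXML_VERSION
        print(version, libxml2_handles_short_meta(version))
```

If this reports False, the short form is probably being ignored by the parser and the document falls back to whatever default encoding libxml2 picks.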

If you cannot upgrade for some reason, another option is to use the html5lib parser. You should be able to do so with the git version of WeasyPrint and the workaround in #12 (comment)
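
A different stopgap, which is not the html5lib workaround from #12, would be to decode the bytes yourself before the parser ever sees them. The sketch below is an assumption-laden illustration using only the standard library: `decode_html` is a hypothetical helper, it only looks at the first 1024 bytes, and the regular expression only matches when `charset` is the first attribute of the `<meta>` tag.

```python
import re

# Matches the short form <meta charset="..."> (quotes optional).
_META_CHARSET = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def decode_html(raw, default="utf-8"):
    """Decode HTML bytes using the short meta charset tag if present.

    Hypothetical helper: falls back to `default` when no tag is found
    or when Python does not know the declared encoding.
    """
    match = _META_CHARSET.search(raw[:1024])
    encoding = match.group(1).decode("ascii") if match else default
    try:
        return raw.decode(encoding)
    except LookupError:
        # Unknown encoding name in the document.
        return raw.decode(default)
```

The resulting text can then be handed to the HTML parser as an already-decoded string, so the parser no longer has to guess the encoding from the `<meta>` element.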

tzi · 2012-11-06T15:10:36Z

Thanks for your answer!

SimonSapin closed this as completed Nov 6, 2012

tjdett mentioned this issue Jan 17, 2014

Ensure output HTML is UTF-8 uq-eresearch/aorra#122

Merged
