MM-13248
Resolve an issue with the encoding extraction regex. Bump the version number to prepare for a follow-up release.

While here, I also noticed that a standalone HTML::TreeBuilder was instantiated within _extract_meta_charset. It has been
replaced with a CSS::Inliner::TreeBuilder, configured identically to the other instances within the class
for consistency.
kamelkev committed Dec 17, 2015
1 parent e7a1270 commit b136b82
Showing 2 changed files with 18 additions and 8 deletions.
ChangeLog (6 changes: 6 additions & 0 deletions)
@@ -231,3 +231,9 @@
  * Update MANIFEST to reference all added tests/assets
  * Fix minor formatting issues within some tests/assets
  * Address concerns raised by CPAN RT96414, conditionally test for connectivity instead of outright failing
+
+4003 2015-12-16 Kevin Kamel <[email protected]>
+ * Resolve charset sniffing issue
+   - invalid charset present within the document would cause charset sniffing to end prematurely
+   - invalid charset present within the document would cause Inliner to die during the decode phase
+ * Resolve issue whereby a TreeBuilder instance was not configured as expected
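Both charset bullets come down to validating a sniffed charset name before trusting it, and falling back to the next candidate when validation fails. The sketch below illustrates that idea with `Encode::find_encoding`, which returns an `Encode::Encoding` object for a known name and `undef` otherwise; it mirrors the `find_encoding` calls and the `!defined($meta_charset)` fallback added to `_extract_meta_charset` in this commit. The helper `pick_charset` and its two-argument interface are hypothetical and not part of CSS::Inliner's API, and the regex extraction step is deliberately left out.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not part of CSS::Inliner): choose an encoding from two
# candidate charset names, e.g. one taken from <meta http-equiv> content and
# one from <meta charset>.
sub pick_charset {
  my ($equiv_name, $charset_attr) = @_;

  # find_encoding returns an Encode::Encoding object, or undef when the name
  # is unknown, so an invalid charset is rejected here rather than at decode time.
  my $enc = defined $equiv_name ? find_encoding($equiv_name) : undef;

  # Keep sniffing: an invalid first candidate falls through to the second one
  # instead of ending the search.
  if (!defined($enc) && defined $charset_attr) {
    $enc = find_encoding($charset_attr);
  }

  return $enc;    # Encode::Encoding object, or undef if neither name is valid
}

my $enc = pick_charset('not-a-charset', 'latin1');
if (defined $enc) {
  my $octets = "caf\xe9";                # Latin-1 encoded bytes
  my $text   = $enc->decode($octets);    # decoded character string
  printf "%s: decoded %d characters\n", $enc->name, length $text;
}
```

Returning `undef` instead of dying lets the caller decide on a default encoding when neither candidate is usable.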
lib/CSS/Inliner.pm (20 changes: 12 additions & 8 deletions)
@@ -2,7 +2,7 @@ package CSS::Inliner;
 use strict;
 use warnings;
 
-our $VERSION = '4002';
+our $VERSION = '4003';
 
 use Carp;
 use Encode;
@@ -964,10 +964,12 @@ sub _extract_meta_charset {
   local $SIG{__WARN__} = sub { my $warning = shift; warn $warning unless $warning =~ /^Parsing of undecoded UTF-8/ };
 
   # parse document and pull out key header elements
-  my $doc = HTML::TreeBuilder->new();
-  $doc->parse_content($$params{content});
+  my $extract_tree = new CSS::Inliner::TreeBuilder();
+  $self->_configure_tree({ tree => $extract_tree });
 
-  my $head = $doc->look_down("_tag", "head"); # there should only be one
+  $extract_tree->parse_content($$params{content});
+
+  my $head = $extract_tree->look_down("_tag", "head"); # there should only be one
 
   my $meta_charset;
   if ($head) {
@@ -979,12 +981,14 @@ sub _extract_meta_charset {
     if ($meta_equiv_charset_elem) {
       my $meta_equiv_content = $meta_equiv_charset_elem->attr('content');
 
-      if ($meta_equiv_content =~ /charset=(.*)(?:[";,]?)/i) {
-        $meta_charset = $1;
+      # leverage charset allowable chars from https://tools.ietf.org/html/rfc2978
+      if ($meta_equiv_content =~ /charset(?:\s*)=(?:\s*)([\w!#$%&'\-+^`{}~]+)/i) {
+        $meta_charset = find_encoding($1);
       }
     }
-    elsif ($meta_charset_elem) {
-      $meta_charset = $meta_charset_elem->attr('charset');
+
+    if (!defined($meta_charset) && $meta_charset_elem) {
+      $meta_charset = find_encoding($meta_charset_elem->attr('charset'));
     }
   }
 
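To show what the regex change does in isolation, here is a standalone comparison of the removed and added patterns from the hunk above, applied to plain `content` attribute strings. The sub names `charset_4002` and `charset_4003` are hypothetical. In this copy the `$` inside the character class is written as `\$`, so it is matched literally rather than risking interpolation of `$%` as a Perl special variable; otherwise the patterns are as they appear in the diff.

```perl
use strict;
use warnings;

# Pattern removed in 4003: the greedy (.*) runs to the end of the string, and
# (?:[";,]?) can always match the empty string, so trailing parameters are
# never trimmed from the capture.
sub charset_4002 {
  my ($content) = @_;
  return $content =~ /charset=(.*)(?:[";,]?)/i ? $1 : undef;
}

# Pattern added in 4003: optional whitespace around "=", and the capture stops
# at the first character outside the set the diff takes from RFC 2978.
# '$' is escaped as \$ so it is matched literally inside the class.
sub charset_4003 {
  my ($content) = @_;
  return $content =~ /charset(?:\s*)=(?:\s*)([\w!#\$%&'\-+^`{}~]+)/i ? $1 : undef;
}

for my $content ('text/html; charset=UTF-8',
                 'text/html; charset=UTF-8; foo=bar',
                 'text/html; charset = utf-8') {
  my @found = map { defined $_ ? $_ : '(no match)' }
              charset_4002($content), charset_4003($content);
  printf "%-36s 4002=%s 4003=%s\n", $content, @found;
}
```

For the second string the 4002 pattern captures `UTF-8; foo=bar`, which is not a charset name, while the 4003 pattern captures `UTF-8`. For the third, the spaces around `=` defeat the 4002 pattern entirely, whereas the 4003 pattern still captures `utf-8`.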