Fix a regexp to guess HTML #932

Merged 1 commit on Aug 8, 2013.
2 changes: 1 addition & 1 deletion lib/nokogiri.rb
@@ -67,7 +67,7 @@ class << self
def parse string, url = nil, encoding = nil, options = nil
doc =
if string.respond_to?(:read) ||
-string =~ /^\s*<[^Hh>]*html/i # Probably html
+string =~ /^\s*<(?:!DOCTYPE\s+)?html[\s>]/i # Probably html
Nokogiri.HTML(
string,
url,
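To see what the change does, the two patterns can be run side by side in plain Ruby, with no Nokogiri required. This is a minimal sketch: the constant names, the `guesses` helper, and the sample strings are illustrative and not part of the PR. The Atom-style sample mirrors the `<link type="text/html"` element in the new fixture. The old pattern's `[^Hh>]*html` can reach `html` there because `link type="text/` contains no `h` and no `>`.

```ruby
# The pattern removed by this PR and the one that replaced it.
OLD_HTML_GUESS = /^\s*<[^Hh>]*html/i
NEW_HTML_GUESS = /^\s*<(?:!DOCTYPE\s+)?html[\s>]/i

# Illustrative inputs (not taken from the test suite).
SAMPLES = {
  "html5 doctype" => "<!DOCTYPE html>\n<html><body></body></html>",
  "bare html tag" => %(<html lang="en">),
  "atom feed"     => %(<?xml version="1.0"?>\n<feed>\n<link type="text/html" href="x"/>\n</feed>),
}.freeze

# For a String input, reports whether each pattern would send it down
# the HTML branch of Nokogiri.parse.
def guesses(doc)
  { old: doc.match?(OLD_HTML_GUESS), new: doc.match?(NEW_HTML_GUESS) }
end

SAMPLES.each { |name, doc| puts "#{name}: #{guesses(doc)}" }
```

Both patterns still flag the two HTML inputs. Only the feed flips from "HTML" to "not HTML", which is the case `test_atom_is_xml?` covers.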
344 changes: 344 additions & 0 deletions test/files/atom.xml
@@ -0,0 +1,344 @@
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xml:lang="en-US">
<id>tag:github.com,2008:/sparklemotion/nokogiri/commits/master</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commits/master"/>
<link type="application/atom+xml" rel="self" href="https://github.com/sparklemotion/nokogiri/commits/master.atom"/>
<title>Recent Commits to nokogiri:master</title>
<updated>2013-07-02T08:17:16-07:00</updated>
<entry>
<id>tag:github.com,2008:Grit::Commit/233489ddfe4698b24dfcdc1daf3ca15bc880e4b1</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/233489ddfe4698b24dfcdc1daf3ca15bc880e4b1"/>
<title>
Updated changelog with merged pull requests #887 and #931
</title>
<updated>2013-07-02T08:17:16-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/9077af710bdadc972a7898e457bf6ec1?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>injekt</name>
<uri>https://github.com/injekt</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Updated changelog with merged pull requests #887 and #931&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/15eff8be4539a3461cfc1e3c8a257e7eed134e01</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/15eff8be4539a3461cfc1e3c8a257e7eed134e01"/>
<title>
Merge pull request #887 from Mange/double-not
</title>
<updated>2013-07-02T08:13:36-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/9077af710bdadc972a7898e457bf6ec1?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>injekt</name>
<uri>https://github.com/injekt</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Merge pull request #887 from Mange/double-not

Add support for bare and multiple :not() functions in selectors&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/7a54b1fd1ba1dbcd50d3566bcc62d554eb0fd30e</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/7a54b1fd1ba1dbcd50d3566bcc62d554eb0fd30e"/>
<title>
Merge pull request #931 from sorah/dont_call_pkgconfig_when_using_bundled_libraries
</title>
<updated>2013-07-02T08:11:37-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/74f896b312b786ee75a18073941e2457?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>knu</name>
<uri>https://github.com/knu</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Merge pull request #931 from sorah/dont_call_pkgconfig_when_using_bundled_libraries

extconf.rb: Don&#39;t call pkg_config when using bundled libraries&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/e5b93617ba810a34a4fc70dac76a83f9bebd2ea1</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/e5b93617ba810a34a4fc70dac76a83f9bebd2ea1"/>
<title>
extconf.rb: Don&#39;t call pkg_config when using bundled libraries
</title>
<updated>2013-07-02T06:30:00-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/c4f076f658dd3464f1d8785ad53a0d99?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>sorah</name>
<uri>https://github.com/sorah</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>extconf.rb: Don&#39;t call pkg_config when using bundled libraries&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/64cd3c8f506394b558bda9376844a51db971bd60</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/64cd3c8f506394b558bda9376844a51db971bd60"/>
<title>
Add support for bare and multiple :not() functions in selectors
</title>
<updated>2013-06-28T10:19:36-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/1b2792536d87aa21bc739c14980fa103?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>Mange</name>
<uri>https://github.com/Mange</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Add support for bare and multiple :not() functions in selectors

This enables parsing of &quot;.flash:not(.error):not(.warning)&quot; and
&quot;:not(p)&quot;.

This entailed a huge change to the parser, but I think the parser should
be a bit more robust now by not declaring :not a special case.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/37828ec784faf29d713225fa78585854912356cd</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/37828ec784faf29d713225fa78585854912356cd"/>
<title>
Report error and stop parsing instead of silently ignoring it.
</title>
<updated>2013-06-27T18:46:12-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/2736d9750eb13425e9bf70f112753c49?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>jvshahid</name>
<uri>https://github.com/jvshahid</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Report error and stop parsing instead of silently ignoring it.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/42da60e8ca58a5acc9e8c94ae5ad490589f146c6</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/42da60e8ca58a5acc9e8c94ae5ad490589f146c6"/>
<title>
Merge pull request #930 from ctborg/master
</title>
<updated>2013-06-27T04:23:33-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/9077af710bdadc972a7898e457bf6ec1?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>injekt</name>
<uri>https://github.com/injekt</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Merge pull request #930 from ctborg/master

Adds support for Solaris, OpenSolaris, or illumos.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/7382a6e6a66d3f80da955ebbcd82250f99321900</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/7382a6e6a66d3f80da955ebbcd82250f99321900"/>
<title>
Updated CHANGELOG with latest merged pulls
</title>
<updated>2013-06-27T04:04:17-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/9077af710bdadc972a7898e457bf6ec1?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>injekt</name>
<uri>https://github.com/injekt</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Updated CHANGELOG with latest merged pulls&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/3382ebc90e17f2503f4acbbdf53a931d38f3322a</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/3382ebc90e17f2503f4acbbdf53a931d38f3322a"/>
<title>
Remove REE also.
</title>
<updated>2013-06-19T20:57:48-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/74f896b312b786ee75a18073941e2457?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>knu</name>
<uri>https://github.com/knu</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Remove REE also.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/0c83f4027e295435680cda89a44b5e039e0e6cbb</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/0c83f4027e295435680cda89a44b5e039e0e6cbb"/>
<title>
Added support for Solaris, OpenSolaris, or illumos
</title>
<updated>2013-06-19T07:12:50-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/aff9165c0ea741d027799d5694d0ceb8?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name></name>
<email>[email protected]</email>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Added support for Solaris, OpenSolaris, or illumos&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/fb000c51e3ba4e8730a535d38d617d34111c21ef</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/fb000c51e3ba4e8730a535d38d617d34111c21ef"/>
<title>
Ensure meta_encoding checks meta charset tag (closes #919)
</title>
<updated>2013-06-14T06:06:26-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/9077af710bdadc972a7898e457bf6ec1?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>injekt</name>
<uri>https://github.com/injekt</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Ensure meta_encoding checks meta charset tag (closes #919)&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/14b7dc21bbeddec1990726760e6f6cc11e71542e</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/14b7dc21bbeddec1990726760e6f6cc11e71542e"/>
<title>
Updated ROADMAP
</title>
<updated>2013-06-14T06:00:57-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/9077af710bdadc972a7898e457bf6ec1?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>injekt</name>
<uri>https://github.com/injekt</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Updated ROADMAP&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/d1b0afd02f8a2b783380bdfde3acfabf021a0c5a</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/d1b0afd02f8a2b783380bdfde3acfabf021a0c5a"/>
<title>
Merge pull request #858 from ykzts/feature/not-only-child
</title>
<updated>2013-06-14T05:53:28-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/9077af710bdadc972a7898e457bf6ec1?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>injekt</name>
<uri>https://github.com/injekt</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Merge pull request #858 from ykzts/feature/not-only-child

Add feature that the &#39;:only-child&#39; pseudo class should work even though it&#39;s in &#39;:not&#39; pseudo class.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/d24eb8508c69f2cf336f1ff1b6887fcc3b317fbc</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/d24eb8508c69f2cf336f1ff1b6887fcc3b317fbc"/>
<title>
Merge pull request #886 from Mange/negative-nth
</title>
<updated>2013-06-14T05:50:13-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/9077af710bdadc972a7898e457bf6ec1?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>injekt</name>
<uri>https://github.com/injekt</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Merge pull request #886 from Mange/negative-nth

Add support for an-b in nth selectors&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/11eaf3d3f1b84adf5225b6686119dbc6714b4009</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/11eaf3d3f1b84adf5225b6686119dbc6714b4009"/>
<title>
Note 1.8 deprecation in README
</title>
<updated>2013-06-11T06:48:53-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/bd6e7860b5a891bff077aeaeb5434e60?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>flavorjones</name>
<uri>https://github.com/flavorjones</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Note 1.8 deprecation in README&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/82a936a0ceba0508d29cc9f6da84c5a98ae7bede</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/82a936a0ceba0508d29cc9f6da84c5a98ae7bede"/>
<title>
Fix a typo and DRY with dir_config() calls.
</title>
<updated>2013-06-10T21:01:03-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/74f896b312b786ee75a18073941e2457?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>knu</name>
<uri>https://github.com/knu</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Fix a typo and DRY with dir_config() calls.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/c7ef9b339054f8502e7f526b1b8054e3412cf7ba</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/c7ef9b339054f8502e7f526b1b8054e3412cf7ba"/>
<title>
Iconv was for building libxml2 and not for nokogiri itself.
</title>
<updated>2013-06-10T17:55:20-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/74f896b312b786ee75a18073941e2457?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>knu</name>
<uri>https://github.com/knu</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Iconv was for building libxml2 and not for nokogiri itself.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/4c1c225b9409571156430aeb5bcf89c4f9dd4812</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/4c1c225b9409571156430aeb5bcf89c4f9dd4812"/>
<title>
Make sure the MRI gem is built correctly.
</title>
<updated>2013-06-10T07:40:39-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/bd6e7860b5a891bff077aeaeb5434e60?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>flavorjones</name>
<uri>https://github.com/flavorjones</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Make sure the MRI gem is built correctly.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/b6100afd7a0ce17097fb3ca58ae8fe5e63522f6d</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/b6100afd7a0ce17097fb3ca58ae8fe5e63522f6d"/>
<title>
Fiddling with the windows cross-compile gem logic.
</title>
<updated>2013-06-10T07:33:59-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/bd6e7860b5a891bff077aeaeb5434e60?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>flavorjones</name>
<uri>https://github.com/flavorjones</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>Fiddling with the windows cross-compile gem logic.&lt;/pre>
</content>
</entry>
<entry>
<id>tag:github.com,2008:Grit::Commit/b1483de363524c882b5e682a3d3a6e6ab5d7511b</id>
<link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commit/b1483de363524c882b5e682a3d3a6e6ab5d7511b"/>
<title>
build_all now uses ruby 1.9.3 to build everything.
</title>
<updated>2013-06-10T07:33:56-07:00</updated>
<media:thumbnail height="30" width="30" url="https://secure.gravatar.com/avatar/bd6e7860b5a891bff077aeaeb5434e60?s=30&amp;d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"/>
<author>
<name>flavorjones</name>
<uri>https://github.com/flavorjones</uri>
</author>
<content type="html">
&lt;pre style='white-space:pre-wrap;width:81ex'>build_all now uses ruby 1.9.3 to build everything.&lt;/pre>
</content>
</entry>
</feed>
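Why does this fixture trip the old check at all? In Ruby, `^` is a line anchor. It matches at the start of the string and after every newline, so `/^\s*<[^Hh>]*html/i` was effectively tested against every line of the input, not only the first. The sketch below uses the first four lines of the fixture above. The `offending_line` computation is only illustrative.

```ruby
# First four lines of test/files/atom.xml, copied from the fixture above.
ATOM_HEAD = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xml:lang="en-US">
  <id>tag:github.com,2008:/sparklemotion/nokogiri/commits/master</id>
  <link type="text/html" rel="alternate" href="https://github.com/sparklemotion/nokogiri/commits/master"/>
XML

OLD_HTML_GUESS = /^\s*<[^Hh>]*html/i

# ^ matches after every "\n", so the engine keeps trying each new line
# until one of them starts with "<" followed by "html" (without an h or >).
m = ATOM_HEAD.match(OLD_HTML_GUESS)
offending_line = m.pre_match.count("\n") + 1
puts "old regexp matched line #{offending_line}: #{m[0].strip}"

# Anchoring at the start of the whole string (\A) would not match this input.
p ATOM_HEAD.match?(/\A\s*<[^Hh>]*html/i)
```

The replacement pattern keeps `^`, so it still scans every line. However, it only fires when a line begins with `<html` or `<!DOCTYPE html`, and no line of this feed does.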
1 change: 1 addition & 0 deletions test/helper.rb
@@ -30,6 +30,7 @@ class TestCase < MiniTest::Spec
SNUGGLES_FILE = File.join(ASSETS_DIR, 'snuggles.xml')
XML_FILE = File.join(ASSETS_DIR, 'staff.xml')
XML_XINCLUDE_FILE = File.join(ASSETS_DIR, 'xinclude.xml')
+XML_ATOM_FILE = File.join(ASSETS_DIR, 'atom.xml')
XSLT_FILE = File.join(ASSETS_DIR, 'staff.xslt')

def teardown
6 changes: 6 additions & 0 deletions test/test_nokogiri.rb
@@ -41,6 +41,12 @@ def test_xml?
assert !doc.html?
end

+def test_atom_is_xml?
+doc = Nokogiri.parse(File.read(XML_ATOM_FILE))
+assert doc.xml?
+assert !doc.html?
+end
+
def test_html?
doc = Nokogiri.parse(File.read(HTML_FILE))
assert !doc.xml?