Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a regexp to guess HTML #932

Merged
merged 1 commit into from
Aug 8, 2013
Merged

Fix a regexp to guess HTML #932

merged 1 commit into from
Aug 8, 2013

Conversation

ykzts
Copy link
Contributor

@ykzts ykzts commented Jul 3, 2013

1.9.1 :001 > Nokogiri.parse(open('https://github.com/sparklemotion/nokogiri/commits.atom').read).html?
 => true
1.9.1 :002 > Nokogiri.parse(open('https://github.com/sparklemotion/nokogiri/commits.atom').read).xml?
 => false

Atom is not HTML! Guess to HTML when string a "html" is included in a attribute-value.

@knu
Copy link
Member

knu commented Jul 4, 2013

I don't like the [^Hh>]* part in the first place.

I think the pattern should be more explicit as to what exactly fragments it's supposed to match, like /^\s*(?:<!DOCTYPE html[\s>]|<html[\s>])/.

@tenderlove Do you have any idea what the original intention of the current pattern was?

@ykzts
Copy link
Contributor Author

ykzts commented Jul 4, 2013

I don't like the [^Hh>]* part in the first place.

I also think so.

@knu
Copy link
Member

knu commented Aug 8, 2013

ping @tenderlove

@knu
Copy link
Member

knu commented Aug 8, 2013

@ykzts Could you update the regexp with the one I put above and rebase to get ready for a merge?

@ghost ghost assigned knu Aug 8, 2013
@ykzts
Copy link
Contributor Author

ykzts commented Aug 8, 2013

@knu done:heart:

incidentally...

A string that is an ASCII case-insensitive match for the string "<!DOCTYPE`".

cite: http://www.w3.org/TR/html5/syntax.html#the-doctype

@ykzts
Copy link
Contributor Author

ykzts commented Aug 8, 2013

regexp optimize.

@tenderlove
Copy link
Member

@knu I think it was just a guess. If the tests still pass, please feel free to change!

@knu
Copy link
Member

knu commented Aug 8, 2013

@ykzts Thanks. You can write DOCTYPE in upper case while you have an i flag. It's better for your eyes. I'll change so in the merge commit.

@tenderlove Okay! Thanks for confirmation!

@ykzts
Copy link
Contributor Author

ykzts commented Aug 8, 2013

@knu changed! check it out! 😋

knu added a commit that referenced this pull request Aug 8, 2013
@knu knu merged commit 4136da3 into sparklemotion:master Aug 8, 2013
knu added a commit that referenced this pull request Aug 8, 2013
@ykzts ykzts deleted the fix/regexp-to-guess-html branch June 11, 2017 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants