HTML parsing segfaults when tags are closed with <?tag> and the document is encoded in ASCII-8BIT #845
Comments
When given an ASCII-8BIT string, Nokogiri invokes a SAX push parser to sniff out the document encoding. You uncovered a segfault in the Nokogiri push parser: it doesn't properly handle processing instructions with no content. From ext/nokogiri/xml_sax_parser.c:240:

    static void processing_instruction(void * ctx, const xmlChar * name, const xmlChar * content)
    {
      VALUE self = NOKOGIRI_SAX_SELF(ctx);
      VALUE doc = rb_iv_get(self, "@document");
      rb_funcall( doc,
                  id_processing_instruction,
                  2,
                  NOKOGIRI_STR_NEW2(name),
                  NOKOGIRI_STR_NEW2(content)
      );
    }

The above will fail if libxml2 calls the function with NULL for content. It does exactly that in HTMLparser.c:3142:

    ctxt->sax->processingInstruction(ctxt->userData,
                                     target, NULL);
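To make the failure mode concrete, here is a minimal standalone sketch of the kind of guard that avoids it: check the pointer before building a string from it, and hand back a "nil" marker otherwise. This is not the Nokogiri patch. `sketch_value`, `sketch_str_or_nil`, and `sketch_processing_instruction` are hypothetical names that stand in for Ruby's `VALUE`, the string constructor, and the callback, so the example compiles without the Ruby or libxml2 headers. It also assumes the target name is always non-NULL, which the thread does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Ruby VALUE: either nil or a borrowed C string. */
typedef struct {
    int is_nil;        /* 1 when the parser supplied no string at all */
    const char *str;   /* borrowed pointer, NULL when is_nil is set */
    size_t len;        /* length in bytes, 0 when is_nil is set */
} sketch_value;

/* Only measure the string when the pointer is non-NULL; otherwise return nil. */
static sketch_value sketch_str_or_nil(const char *s)
{
    sketch_value v = { 1, NULL, 0 };
    if (s != NULL) {
        v.is_nil = 0;
        v.str = s;
        v.len = strlen(s);
    }
    return v;
}

/* Hypothetical record of what the callback would pass on to the document. */
typedef struct {
    sketch_value name;
    sketch_value content;
} sketch_pi;

/* Guarded version of the callback: content may legitimately be NULL. */
static sketch_pi sketch_processing_instruction(const char *name, const char *content)
{
    sketch_pi pi;
    assert(name != NULL); /* assumption: the target name is always present */
    pi.name = sketch_str_or_nil(name);
    pi.content = sketch_str_or_nil(content);
    return pi;
}
```

Calling `sketch_processing_instruction("p", NULL)` yields a record whose content is marked nil instead of reading through a NULL pointer. In the real extension the analogous guard would presumably pass `Qnil` to `rb_funcall` when `content` is NULL.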
Glad that you found the issue. We spent some time to isolate the bug, but didn't know enough about Nokogiri to dig into the details to figure out why it was segfaulting.
Thanks for taking the time to isolate the bug! That effort made it loads easier to find and fix it.
Happy to have helped. Thanks for fixing.
Looks like there is still a problem with push_parser. I had the exact same error: diaspora/diaspora#4996
Can you please open a new issue with code that will allow us to reproduce what you're experiencing? Thank you.
On ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-darwin12.2.0]
cc @mgates