HTML parsing segfaults when tags are closed with <?tag> and the document is encoded in ASCII-8BIT #845

Closed
barelyknown opened this issue Feb 6, 2013 · 6 comments

@barelyknown

On ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-darwin12.2.0]

Nokogiri::HTML("<strong>this will segfault<?strong>".force_encoding("ASCII-8BIT"))

Nokogiri::HTML("<strong>this will not segfault<?strong>".force_encoding("UTF-8"))

cc @mgates

@ender672
Member

ender672 commented Feb 6, 2013

When given an ASCII-8BIT string, Nokogiri invokes a SAX push parser to sniff out the document encoding.

You uncovered a segfault in the Nokogiri push parser -- it doesn't properly handle processing instructions with no content.

From ext/nokogiri/xml_sax_parser.c:240

static void processing_instruction(void * ctx, const xmlChar * name, const xmlChar * content)
{
  VALUE self = NOKOGIRI_SAX_SELF(ctx);
  VALUE doc = rb_iv_get(self, "@document");

  rb_funcall( doc,
              id_processing_instruction,
              2,
              NOKOGIRI_STR_NEW2(name),
              NOKOGIRI_STR_NEW2(content)
  );
}

The above will fail if libxml2 calls the function with NULL for content. It does exactly that in HTMLparser.c:3142

ctxt->sax->processingInstruction(ctxt->userData,
      target, NULL);
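
One way to guard against this is to check `content` for NULL before building a string from it. The following is a minimal, self-contained sketch of that idea and not the actual Nokogiri patch: `xml_char`, `struct pi_event`, `copy_or_nil`, and `run_demo` are hypothetical stand-ins so the example compiles without Ruby or libxml2 headers, and `has_content == 0` models passing `nil` to the Ruby handler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for libxml2's xmlChar (an unsigned char). */
typedef unsigned char xml_char;

/* Hypothetical record of what the handler would hand to the Ruby side. */
struct pi_event {
    char target[32];
    char content[32];
    int  has_content;   /* 0 models passing nil instead of a string */
};

/* Copy a possibly-NULL C string. Returns 1 if a string was copied, 0 for NULL. */
static int copy_or_nil(char *dst, size_t cap, const xml_char *src)
{
    if (src == NULL) {
        dst[0] = '\0';
        return 0;
    }
    snprintf(dst, cap, "%s", (const char *)src);
    return 1;
}

/* Same shape as the SAX callback quoted above, but NULL-safe for content. */
static void processing_instruction(void *ctx, const xml_char *name,
                                   const xml_char *content)
{
    struct pi_event *ev = (struct pi_event *)ctx;

    copy_or_nil(ev->target, sizeof ev->target, name);
    ev->has_content = copy_or_nil(ev->content, sizeof ev->content, content);
}

/* Exercises both call shapes; returns 0 when both behave as expected. */
static int run_demo(void)
{
    struct pi_event ev;

    /* The call shape quoted from HTMLparser.c: content is NULL. */
    processing_instruction(&ev, (const xml_char *)"strong", NULL);
    assert(strcmp(ev.target, "strong") == 0);
    assert(ev.has_content == 0);

    /* A processing instruction that does carry content. */
    processing_instruction(&ev, (const xml_char *)"xml-stylesheet",
                           (const xml_char *)"href=\"a.css\"");
    assert(ev.has_content == 1);
    assert(strcmp(ev.content, "href=\"a.css\"") == 0);

    return 0;
}
```

In the real extension, the equivalent change would presumably pass `Qnil` when `content` is NULL rather than calling `NOKOGIRI_STR_NEW2` on it, but that is an assumption about the fix, not something stated in this thread.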

@barelyknown
Author

Glad you found the issue. We spent some time isolating the bug, but didn't know enough about Nokogiri's internals to figure out why it was segfaulting.

@ender672
Member

ender672 commented Feb 6, 2013

Thanks for taking the time to isolate the bug! That effort made it loads easier to find and fix it.

@barelyknown
Author

Happy to have helped. Thanks for fixing.

On Wed, Feb 6, 2013 at 1:53 PM, Timothy Elliott wrote:

> Thanks for taking the time to isolate the bug! That effort made it loads
> easier to find and fix it.


@Flaburgan

It looks like there is still a problem with push_parser. I had the exact same error: diaspora/diaspora#4996

@flavorjones
Member

Can you please open a new issue with code that will allow us to reproduce what you're experiencing?

Thank you.
