Add replace_entities to the XML PushParser #1017

Closed
wants to merge 4 commits into from

spraints
Contributor

Commit 795cc5a resolved #76 for `Nokogiri::XML::SAX::Parser`. This branch adds the same `replace_entities` option to `Nokogiri::XML::SAX::PushParser`.

For example:

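```ruby
require 'nokogiri'
include Nokogiri::XML

def main
  before
  after
end

# Default settings: replace_entities is never set.
def before
  puts '-'*40, 'before:'
  parser = SAX::PushParser.new(Dest.new)
  parser << doc
  parser.finish
end

# Same input, with libxml2 entity replacement switched on.
def after
  puts '-'*40, 'after:'
  parser = SAX::PushParser.new(Dest.new)
  parser.replace_entities = true
  parser << doc
  parser.finish
end

# One entity in an attribute value and one in text content.
def doc
  '<e a="x &amp; y">p &amp; q</e>'
end

# Prints every attribute list and every text chunk the parser reports.
class Dest < SAX::Document
  def start_element(name, attrs)
    p attrs
  end
  def characters(text)
    p text
  end
end

main
```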

Output:

```
----------------------------------------
before:
[["a", "x &#38; y"]]
"p "
"&"
" q"
----------------------------------------
after:
[["a", "x & y"]]
"p "
"&"
" q"
```

flavorjones added this to the 1.7.0 milestone Feb 17, 2016
@flavorjones
Member

Tentatively targeting this change for 1.7.0.

@rosenfeld
Contributor

I don't understand. With `replace_entities` set to false, which I believe is the default, no substitutions should be made, right? So, I'd expect `before` to print &amp; rather than &#38;. Isn't this a bug? I'm particularly interested in being able to get the raw value without any substitutions, which I think should be libxml2's default behavior, right?

@spraints
Contributor Author

> With replace_entities set to false, which I believe is the default, no substitutions should be made, right? So, I'd expect before to print &amp; rather than &#38;. Isn't this a bug?

It might be, but that's not the goal of this PR. This branch lets me tell libxml2 to replace &amp; (or &#38;) with &.
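`&#38;` is the decimal character reference for code point 38, which is `&` (U+0026), so `&amp;` and `&#38;` are two spellings of the same character. To make that concrete, here is a hypothetical pure-Ruby helper, `expand_xml_refs`, that does by hand roughly what entity replacement does for the five predefined XML entities and for numeric character references. It is an illustration only: it is not part of Nokogiri or libxml2, and it ignores entities declared in a DTD.

```ruby
# Hypothetical helper, not a Nokogiri or libxml2 API.
# The five entities every XML processor knows without a DTD.
PREDEFINED = {
  'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"', 'apos' => "'"
}.freeze

# Expands &name; (predefined only), &#NN; (decimal) and &#xHH; (hex) references.
def expand_xml_refs(text)
  text.gsub(/&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);/) do
    whole = Regexp.last_match(0)
    ref = Regexp.last_match(1)
    if ref.start_with?('#x')
      ref[2..].to_i(16).chr(Encoding::UTF_8)
    elsif ref.start_with?('#')
      ref[1..].to_i.chr(Encoding::UTF_8)
    else
      # Unknown named entities are left exactly as written.
      PREDEFINED.fetch(ref) { whole }
    end
  end
end
```

Both `expand_xml_refs('x &amp; y')` and `expand_xml_refs('x &#38; y')` return `"x & y"`, which is the attribute value the `after` run reports.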

flavorjones modified the milestones: 1.7.1, 2.0.0 Jan 15, 2017
@flavorjones
Member

ZOMG, I've merged this in. Thank you for your patience, and apologies this took so long to get any attention.

@spraints
Contributor Author

💖 thanks!

spraints deleted the push-parser-ctx branch January 26, 2017 12:56
Successfully merging this pull request may close these issues.

Problem with & character and SaxPushParser (Maybe SaxParser as well..)