-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replaced character stack with string buffer #1676
Replaced character stack with string buffer #1676
Conversation
I will fix ruby-1.9.3 tomorrow. |
Hi @andrew-aladev, thanks for submitting this. I love the additional test coverage! @knu and I have asked a couple of the core contributors who know the jruby implementation a bit better than us to take a look. And yes, if you avoid the |
Also pinging @kares for an opinion. |
looking good, just get CI 💚 and its perfect :) |
I've replaced |
Merged! Thank you for your contribution! |
I've found an inconsistency between
c
andjava
version of html and xml sax parsers.Let it be an html:
We can expect that our sax parser will receive the following information:
c
version works perfect butjava
does not. I've found acharacterStack
inext/java/nokogiri/internals/NokogiriHandler.java
that is actually a bug. I couldn't understand the reason of using stack for characters.I was thinking for a long time about this stack and I've found the only reason for it: we will receive a list of character strings anyway when xml or html syntax is broken at the end of the document. But nokogiri has absolutely another methods of handling broken syntax. I think nokogiri should not keep this stack in the code.
So I've removed characters stack and added simple string buffer. The fix itself is very small.
Than I've added a special document to the test helper in order to test the strict order of parsed items. I've implemented 2 tests with regular text and whitespace for html and xml sax parsers.
Text nodes are regular text and comments for html parser. Xml test received a bonus: cdata blocks.