-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop adding extra new line after XML declaration with pretty format #164
Conversation
lib/rexml/formatters/pretty.rb
Outdated
@@ -111,7 +111,7 @@ def write_document( node, output ) | |||
# itself, then we don't need a carriage return... which makes this | |||
# logic more complex. | |||
node.children.each { |child| | |||
next if child == node.children[-1] and child.instance_of?(Text) | |||
next if child == "\n" and child.instance_of?(Text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Does this work with the following XML?
<?xml version="1.0" encoding="UTF-8"?>
<message>Hello world!</message>
It seems that we can ignore all top-level text nodes:
diff --git a/lib/rexml/formatters/pretty.rb b/lib/rexml/formatters/pretty.rb
index a1198b7..a838d83 100644
--- a/lib/rexml/formatters/pretty.rb
+++ b/lib/rexml/formatters/pretty.rb
@@ -111,7 +111,7 @@ module REXML
# itself, then we don't need a carriage return... which makes this
# logic more complex.
node.children.each { |child|
- next if child == node.children[-1] and child.instance_of?(Text)
+ next if child.instance_of?(Text)
unless child == node.children[0] or child.instance_of?(Text) or
(child == node.children[1] and !node.children[0].writethis)
output << "\n"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right correction, too.
If next if child.instance_of?(Text)
, then the following XML works.
<?xml version="1.0" encoding="UTF-8"?>
<message>Hello world!</message>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naitoh Interestingly if you have top-level text nodes that are not an ?xml
node, then you end up with the strange behavior below where it outputs a leading blank line.
XXX<message>Hello world!</message>
d = REXML::Document.new("XXX<message>Hello world!</message>")
d.to_s
# => "XXX<message>Hello world!</message>"
d.write(out = '', 2)
out
# => "\n<message>\n Hello world!\n</message>"
Note that I think this leading "XXX" is invalid XML anyway, but is parsed successfully with REXML (Nokogiri fails in STRICT mode with this). I wonder if you also need to fail on text nodes before the root node similar to how you did it in #161 for those after the root node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's open a new issue and discuss this on the new issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created #184 which fixes the above.
## Why? Fix REXML::Formatters::Pretty#write_document, which implicitly assumes that the XML file ends with a newline. If the XML file does not end with a newline, a space is added to the end of the first line. ``` Failure: test_indent(REXMLTests::TestDocument::WriteTest::ArgumentsTest) /Users/naitoh/ghq/github.com/naitoh/rexml/test/test_document.rb:270:in `test_indent' 267: output = "" 268: indent = 2 269: @document.write(output, indent) => 270: assert_equal(<<-EOX.chomp, output) 271: <?xml version='1.0' encoding='UTF-8'?> 272: <message> 273: Hello world! <"<?xml version='1.0' encoding='UTF-8'?>\n" + "<message>\n" + " Hello world!\n" + "</message>"> expected but was <"<?xml version='1.0' encoding='UTF-8'?> \n" + "<message>\n" + " Hello world!\n" + "</message>"> diff: ? <?xml version='1.0' encoding='UTF-8'?> <message> Hello world! </message> ```
df89ee2
to
01162bd
Compare
Likely a result of ruby/rexml#164
* Bump rexml to resolve security advisory changelog: Internal, Dependencies, Update dependencies to resolve security advisories * Add rexml as explicit dependency Since we use it in our code, it should be an explicit dependency See: https://github.com/18F/identity-idp/blob/ea8a6081961d6c373a870dd5fea31efce89fde7e/app/services/proofing/aamva/request/verification_request.rb#L60-L102 * Sync AAMVA fixture to expected output Likely a result of ruby/rexml#164
…fore root element (#184) ## Why? XML with content at the start of the document is invalid. https://www.w3.org/TR/2006/REC-xml11-20060816/#document ``` [1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* ) ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog ``` [22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)? ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl ``` [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc ``` [27] Misc ::= Comment | PI | S ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI ``` [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>' ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget ``` [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l')) ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-doctypedecl ``` [28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>' ``` See: #164 (comment)
If the XML file does not end with a newline, a space is added to the end of the first line.
This is happen because
REXML::Formatters::Pretty#write_document
has a logic that depends on the last text node.We should ignore all top-level text nodes with pretty format.