Add position check for XML declaration
## Why?
The XML declaration must be the first item in the document.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1]   document   ::=   ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22]   prolog   ::=   	XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl

```
[23]   XMLDecl  ::=   '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

See: ruby#161 (comment)
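
A minimal sketch of the rule those productions imply, kept independent of REXML: scan for `<?xml` declarations and require that any match starts at offset 0. The helper names (`xml_declaration_offsets`, `xml_declaration_well_placed?`) and the regular expression are illustrative assumptions, not REXML code. The sketch only checks position, not presence, and the naive scan would also flag `<?xml` inside comments or CDATA sections, which a real parser would not.

```ruby
# Hypothetical pattern: "<?xml" followed by whitespace or "?", so that
# targets such as "xml-stylesheet" are not treated as XML declarations.
XML_DECL_START = /<\?xml(?=[\s?])/

# Returns the character offsets at which an XML declaration starts.
def xml_declaration_offsets(text)
  offsets = []
  text.scan(XML_DECL_START) { offsets << Regexp.last_match.begin(0) }
  offsets
end

# True when every XML declaration found sits at the very start of the text.
def xml_declaration_well_placed?(text)
  xml_declaration_offsets(text).all?(&:zero?)
end

puts xml_declaration_well_placed?('<?xml version="1.0"?><a/>')
puts xml_declaration_well_placed?('<a><?xml version="1.0" ?></a>')
puts xml_declaration_well_placed?('<?xml-stylesheet href="a.css"?><a/>')
```

The second input is the same document the new test feeds to the parser; the declaration there starts at offset 3, so the check fails.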
naitoh committed Jul 6, 2024
1 parent face9dd commit 34ea778
Showing 2 changed files with 35 additions and 14 deletions.
32 changes: 18 additions & 14 deletions lib/rexml/parsers/baseparser.rb
@@ -635,21 +635,25 @@ def process_instruction(start_position)
           @source.position = start_position
           raise REXML::ParseException.new(message, @source)
         end
-        if @document_status.nil? and match_data[1] == "xml"
-          content = match_data[2]
-          version = VERSION.match(content)
-          version = version[1] unless version.nil?
-          encoding = ENCODING.match(content)
-          encoding = encoding[1] unless encoding.nil?
-          if need_source_encoding_update?(encoding)
-            @source.encoding = encoding
-          end
-          if encoding.nil? and /\AUTF-16(?:BE|LE)\z/i =~ @source.encoding
-            encoding = "UTF-16"
+        if match_data[1] == "xml"
+          if @document_status
+            raise ParseException.new("Malformed XML: XML declaration other than at the top of the document.", @source)
+          else
+            content = match_data[2]
+            version = VERSION.match(content)
+            version = version[1] unless version.nil?
+            encoding = ENCODING.match(content)
+            encoding = encoding[1] unless encoding.nil?
+            if need_source_encoding_update?(encoding)
+              @source.encoding = encoding
+            end
+            if encoding.nil? and /\AUTF-16(?:BE|LE)\z/i =~ @source.encoding
+              encoding = "UTF-16"
+            end
+            standalone = STANDALONE.match(content)
+            standalone = standalone[1] unless standalone.nil?
+            return [ :xmldecl, version, encoding, standalone ]
           end
-          standalone = STANDALONE.match(content)
-          standalone = standalone[1] unless standalone.nil?
-          return [ :xmldecl, version, encoding, standalone ]
         end
         [:processing_instruction, match_data[1], match_data[2]]
       end
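
The three outcomes of the changed branch can be modelled in isolation. This is a rough sketch under stated assumptions, not REXML's implementation: `classify_instruction`, `MisplacedXMLDeclaration`, and the `DECL_PATTERNS` regular expressions are hypothetical stand-ins, the source-encoding update is left out, and `document_status` is simply treated as `nil` at the top of the document and non-nil anywhere else.

```ruby
# Simplified, hypothetical stand-ins for the VERSION, ENCODING and
# STANDALONE patterns used in the diff above.
DECL_PATTERNS = {
  version:    /\bversion\s*=\s*["'](.*?)["']/,
  encoding:   /\bencoding\s*=\s*["'](.*?)["']/,
  standalone: /\bstandalone\s*=\s*["'](.*?)["']/,
}.freeze

# Hypothetical error class standing in for REXML::ParseException.
class MisplacedXMLDeclaration < StandardError; end

# target and content play the role of match_data[1] and match_data[2].
def classify_instruction(target, content, document_status)
  # Any target other than "xml" is an ordinary processing instruction.
  return [:processing_instruction, target, content] unless target == "xml"

  # New behaviour: an "xml" target anywhere but the top is an error.
  if document_status
    raise MisplacedXMLDeclaration,
          "XML declaration other than at the top of the document."
  end

  # At the top: extract the pseudo-attributes, nil when absent.
  values = DECL_PATTERNS.values.map do |pattern|
    match = pattern.match(content)
    match && match[1]
  end
  [:xmldecl, *values]
end

p classify_instruction("xml", ' version="1.0" encoding="UTF-8"', nil)
p classify_instruction("xml-stylesheet", ' href="a.css"', :elsewhere)
```

Before this commit, the old condition `@document_status.nil? and match_data[1] == "xml"` let a late `<?xml ...?>` fall through to the last line and come back as an ordinary `:processing_instruction`; the model raises instead, which is the behaviour the commit adds.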
17 changes: 17 additions & 0 deletions test/parse/test_processing_instruction.rb
@@ -39,6 +39,23 @@ def test_garbage_text
                        pi.content,
                      ])
       end
+
+      def test_xml_declaration_not_at_document_start
+        exception = assert_raise(REXML::ParseException) do
+          parser = REXML::Parsers::BaseParser.new('<a><?xml version="1.0" ?></a>')
+          while parser.has_next?
+            parser.pull
+          end
+        end
+
+        assert_equal(<<~DETAIL.chomp, exception.to_s)
+          Malformed XML: XML declaration other than at the top of the document.
+          Line: 1
+          Position: 25
+          Last 80 unconsumed characters:
+
+        DETAIL
+      end
     end
   end
 end