The spec says "code point" where it likely means "unicode scalar value" #207
Labels
breaking
This can only be done for the next major version of KDL
documentation
Improvements or additions to documentation
enhancement
New feature or request
From the Unicode Glossary:
The first requirement of the spec is that "KDL documents should be UTF-8". Similarly, strings "MUST be represented as UTF-8 values." Note that a well-formed UTF-8 stream MUST NOT contain surrogate code points; it encodes a sequence of USV.
However, the formal grammar just says
unicode
(presumably, either[\0-\u{10FFFF}]
or[\0-\u{D7FF}\u{E000}-\u{10FFFF}]
— unclear (#191, #192)), and the prose spec allows "literal code points" and escaped "Code point described by hex characters". This allows the presence of surrogates, at a minimum as escaped code points, even if their literal inclusion is precluded by the higher-level requirement that the document be well-formed UTF-8.At least one implementation documents this as a potential spec non-compliance. Given the requirement of UTF-8, I expect that this is just a terminology oversight, and the spec should say Unicode Scalar Value (or non-surrogate code point) in the two locations where it currently just says "code point".
The text was updated successfully, but these errors were encountered: