Prohibit surrogate code units (#290)
Surrogate code units (U+D800 through U+DFFF) cannot be encoded into UTF-8.

Ref #268
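
The claim in the commit message can be checked directly. The following is a minimal Python sketch, not part of the commit; the helper name `utf8_encodable` is hypothetical. It relies on Python's strict UTF-8 codec, which rejects every code point in the surrogate block, both high surrogates (U+D800 through U+DBFF) and low surrogates (U+DC00 through U+DFFF).

```python
def utf8_encodable(code_point: int) -> bool:
    """Return True if the code point can be encoded as well-formed UTF-8.

    Hypothetical helper for illustration only; it is not part of the spec.
    """
    try:
        # Python's strict UTF-8 encoder raises UnicodeEncodeError for surrogates.
        chr(code_point).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Code points just outside the surrogate block encode normally.
print(utf8_encodable(0xD7FF), utf8_encodable(0xE000))
# The first and last surrogate code points do not.
print(utf8_encodable(0xD800), utf8_encodable(0xDFFF))
```

This is why a grammar that targets UTF-8 text has to subtract the whole surrogate block from `AnyChar`, not just part of it.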
gibson042 authored and echeran committed Sep 20, 2022
1 parent c2f8efe commit ff08e9a
Showing 2 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion spec/message.ebnf
@@ -26,7 +26,7 @@ Markup ::= MarkupStart Option*
/* Text */
Text ::= (TextChar | TextEscape)+
TextChar ::= AnyChar - ('{' | '}' | Esc)
-AnyChar ::= [#x0-#x10FFFF]
+AnyChar ::= [#x0-#x10FFFF] - [#xD800-#xDFFF]

/* Names */
Variable ::= '$' Name /* ws: explicit */
8 changes: 4 additions & 4 deletions spec/syntax.md
@@ -387,7 +387,7 @@ and `\` (which starts an escape sequence).
```ebnf
Text ::= (TextChar | TextEscape)+ /* ws: explicit */
TextChar ::= AnyChar - ('{' | '}' | Esc)
-AnyChar ::= [#x0-#x10FFFF]
+AnyChar ::= [#x0-#x10FFFF] - [#xD800-#xDFFF]
```

### Names
@@ -424,13 +424,13 @@ NameChar ::= NameStart | [0-9] | "-" | "." | #xB7
### Literal

Any Unicode code point is allowed in literals,
-with the exception of its delimiters `(` and `)`,
-and `\` (which starts an escape sequence).
+with the exception of `(` and `)` (which delimit literals),
+`\` (which starts an escape sequence), and
+surrogate code points U+D800 through U+DFFF (which cannot be encoded into UTF-8).

This includes line-breaking characters (such as U+000A LINE FEED and U+000D CARRIAGE RETURN),
other control characters (such as U+0000 NULL and U+0009 TAB),
permanently reserved noncharacters (U+FDD0 through U+FDEF and U+<i>n</i>FFFE and U+<i>n</i>FFFF where <i>n</i> is 0x0 through 0x10),
-surrogate code points (U+D800 through U+DFFF),
private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and U+100000 through U+10FFFD),
and unassigned code points.
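
As a rough illustration of the revised Literal rule, the Python sketch below tests whether a single character may appear unescaped in a literal. It is not part of the spec or of this commit: the names `SURROGATES`, `EXCLUDED`, and `is_literal_char` are hypothetical, and the sketch assumes the excluded surrogates are the full Unicode surrogate block, U+D800 through U+DFFF.

```python
# Assumption: the full Unicode surrogate block, U+D800 through U+DFFF inclusive.
SURROGATES = range(0xD800, 0xE000)

# The literal delimiters and the escape character named in the prose above.
EXCLUDED = {"(", ")", "\\"}


def is_literal_char(ch: str) -> bool:
    """Return True if `ch` may appear unescaped inside a literal.

    Hypothetical helper; it mirrors the prose rule, not an official API.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return ch not in EXCLUDED and ord(ch) not in SURROGATES


# Control characters, noncharacters, and private-use code points remain allowed.
print(is_literal_char("\n"), is_literal_char("\uFDD0"), is_literal_char("\uE000"))
# Delimiters, the escape character, and surrogates are rejected.
print(is_literal_char("("), is_literal_char("\\"), is_literal_char("\ud800"))
```

A real parser would apply the same test per code point while also handling escape sequences, which this sketch leaves out.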
