Clarify whitespace and newline rules. #264
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -62,6 +62,7 @@ Spec | |
|
||
* TOML is case sensitive. | ||
* Whitespace means tab (0x09) or space (0x20). | ||
* Newline means LF (0x0A) or CRLF (0x0D0A). | ||
|
||
Comment | ||
------- | ||
|
@@ -116,26 +117,33 @@ purpose. | |
Sometimes you need to express passages of text (e.g. translation files) or would | ||
like to break up a very long string into multiple lines. TOML makes this easy. | ||
**Multi-line basic strings** are surrounded by three quotation marks on each | ||
side and allow newlines. If the first character after the opening delimiter is a | ||
newline (`0x0A`), then it is trimmed. All other whitespace remains intact. | ||
side and allow newlines. A newline immediately following the opening delimiter | ||
will be trimmed. All other whitespace and newline characters remain intact. | ||
|
||
I think this wording is a little ambiguous. From my reading, I could imagine that […] Can we simply enumerate all cases in which characters are trimmed? So:
If
👍
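To make the revised wording in the diff above concrete, here is a minimal sketch of the "trim a newline immediately following the opening delimiter" rule. It assumes the PR's definition of newline (LF or CRLF) and operates on the raw text between the `"""` delimiters. The function name `trim_leading_newline` is hypothetical and not part of any TOML library.

```python
def trim_leading_newline(body: str) -> str:
    """Drop a single newline (CRLF or LF) that directly follows the opening delimiter.

    Illustrative helper only: `body` is assumed to be the raw text between
    the opening and closing delimiters of a multi-line string.
    """
    if body.startswith("\r\n"):
        return body[2:]
    if body.startswith("\n"):
        return body[1:]
    return body


# Only the first newline is trimmed; every later newline stays intact.
poem = trim_leading_newline("\nRoses are red\nViolets are blue")
```

Under this reading, a second newline directly after the first one is content and is preserved.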
||
```toml | ||
# The following strings are byte-for-byte equivalent: | ||
key1 = "One\nTwo" | ||
key2 = """One\nTwo""" | ||
key3 = """ | ||
One | ||
Two""" | ||
key1 = """ | ||
Roses are red | ||
Violets are blue""" | ||
``` | ||
|
||
TOML parsers should feel free to normalize newline to whatever makes sense for | ||
their platform. | ||
|
||
```toml | ||
# On a Unix system, the above multi-line string will most likely be the same as: | ||
key2 = "Roses are red\nViolets are blue" | ||
|
||
# On a Windows system, it will most likely be equivalent to: | ||
key3 = "Roses are red\r\nViolets are blue" | ||
``` | ||
I think […] The key is that we permit either […] To @ChristianSi's point, I'm not sure that we need to specify universal line handling in the spec. This lets the parser choose how to handle […]
The latter sentence would address my concerns, yes. 👍
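One possible reading of "normalize newline to whatever makes sense for their platform" is sketched below. This is only an illustration of a choice the diff leaves to the parser; `normalize_newlines` is a hypothetical helper, and the default target of `"\n"` is an assumption.

```python
import re


def normalize_newlines(text: str, newline: str = "\n") -> str:
    """Rewrite every CRLF or LF in `text` to `newline`.

    CRLF is listed first in the pattern so it is replaced as one unit.
    A lone CR is not a newline under the PR's definition and is left alone.
    """
    # A function is used as the replacement so `newline` is inserted verbatim.
    return re.sub(r"\r\n|\n", lambda _match: newline, text)


unix_style = normalize_newlines("Roses are red\r\nViolets are blue")
windows_style = normalize_newlines("Roses are red\nViolets are blue", "\r\n")
```

A parser could pass `os.linesep` as the target, or skip normalization entirely; the diff permits either.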
||
|
||
For writing long strings without introducing extraneous whitespace, end a line | ||
with a `\`. The `\` will be trimmed along with all whitespace (including | ||
newlines) up to the next non-whitespace character or closing delimiter. If the | ||
first two characters after the opening delimiter are a backslash and a newline | ||
(`0x5C0A`), then they will both be trimmed along with all whitespace (including | ||
newlines) up to the next non-whitespace character or closing delimiter. All of | ||
the escape sequences that are valid for basic strings are also valid for | ||
multi-line basic strings. | ||
first characters after the opening delimiter are a backslash and a newline, then | ||
they will both be trimmed along with all whitespace and newlines up to the next | ||
non-whitespace character or closing delimiter. All of the escape sequences that | ||
are valid for basic strings are also valid for multi-line basic strings. | ||
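The line-ending backslash rule described above can be sketched as a small scanner over the raw string body. This is an illustrative sketch, not a full escape processor: it only removes a backslash that is directly followed by a newline, plus the whitespace and newlines after it, and copies every other escape sequence through unchanged. The name `join_continued_lines` is hypothetical.

```python
def join_continued_lines(body: str) -> str:
    """Apply the line-ending backslash rule to a raw multi-line basic string body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            if body[i + 1] == "\n" or body.startswith("\r\n", i + 1):
                # Line-ending backslash: drop it, then skip whitespace
                # (tab, space) and newlines (LF, CRLF) that follow.
                i += 1
                while i < len(body):
                    if body[i] in " \t\n":
                        i += 1
                    elif body.startswith("\r\n", i):
                        i += 2
                    else:
                        break
                continue
            # Any other escape (for example an escaped backslash) is
            # copied as-is so its second character is not re-examined.
            out.append(body[i:i + 2])
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)


joined = join_continued_lines(
    "The quick brown \\\n\n  fox jumps over \\\n    the lazy dog."
)
```

Copying other escapes in pairs matters: an escaped backslash at the end of a line must not be mistaken for a line-ending backslash.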
|
||
```toml | ||
# The following strings are byte-for-byte equivalent: | ||
|
@@ -177,9 +185,9 @@ Since there is no escaping, there is no way to write a single quote inside a | |
literal string enclosed by single quotes. Luckily, TOML supports a multi-line | ||
version of literal strings that solves this problem. **Multi-line literal | ||
strings** are surrounded by three single quotes on each side and allow newlines. | ||
Like literal strings, there is no escaping whatsoever. If the first character | ||
after the opening delimiter is a newline (`0x0A`), then it is trimmed. All other | ||
content between the delimiters is interpreted as-is without modification. | ||
Like literal strings, there is no escaping whatsoever. A newline immediately | ||
following the opening delimiter will be trimmed. All other content between the | ||
delimiters is interpreted as-is without modification. | ||
|
||
||
```toml | ||
regex2 = '''I [dw]on't need \d{2} apples''' | ||
|
@@ -306,22 +314,21 @@ apart from arrays because arrays are only ever values. | |
``` | ||
|
||
Under that, and until the next table or EOF are the key/values of that table. | ||
Keys are on the left of the equals sign and values are on the right. Keys start | ||
with the first character that isn't whitespace or `[` and end with the last | ||
non-whitespace character before the equals sign. Keys cannot contain a `#` | ||
character. Key/value pairs within tables are not guaranteed to be in any | ||
specific order. | ||
Keys are on the left of the equals sign and values are on the right. Whitespace | ||
is ignored around key names and values. | ||
|
||
Key names may only consist of non-whitespace, non-newline characters excluding | ||
`=`, `#`, `.`, `[`, and `]`. | ||
Why did you delete these two lines?

Indenting is covered by the previous statement that "Whitespace is ignored around key names and values.", so I thought it to be redundant. Also, the ability to indent is a weird way to segue into nested tables, and makes it sound as if indentation might carry some semantic value.
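The proposed key-name rule (non-whitespace, non-newline characters excluding `=`, `#`, `.`, `[`, and `]`) could be checked along these lines. This is a sketch under stated assumptions: `is_valid_key_name` is a hypothetical helper, and rejecting the empty string is my assumption, since the diff does not address empty keys.

```python
# Characters the proposed wording excludes from key names.
FORBIDDEN_PUNCTUATION = set("=#.[]")
# Whitespace is tab or space; checking for LF also rejects any CRLF pair.
WHITESPACE_OR_NEWLINE = set(" \t\n")


def is_valid_key_name(name: str) -> bool:
    """Return True if `name` satisfies the key-name rule as worded in the diff."""
    if not name:
        return False  # assumption: an empty key name is rejected
    return not any(
        ch in FORBIDDEN_PUNCTUATION or ch in WHITESPACE_OR_NEWLINE
        for ch in name
    )


ok = is_valid_key_name("key")
bad = is_valid_key_name("my key")
```

The check runs on the key after surrounding whitespace has been ignored, as the diff specifies for key names and values.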
||
|
||
Key/value pairs within tables are not guaranteed to be in any specific order. | ||
|
||
```toml | ||
[table] | ||
key = "value" | ||
``` | ||
|
||
You can indent keys and their values as much as you like. Tabs or spaces. Knock | ||
yourself out. Why, you ask? Because you can have nested tables. Snap. | ||
|
||
Nested tables are denoted by table names with dots in them. Name your tables | ||
whatever crap you please, just don't use `#`, `.`, `[` or `]`. | ||
Dots are prohibited in key names because dots are used to signify nested tables! | ||
Naming rules for each dot separated part are the same as for keys (see above). | ||
|
||
👍
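Since each dot-separated part of a table name follows the same naming rules as keys, a nested table name can be split and validated part by part. The sketch below assumes the name has already been extracted from between `[` and `]`. The function `split_table_name` is hypothetical, and rejecting empty parts is an assumption.

```python
from typing import List

# Same exclusions as for key names: whitespace, newline, and = # . [ ]
FORBIDDEN_IN_PART = set("=#.[] \t\n")


def split_table_name(name: str) -> List[str]:
    """Split a dotted table name into its parts, validating each part."""
    parts = name.split(".")
    for part in parts:
        # Assumption: an empty part (as in "dog..tater") is an error.
        if not part or any(ch in FORBIDDEN_IN_PART for ch in part):
            raise ValueError(f"invalid table name part {part!r} in {name!r}")
    return parts


path = split_table_name("dog.tater")
```

A parser would then walk `path` to create or locate the nested table, one level per part.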
||
```toml | ||
[dog.tater] | ||
|
What if `Newline` was defined as either `\r\n` or `\n`? This leaves out a lone `\r` as qualifying as a new line, but I think this is OK, unless it's still commonly used somewhere?

👍 `\r` was once used on Mac, but Mac OS X changed that AFAIK.
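The definition discussed in this thread (a newline is `\r\n` or `\n`, and a lone `\r` is not) can be illustrated with a short line splitter. This is a sketch only; `split_lines` is a hypothetical name. It deliberately avoids Python's built-in `str.splitlines`, which also treats a lone `\r` as a line boundary.

```python
import re
from typing import List

# CRLF is listed first so it is consumed as a single newline.
NEWLINE = re.compile(r"\r\n|\n")


def split_lines(text: str) -> List[str]:
    """Split `text` on LF or CRLF only; a lone CR stays inside its line."""
    return NEWLINE.split(text)


lines = split_lines("a = 1\r\nb = 2\nc = 3")
lone_cr = split_lines("a = 1\rb = 2")
```

Under this definition, old Mac-style files that use only `\r` would be read as a single line.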