From f04fa248b28f14ff1c8be871af7d4ce50794c1ca Mon Sep 17 00:00:00 2001
From: Andrew Gallant
Date: Wed, 25 Jun 2014 19:48:15 -0400
Subject: [PATCH 1/5] Add multiline and raw strings to TOML.

---
 README.md | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 58fa9673..ec67a767 100644
--- a/README.md
+++ b/README.md
@@ -91,7 +91,37 @@ characters (U+0000 to U+001F).
 "I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."
 ```
 
-For convenience, some popular characters have a compact escape sequence.
+Multi-line strings are also supported with enclosing triple quotes. If the
+first character is a new line (`0xA`), then it is trimmed. All other whitespace
+remains intact.
+
+```toml
+# The following strings are byte-for-byte equivalent:
+key1 = "One\nTwo"
+key2 = """One\nTwo"""
+key3 = """
+One
+Two"""
+```
+
+For writing long strings without introducing extraneous whitespace, use `\\n`.
+The `\\n` will be trimmed along with all whitespace (including new lines) up
+to the next non-whitespace character:
+
+```toml
+# The following strings are byte-for-byte equivalent:
+key1 = "The quick brown fox jumps over the lazy dog."
+key2 = """
+The quick brown \
+
+
+  fox jumps over \
+    the lazy dog."""
+```
+
+For convenience, some popular characters have a compact escape sequence.
+Escape sequences can be used in `"single line strings"` and in `"""multi line
+strings"""`.
 
 ```
 \b         - backspace       (U+0008)
@@ -110,11 +140,31 @@ Any Unicode character may be escaped with the `\uXXXX` or `\UXXXXXXXX` forms.
 Note that the escape codes must be valid Unicode code points.
 
 Other special characters are reserved and, if used, TOML should produce an
-error. This means paths on Windows will always have to use double backslashes.
+error.
+
+TOML also supports raw strings where there is no escaping allowed at all. Raw
+strings are enclosed with single quotes. Like regular strings, they must appear
+on a single line:
+
+```toml
+# What you see is what you get.
+winpath = 'C:\Users\nodejs\templates'
+winpath2 = '\\ServerX\qux\'
+quoted = 'Tom "Dubs" Preston-Werner'
+regex = '\d+'
+```
+
+Since there is no escaping, there is no way to write a single quote inside a
+raw string enclosed by single quotes. So use a multi-line raw string instead:
 
 ```toml
-wrong = "C:\Users\nodejs\templates" # note: doesn't produce a valid path
-right = "C:\\Users\\nodejs\\templates"
+regex2 = '''I [dw]on't need \d{2} apples'''
+lines = '''
+The first new line is
+trimmed in raw strings.
+   All other whitespace
+   is preserved.
+'''
 ```
 
 For binary data it is recommended that you use Base64 or another suitable
From f17c54c3006644601bb3640b897308468ef9a556 Mon Sep 17 00:00:00 2001
From: Tom Preston-Werner
Date: Mon, 30 Jun 2014 12:16:37 -0500
Subject: [PATCH 2/5] Refine spec for the four string variants.

---
 README.md | 116 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 69 insertions(+), 47 deletions(-)

diff --git a/README.md b/README.md
index ec67a767..26abf313 100644
--- a/README.md
+++ b/README.md
@@ -79,21 +79,47 @@ key = "value" # Yeah, you can do this.
 String
 ------
 
-ProTip™: You may notice that this specification is the same as JSON's string
-definition, except that TOML requires UTF-8 encoding. This is on purpose.
+There are four ways to express strings: basic, multi-line basic, literal, and
+multi-line literal. All strings must contain only valid UTF-8 characters.
 
-Strings are single-line values surrounded by quotation marks. Strings must
-contain only valid UTF-8 characters. Any Unicode character may be used except
-those that must be escaped: quotation mark, backslash, and the control
-characters (U+0000 to U+001F).
+**Basic strings** are surrounded by quotation marks. Any Unicode character may
+**be used except those that must be escaped: quotation mark, backslash, and the
+**control characters (U+0000 to U+001F).
 
 ```toml
 "I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."
 ```
 
-Multi-line strings are also supported with enclosing triple quotes. If the
-first character is a new line (`0xA`), then it is trimmed. All other whitespace
-remains intact.
+For convenience, some popular characters have a compact escape sequence.
+
+```
+\b         - backspace       (U+0008)
+\t         - tab             (U+0009)
+\n         - linefeed        (U+000A)
+\f         - form feed       (U+000C)
+\r         - carriage return (U+000D)
+\"         - quote           (U+0022)
+\/         - slash           (U+002F)
+\\         - backslash       (U+005C)
+\uXXXX     - unicode         (U+XXXX)
+\UXXXXXXXX - unicode         (U+XXXXXXXX)
+```
+
+Any Unicode character may be escaped with the `\uXXXX` or `\UXXXXXXXX` forms.
+Note that the escape codes must be valid Unicode code points.
+
+Other special characters are reserved and, if used, TOML should produce an
+error.
+
+ProTip™: You may notice that the above string specification is the same as
+JSON's string definition, except that TOML requires UTF-8 encoding. This is on
+purpose.
+
+Sometimes you need to express passages of text (e.g. translation files) or would
+like to break up a very long string into multiple lines. TOML makes this easy.
+**Multi-line basic strings** are surrounded by three quotation marks on each
+side and allow newlines. If the first character after the opening delimiter is a
+newline (`0x0A`), then it is trimmed. All other whitespace remains intact.
 
 ```toml
 # The following strings are byte-for-byte equivalent:
@@ -104,71 +130,67 @@ One
 Two"""
 ```
 
-For writing long strings without introducing extraneous whitespace, use `\\n`.
-The `\\n` will be trimmed along with all whitespace (including new lines) up
-to the next non-whitespace character:
+For writing long strings without introducing extraneous whitespace, end a line
+with a `\`. The `\` will be trimmed along with all whitespace (including
+newlines) up to the next non-whitespace character or closing delimiter. If the
+first two characters after the opening delimiter are a backslash and a newline
+(`0x5C0A`), then they will both be trimmed along with all whitespace (including
+newlines) up to the next non-whitespace character or closing delimiter. All of
+the escape sequences that are valid for basic strings are also valid for
+multi-line basic strings.
 
 ```toml
 # The following strings are byte-for-byte equivalent:
 key1 = "The quick brown fox jumps over the lazy dog."
+
 key2 = """
 The quick brown \
 
 
   fox jumps over \
     the lazy dog."""
-```
-
-For convenience, some popular characters have a compact escape sequence.
-Escape sequences can be used in `"single line strings"` and in `"""multi line
-strings"""`.
 
+key3 = """\
+       The quick brown \
+       fox jumps over \
+       the lazy dog.\
+       """
 ```
-\b         - backspace       (U+0008)
-\t         - tab             (U+0009)
-\n         - linefeed        (U+000A)
-\f         - form feed       (U+000C)
-\r         - carriage return (U+000D)
-\"         - quote           (U+0022)
-\/         - slash           (U+002F)
-\\         - backslash       (U+005C)
-\uXXXX     - unicode         (U+XXXX)
-\UXXXXXXXX - unicode         (U+XXXXXXXX)
-```
-
-Any Unicode character may be escaped with the `\uXXXX` or `\UXXXXXXXX` forms.
-Note that the escape codes must be valid Unicode code points.
-
-Other special characters are reserved and, if used, TOML should produce an
-error.
 
-TOML also supports raw strings where there is no escaping allowed at all. Raw
-strings are enclosed with single quotes. Like regular strings, they must appear
-on a single line:
+If you're a frequent specifier of Windows paths or regular expressions, then
+having to escape backslashes quickly becomes tedious and error prone. To help,
+TOML supports literal strings where there is no escaping allowed at all.
+**Literal strings** are surrounded by single quotes. Like basic strings, they
+must appear on a single line:
 
 ```toml
 # What you see is what you get.
-winpath = 'C:\Users\nodejs\templates'
-winpath2 = '\\ServerX\qux\'
-quoted = 'Tom "Dubs" Preston-Werner'
-regex = '\d+'
+winpath  = 'C:\Users\nodejs\templates'
+winpath2 = '\\ServerX\admin$\system32\'
+quoted   = 'Tom "Dubs" Preston-Werner'
+regex    = '<\i\c*\s*>'
 ```
 
-Since there is no escaping, there is no way to write a single quote inside a
-raw string enclosed by single quotes. So use a multi-line raw string instead:
+Since there is no escaping, there is no way to write a single quote inside a
+literal string enclosed by single quotes. Luckily, TOML supports a mult-line
+version of literal strings that solves this problem. **Multi-line literal
+strings** are surrounded by three single quotes on each side and allow newlines.
+Like literal strings, there is no escaping whatsoever. If the first character
+after the opening delimiter is a newline (`0x0A`), then it is trimmed. All other
+content between the delimiters is interpreted as-is without modification.
 
 ```toml
 regex2 = '''I [dw]on't need \d{2} apples'''
-lines = '''
-The first new line is
+lines  = '''
+The first newline is
 trimmed in raw strings.
    All other whitespace
    is preserved.
 '''
 ```
 
-For binary data it is recommended that you use Base64 or another suitable
-encoding. The handling of that encoding will be application specific.
+For binary data it is recommended that you use Base64 or another suitable ASCII
+or UTF-8 encoding. The handling of that encoding will be application specific.
 
 Integer
 -------
From 2cd7cd2f389aa7e978240143799f04d8ca1c5934 Mon Sep 17 00:00:00 2001
From: Tom Preston-Werner
Date: Thu, 10 Jul 2014 23:03:49 -0700
Subject: [PATCH 3/5] Fix typos.

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 26abf313..7876cf6e 100644
--- a/README.md
+++ b/README.md
@@ -83,8 +83,8 @@ There are four ways to express strings: basic, multi-line basic, literal, and
 multi-line literal. All strings must contain only valid UTF-8 characters.
 
 **Basic strings** are surrounded by quotation marks. Any Unicode character may
-**be used except those that must be escaped: quotation mark, backslash, and the
-**control characters (U+0000 to U+001F).
+be used except those that must be escaped: quotation mark, backslash, and the
+control characters (U+0000 to U+001F).
 
 ```toml
 "I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."
@@ -172,7 +172,7 @@ regex    = '<\i\c*\s*>'
 ```
 
 Since there is no escaping, there is no way to write a single quote inside a
-literal string enclosed by single quotes. Luckily, TOML supports a mult-line
+literal string enclosed by single quotes. Luckily, TOML supports a multi-line
 version of literal strings that solves this problem. **Multi-line literal
 strings** are surrounded by three single quotes on each side and allow newlines.
 Like literal strings, there is no escaping whatsoever. If the first character
From 7f33170a21aca027dfbf6b6ae8e06a5d939625a0 Mon Sep 17 00:00:00 2001
From: Tom Preston-Werner
Date: Thu, 10 Jul 2014 23:20:43 -0700
Subject: [PATCH 4/5] Clarify escaping of quotation marks in multi-line strings.

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 7876cf6e..27b55702 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,10 @@ key3 = """\
        """
 ```
 
+Any Unicode character may be used except those that must be escaped: backslash,
+and the control characters (U+0000 to U+001F). Quotation marks need not be
+escaped unless their presence would create a premature closing delimiter.
+
 If you're a frequent specifier of Windows paths or regular expressions, then
 having to escape backslashes quickly becomes tedious and error prone. To help,
 TOML supports literal strings where there is no escaping allowed at all.
From a7b35c346935995ffae501f22f5ae65f3b069e88 Mon Sep 17 00:00:00 2001
From: Tom Preston-Werner
Date: Fri, 11 Jul 2014 10:41:21 -0700
Subject: [PATCH 5/5] Remove extraneous comma.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 27b55702..0a20c5d6 100644
--- a/README.md
+++ b/README.md
@@ -157,7 +157,7 @@ key3 = """\
        """
 ```
 
-Any Unicode character may be used except those that must be escaped: backslash,
+Any Unicode character may be used except those that must be escaped: backslash
 and the control characters (U+0000 to U+001F). Quotation marks need not be
 escaped unless their presence would create a premature closing delimiter.