From c8ffd61c5d9cc018945e4574612fe870c328860b Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Mon, 15 Jul 2019 17:07:38 +0200 Subject: [PATCH 01/15] [RFC] 0006 Properties File Encoding --- rfcs/0006-properties-file-encoding.md | 81 +++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 rfcs/0006-properties-file-encoding.md diff --git a/rfcs/0006-properties-file-encoding.md b/rfcs/0006-properties-file-encoding.md new file mode 100644 index 0000000000..02b8a12e3d --- /dev/null +++ b/rfcs/0006-properties-file-encoding.md @@ -0,0 +1,81 @@ +- Start Date: 2019-07-15 +- RFC PR: - +- Issue: [#161](https://github.com/SAP/ui5-tooling/issues/161) +- Affected components + + [x] [ui5-builder](https://github.com/SAP/ui5-builder) + + [x] [ui5-server](https://github.com/SAP/ui5-server) + + [ ] [ui5-cli](https://github.com/SAP/ui5-cli) + + [ ] [ui5-fs](https://github.com/SAP/ui5-fs) + + [ ] [ui5-project](https://github.com/SAP/ui5-project) + + [ ] [ui5-logger](https://github.com/SAP/ui5-logger) + + +# RFC 0006 Properties File Encoding +## Summary +Properties files (`*.properties`) should be encoded in UTF-8 such that the client properly can consume it. + +## Motivation +Currently the properties files are encoded in ISO-8859-1 but served as UTF-8 which leads to the problem that special characters are not displayed correctly. +The user wants to be able to use properties files with ISO-8859-1 encoding and also have the option to specify the encoding. + +## Detailed design +### Configuration +A configuration for the task/server should be provided such that the source encoding can be specified. +There should be 2 options: +* `ISO-8859-1` (non ASCII characters will be escaped with the unicode sequence `\uXXXX`) +* `UTF-8` (no character escaping will take place) + +The default is `ISO-8859-1` for compatibility reasons. 
+ +Umlaut Example: + +| Umlaut | Unicode | +|----------|--------------------| +| `Ä`, `ä` | `\u00c4`, `\u00e4` | +| `Ö`, `ö` | `\u00d6`, `\u00f6` | +| `Ü`, `ü` | `\u00dc`, `\u00fc` | +| `ß` | `\u00df ` | + +This should be configured within the .ui5.yml file. + +Example: +```yaml +specVersion: "1.0" +type: application +metadata: + name: my.application +resources: + propertiesFiles: + - encoding: "UTF-8" +``` + +The resources section within the configuration can be consumed by `ui5-builder` and `ui5-server`. + +### Build Task +The `ui5-builder` should offer a task called `encodePropertiesFiles` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. +It should use a processor called `stringEscaper` which escapes special characters in files and is used within the task to operate only on `*.properties` files. +The processor offers a function called `escapeNonAsciiAsUnicode` which performs the unicode escaping. +The task should be run first (before `replaceCopyright`) for all types. +This ensures that the properties files can always be consumed. +Each build output contains then the escaped properties files. +This task should automatically be integrated into the build process and can be reused by the server. + +### Server part +The `ui5-server` offers the capability to re-use the ui5-builder processor `stringEscaper` used by the build task to escape these characters. +The served content can always be interpreted as UTF-8 and therefore be properly consumed. +There should be a special treatment in [serveResources](https://github.com/SAP/ui5-server/blob/master/lib/middleware/serveResources.js#L42) middleware which uses the configuration and the processor to serve properties files such that they can be interpreted as UTF-8 encoded files. 
+ + +## How we teach this +- Documentation about how to specify the properties files encoding option +- Explanation of the Escaping with examples + +## Drawbacks + + +## Alternatives + - Check the file contents and guess its encoding + - Use response content-type header attribute + - Only escape if required + +## Unresolved Questions and Bikeshedding From 1a1f1eb6f66bf3c619d14c94cd3d2f2aed0c21a2 Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Mon, 15 Jul 2019 17:13:56 +0200 Subject: [PATCH 02/15] [RFC] 0007 Properties File Encoding change to 0007 because 0006 is already present --- ...erties-file-encoding.md => 0007-properties-file-encoding.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename rfcs/{0006-properties-file-encoding.md => 0007-properties-file-encoding.md} (98%) diff --git a/rfcs/0006-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md similarity index 98% rename from rfcs/0006-properties-file-encoding.md rename to rfcs/0007-properties-file-encoding.md index 02b8a12e3d..17c8a8f4d4 100644 --- a/rfcs/0006-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -10,7 +10,7 @@ + [ ] [ui5-logger](https://github.com/SAP/ui5-logger) -# RFC 0006 Properties File Encoding +# RFC 0007 Properties File Encoding ## Summary Properties files (`*.properties`) should be encoded in UTF-8 such that the client properly can consume it. 
From f06db59af1b8992c6cf552ca3ef470cafbf551dd Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Tue, 16 Jul 2019 08:08:06 +0200 Subject: [PATCH 03/15] Update 0007-properties-file-encoding.md --- rfcs/0007-properties-file-encoding.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 17c8a8f4d4..96c66bfd08 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -1,5 +1,5 @@ - Start Date: 2019-07-15 -- RFC PR: - +- RFC PR: [#168](https://github.com/SAP/ui5-tooling/pull/168) - Issue: [#161](https://github.com/SAP/ui5-tooling/issues/161) - Affected components + [x] [ui5-builder](https://github.com/SAP/ui5-builder) From cb3e94490fe38a2287a72c887690c091b5b993f0 Mon Sep 17 00:00:00 2001 From: Thorsten Hochreuter Date: Tue, 16 Jul 2019 13:05:37 +0200 Subject: [PATCH 04/15] Rework RFC content - Include reviews - Fix MD warnings --- rfcs/0007-properties-file-encoding.md | 70 ++++++++++++++++----------- 1 file changed, 42 insertions(+), 28 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 96c66bfd08..35561df016 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -1,44 +1,52 @@ +# RFC 0007 Properties File Encoding + - Start Date: 2019-07-15 - RFC PR: [#168](https://github.com/SAP/ui5-tooling/pull/168) - Issue: [#161](https://github.com/SAP/ui5-tooling/issues/161) - Affected components - + [x] [ui5-builder](https://github.com/SAP/ui5-builder) - + [x] [ui5-server](https://github.com/SAP/ui5-server) - + [ ] [ui5-cli](https://github.com/SAP/ui5-cli) - + [ ] [ui5-fs](https://github.com/SAP/ui5-fs) - + [ ] [ui5-project](https://github.com/SAP/ui5-project) - + [ ] [ui5-logger](https://github.com/SAP/ui5-logger) - + - [x] [ui5-builder](https://github.com/SAP/ui5-builder) + - [x] [ui5-server](https://github.com/SAP/ui5-server) + - [ ] 
[ui5-cli](https://github.com/SAP/ui5-cli) + - [ ] [ui5-fs](https://github.com/SAP/ui5-fs) + - [ ] [ui5-project](https://github.com/SAP/ui5-project) + - [ ] [ui5-logger](https://github.com/SAP/ui5-logger) -# RFC 0007 Properties File Encoding ## Summary -Properties files (`*.properties`) should be encoded in UTF-8 such that the client properly can consume it. + +Properties files (`*.properties`) should be encoded in pure ASCII when serving them via the `ui5-server` or building an application/library with the `ui5-builder`. ## Motivation -Currently the properties files are encoded in ISO-8859-1 but served as UTF-8 which leads to the problem that special characters are not displayed correctly. -The user wants to be able to use properties files with ISO-8859-1 encoding and also have the option to specify the encoding. + +Currently the properties files are mostly encoded in ISO-8859-1 (which is used by most existing SAP server platforms). +By default the files are served as UTF-8 by the `ui5-server`. This will lead to the problem that special characters are not displayed correctly. +The user wants to be able to use properties files with ISO-8859-1 encoding. Additionally the user wants have the option to specify the encoding. ## Detailed design + ### Configuration -A configuration for the task/server should be provided such that the source encoding can be specified. + +A configuration for the `ui5-builder` tasks and the `ui5-server` should be provided such that the source encoding can be specified. + There should be 2 options: -* `ISO-8859-1` (non ASCII characters will be escaped with the unicode sequence `\uXXXX`) -* `UTF-8` (no character escaping will take place) + +- `ISO-8859-1` (non ASCII characters will be escaped with the unicode sequence `\uXXXX`) +- `UTF-8` (no character escaping will take place) The default is `ISO-8859-1` for compatibility reasons. 
Umlaut Example: -| Umlaut | Unicode | +| Umlaut | UTF-8 | |----------|--------------------| | `Ä`, `ä` | `\u00c4`, `\u00e4` | | `Ö`, `ö` | `\u00d6`, `\u00f6` | | `Ü`, `ü` | `\u00dc`, `\u00fc` | -| `ß` | `\u00df ` | +| `ß` | `\u00df` | -This should be configured within the .ui5.yml file. +This should be configured within the `ui5.yaml` file. Example: + ```yaml specVersion: "1.0" type: application @@ -52,30 +60,36 @@ resources: The resources section within the configuration can be consumed by `ui5-builder` and `ui5-server`. ### Build Task -The `ui5-builder` should offer a task called `encodePropertiesFiles` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. + +The `ui5-builder` should offer a new standard task called `escapePropertiesFiles` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. It should use a processor called `stringEscaper` which escapes special characters in files and is used within the task to operate only on `*.properties` files. -The processor offers a function called `escapeNonAsciiAsUnicode` which performs the unicode escaping. The task should be run first (before `replaceCopyright`) for all types. This ensures that the properties files can always be consumed. -Each build output contains then the escaped properties files. -This task should automatically be integrated into the build process and can be reused by the server. +Each build output should contain then the escaped properties files. +This ensures taht the now purely ASCII based `*.properties` files can be used on other platforms too. + +The new standard task should automatically be integrated into the build process and can be reused by the server. ### Server part -The `ui5-server` offers the capability to re-use the ui5-builder processor `stringEscaper` used by the build task to escape these characters. -The served content can always be interpreted as UTF-8 and therefore be properly consumed. 
-There should be a special treatment in [serveResources](https://github.com/SAP/ui5-server/blob/master/lib/middleware/serveResources.js#L42) middleware which uses the configuration and the processor to serve properties files such that they can be interpreted as UTF-8 encoded files. +Currently the `ui5-server` serves all resources as UTF-8. + +The `ui5-server` should offer the capability to re-use the `ui5-builder` processor `stringEscaper` to also escape characters while serving `*.properties`. +Due to the pure ASCII representation now, the served content can always be interpreted and properly consumed as UTF-8. + +Technically, there should be a special treatment in [serveResources](https://github.com/SAP/ui5-server/blob/master/lib/middleware/serveResources.js#L42) middleware which uses the configuration and the processor to serve properties files such that they can be interpreted as UTF-8 encoded files. ## How we teach this + - Documentation about how to specify the properties files encoding option - Explanation of the Escaping with examples ## Drawbacks - ## Alternatives - - Check the file contents and guess its encoding - - Use response content-type header attribute - - Only escape if required + +- Check the file contents and guess its encoding + - Use response content-type header attribute + - Only escape if the file encoding and the transport encoding do not match, e.g. 
file is encoded as `ISO-8859-1` but will be served as `UTF-8` ## Unresolved Questions and Bikeshedding From 74fc4b07b8cc964c5dfbc62ca93c13509715ca6f Mon Sep 17 00:00:00 2001 From: Thorsten Hochreuter Date: Tue, 16 Jul 2019 14:25:15 +0200 Subject: [PATCH 05/15] Change task-name and naming of config parameter --- rfcs/0007-properties-file-encoding.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 35561df016..92cd468f06 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -53,15 +53,14 @@ type: application metadata: name: my.application resources: - propertiesFiles: - - encoding: "UTF-8" + propertiesFileEncoding: "UTF-8" ``` The resources section within the configuration can be consumed by `ui5-builder` and `ui5-server`. ### Build Task -The `ui5-builder` should offer a new standard task called `escapePropertiesFiles` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. +The `ui5-builder` should offer a new standard task called `escapeNonAsciiCharacters` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. It should use a processor called `stringEscaper` which escapes special characters in files and is used within the task to operate only on `*.properties` files. The task should be run first (before `replaceCopyright`) for all types. This ensures that the properties files can always be consumed. From 0fa902fcae38edffd6299ecdaaf143191690f769 Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Mon, 22 Jul 2019 12:24:17 +0200 Subject: [PATCH 06/15] [RFC] 0007 Properties File Encoding Adjust content because files' content should always be escaped. 
--- rfcs/0007-properties-file-encoding.md | 47 ++++++++++++++------------- 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 92cd468f06..db0ff9b79d 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -25,23 +25,14 @@ The user wants to be able to use properties files with ISO-8859-1 encoding. Addi ### Configuration -A configuration for the `ui5-builder` tasks and the `ui5-server` should be provided such that the source encoding can be specified. +A configuration for the `ui5-builder` tasks and the `ui5-server` should be provided such that the source encoding of `*.properties` files can be specified. -There should be 2 options: +Encodings should be supported according to the [Encoding spec](https://encoding.spec.whatwg.org/). +For more information check out: [Buffers and Character Encodings](https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings) -- `ISO-8859-1` (non ASCII characters will be escaped with the unicode sequence `\uXXXX`) -- `UTF-8` (no character escaping will take place) +Sample values are: 'ascii', 'utf8', 'utf16le', 'latin1' ('ISO-8859-1'), ... -The default is `ISO-8859-1` for compatibility reasons. - -Umlaut Example: - -| Umlaut | UTF-8 | -|----------|--------------------| -| `Ä`, `ä` | `\u00c4`, `\u00e4` | -| `Ö`, `ö` | `\u00d6`, `\u00f6` | -| `Ü`, `ü` | `\u00dc`, `\u00fc` | -| `ß` | `\u00df` | +The default is `ISO-8859-1` for compatibility reasons (set by the formatters). This should be configured within the `ui5.yaml` file. @@ -61,22 +52,34 @@ The resources section within the configuration can be consumed by `ui5-builder` ### Build Task The `ui5-builder` should offer a new standard task called `escapeNonAsciiCharacters` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. 
-It should use a processor called `stringEscaper` which escapes special characters in files and is used within the task to operate only on `*.properties` files. -The task should be run first (before `replaceCopyright`) for all types. +It should use a processor called `stringEscaper` which escapes non ascii characters (characters which are not within the 128 character ASCII range) within a given string. + + +Umlaut Example: + +| Umlaut | UTF-8 | +|----------|--------------------| +| `Ä`, `ä` | `\u00c4`, `\u00e4` | +| `Ö`, `ö` | `\u00d6`, `\u00f6` | +| `Ü`, `ü` | `\u00dc`, `\u00fc` | +| `ß` | `\u00df` | + +The task operates on `*.properties` files using the processor. +The task should run first (before `replaceCopyright`) for all types. This ensures that the properties files can always be consumed. -Each build output should contain then the escaped properties files. -This ensures taht the now purely ASCII based `*.properties` files can be used on other platforms too. +Each build output should contain the escaped properties files. +This ensures that all `properties` contain only ASCII characters such that they can be consumed by other platforms. -The new standard task should automatically be integrated into the build process and can be reused by the server. +The new standard task should automatically be integrated into the build process and can be reused by the `ui5-server`. ### Server part -Currently the `ui5-server` serves all resources as UTF-8. +The `ui5-server` should serve all resources using `UTF-8` [charset header](https://www.w3.org/International/articles/http-charset/index.en). -The `ui5-server` should offer the capability to re-use the `ui5-builder` processor `stringEscaper` to also escape characters while serving `*.properties`. +The `ui5-server` should offer the capability to re-use the `ui5-builder` processor `stringEscaper` to escape characters in `*.properties` files. 
Due to the pure ASCII representation now, the served content can always be interpreted and properly consumed as UTF-8. -Technically, there should be a special treatment in [serveResources](https://github.com/SAP/ui5-server/blob/master/lib/middleware/serveResources.js#L42) middleware which uses the configuration and the processor to serve properties files such that they can be interpreted as UTF-8 encoded files. +Technically, there should be a special treatment in [serveResources](https://github.com/SAP/ui5-server/blob/master/lib/middleware/serveResources.js#L42) middleware which uses the configuration and the processor to serve properties files such that they can be interpreted as `UTF-8` encoded content. ## How we teach this From 9d08e04f77b6a6a581d9563e44ac9479d7d13835 Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Mon, 22 Jul 2019 15:59:25 +0200 Subject: [PATCH 07/15] [RFC] 0007 Properties File Encoding Adjust to latest discussion regarding encoding parameter --- rfcs/0007-properties-file-encoding.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index db0ff9b79d..071d795e0f 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -30,7 +30,7 @@ A configuration for the `ui5-builder` tasks and the `ui5-server` should be provi Encodings should be supported according to the [Encoding spec](https://encoding.spec.whatwg.org/). For more information check out: [Buffers and Character Encodings](https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings) -Sample values are: 'ascii', 'utf8', 'utf16le', 'latin1' ('ISO-8859-1'), ... +Supported values are: `UTF-8` and `ISO-8859-1` The default is `ISO-8859-1` for compatibility reasons (set by the formatters). 
@@ -52,7 +52,8 @@ The resources section within the configuration can be consumed by `ui5-builder` ### Build Task The `ui5-builder` should offer a new standard task called `escapeNonAsciiCharacters` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. -It should use a processor called `stringEscaper` which escapes non ascii characters (characters which are not within the 128 character ASCII range) within a given string. +It should use a processor called `nonAsciiEscaper` which escapes non ascii characters (characters which are not within the 128 character ASCII range) within a given string. +The processor `nonAsciiEscaper` should offer an encoding parameter and a method which provides valid values for this option (`nonAsciiEscaper#getEncodingFromNiceName`). Umlaut Example: @@ -76,7 +77,7 @@ The new standard task should automatically be integrated into the build process The `ui5-server` should serve all resources using `UTF-8` [charset header](https://www.w3.org/International/articles/http-charset/index.en). -The `ui5-server` should offer the capability to re-use the `ui5-builder` processor `stringEscaper` to escape characters in `*.properties` files. +The `ui5-server` should offer the capability to re-use the `ui5-builder` processor `nonAsciiEscaper` to escape characters in `*.properties` files. Due to the pure ASCII representation now, the served content can always be interpreted and properly consumed as UTF-8. Technically, there should be a special treatment in [serveResources](https://github.com/SAP/ui5-server/blob/master/lib/middleware/serveResources.js#L42) middleware which uses the configuration and the processor to serve properties files such that they can be interpreted as `UTF-8` encoded content. 
From 8b31e8c2ca56a247463bb5c1ad967eb1f474bfbd Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Tue, 23 Jul 2019 13:49:05 +0200 Subject: [PATCH 08/15] [RFC] 0007 Properties File Encoding Adjust to latest changes in ui5-builder --- rfcs/0007-properties-file-encoding.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 071d795e0f..4a131facd3 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -53,7 +53,7 @@ The resources section within the configuration can be consumed by `ui5-builder` The `ui5-builder` should offer a new standard task called `escapeNonAsciiCharacters` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. It should use a processor called `nonAsciiEscaper` which escapes non ascii characters (characters which are not within the 128 character ASCII range) within a given string. -The processor `nonAsciiEscaper` should offer an encoding parameter and a method which provides valid values for this option (`nonAsciiEscaper#getEncodingFromNiceName`). +The processor `nonAsciiEscaper` should offer an encoding parameter and a method which provides valid values for this option (`nonAsciiEscaper#getEncodingFromAlias`). 
Umlaut Example: From 0c0f21c98c331a8a3df9ac1a3d9f1e0289fb69e2 Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Wed, 24 Jul 2019 11:13:57 +0200 Subject: [PATCH 09/15] put propertiesFileEncoding underneath configuration in ui5.yaml --- rfcs/0007-properties-file-encoding.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 4a131facd3..ce79ba66f3 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -44,7 +44,8 @@ type: application metadata: name: my.application resources: - propertiesFileEncoding: "UTF-8" + configuration: + propertiesFileEncoding: "UTF-8" ``` The resources section within the configuration can be consumed by `ui5-builder` and `ui5-server`. From 823961126d892c7b2047f7e3ac55f6f3d1e1a3d3 Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Fri, 26 Jul 2019 16:02:35 +0200 Subject: [PATCH 10/15] rename config variable to propertiesFileEncoding Improve motivation section by being more specific about the encoding options --- rfcs/0007-properties-file-encoding.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index ce79ba66f3..0a56748198 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -18,8 +18,8 @@ Properties files (`*.properties`) should be encoded in pure ASCII when serving t ## Motivation Currently the properties files are mostly encoded in ISO-8859-1 (which is used by most existing SAP server platforms). -By default the files are served as UTF-8 by the `ui5-server`. This will lead to the problem that special characters are not displayed correctly. -The user wants to be able to use properties files with ISO-8859-1 encoding. Additionally the user wants have the option to specify the encoding. +By default the files are served as UTF-8 by the `ui5-server`. 
This will lead to the problem that special characters are not displayed correctly because they are read using UTF-8 encoding. +The user wants to be able to use properties files with ISO-8859-1 encoding. Additionally the user wants have the option to specify UTF-8 encoding if the properties file contain special characters. ## Detailed design @@ -45,7 +45,7 @@ metadata: name: my.application resources: configuration: - propertiesFileEncoding: "UTF-8" + propertiesFileSourceEncoding: "UTF-8" ``` The resources section within the configuration can be consumed by `ui5-builder` and `ui5-server`. From 99ea2cc7ff8e7691e58beed363cdde4f909659b6 Mon Sep 17 00:00:00 2001 From: Tobias Sorn Date: Fri, 26 Jul 2019 16:03:48 +0200 Subject: [PATCH 11/15] Improve motivation section by being more specific about the encoding options --- rfcs/0007-properties-file-encoding.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 0a56748198..dddddf3233 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -19,7 +19,8 @@ Properties files (`*.properties`) should be encoded in pure ASCII when serving t Currently the properties files are mostly encoded in ISO-8859-1 (which is used by most existing SAP server platforms). By default the files are served as UTF-8 by the `ui5-server`. This will lead to the problem that special characters are not displayed correctly because they are read using UTF-8 encoding. -The user wants to be able to use properties files with ISO-8859-1 encoding. Additionally the user wants have the option to specify UTF-8 encoding if the properties file contain special characters. +The user wants to be able to use properties files with ISO-8859-1 encoding. Additionally the user wants have the option to specify UTF-8 encoding +e.g. if a properties file contains special characters which are not present in ISO-8859-1. 
## Detailed design From a3ef0bb0bf60f7d21f79e95e98fe5e07b1fb919c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20O=C3=9Fwald?= <1410947+matz3@users.noreply.github.com> Date: Mon, 29 Jul 2019 15:07:55 +0200 Subject: [PATCH 12/15] Update 0007-properties-file-encoding.md --- rfcs/0007-properties-file-encoding.md | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index dddddf3233..713a492dbe 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -13,14 +13,24 @@ ## Summary -Properties files (`*.properties`) should be encoded in pure ASCII when serving them via the `ui5-server` or building an application/library with the `ui5-builder`. +Properties files (`*.properties`) should be allowed to use UTF-8 encoding, in addition to ISO-8859-1 (default). +When building projects or serving files, the output file content should be independent of encodings (plain ASCII) to prevent issues with different encoding expectations/limitations of other tools or servers. ## Motivation -Currently the properties files are mostly encoded in ISO-8859-1 (which is used by most existing SAP server platforms). -By default the files are served as UTF-8 by the `ui5-server`. This will lead to the problem that special characters are not displayed correctly because they are read using UTF-8 encoding. -The user wants to be able to use properties files with ISO-8859-1 encoding. Additionally the user wants have the option to specify UTF-8 encoding -e.g. if a properties file contains special characters which are not present in ISO-8859-1. +The i18n source files in UI5 are expected to be encoded in ISO-8859-1, based on the [Java 8 properties files](https://docs.oracle.com/javase/8/docs/api/java/util/Properties.html). 
+This is also part of the [UI5 Development Conventions and Guidelines](https://ui5.sap.com/#/topic/753b32617807462d9af483a437874b36). + +Properties files encoded with UTF-8 are causing issues with non-ASCII characters, as a different encoding is expected. + +But as UTF-8 is the default encoding for most of the programs and tools nowadays, this is quite cumbersome. Some editors don't even easily support reading from or writing to other encodings and expect UTF-8 by default. +Also with Java 9 the default encoding for properties files has been changed to UTF-8 ([JEP 226: UTF-8 Property Resource Bundles](http://openjdk.java.net/jeps/226)). + +A workaround to this problem is to use unicode escape sequences (`\uXXXX`) which makes the content independent from the encoding but very cumbersome to maintain without additional tools to convert the file. +This escaping solution is already used for most of the UI5 libraries, especially for locales which require unicode characters not supported in ISO-8859-1. + +To improve the overall developer experience and to prevent encoding issues, the UI5 Tooling should be enhanced to also support properties files encoded in UTF-8. +But as there are existing tools and server middleware which explicitly expect those files to be encoded in ISO-8859-1, the output of the UI5 Tooling needs to be independent of the encoding (plain ASCII). This should be achieved by converting non-ASCII characters to unicode escape sequences (`\uXXXX`), as already mentioned above. ## Detailed design @@ -54,7 +64,7 @@ The resources section within the configuration can be consumed by `ui5-builder` ### Build Task The `ui5-builder` should offer a new standard task called `escapeNonAsciiCharacters` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. -It should use a processor called `nonAsciiEscaper` which escapes non ascii characters (characters which are not within the 128 character ASCII range) within a given string. 
+It should use a processor called `nonAsciiEscaper` which escapes non ASCII characters (characters which are not within the 128 character ASCII range) within a given string. The processor `nonAsciiEscaper` should offer an encoding parameter and a method which provides valid values for this option (`nonAsciiEscaper#getEncodingFromAlias`). @@ -69,11 +79,11 @@ Umlaut Example: The task operates on `*.properties` files using the processor. The task should run first (before `replaceCopyright`) for all types. -This ensures that the properties files can always be consumed. +This ensures that the properties files can always be consumed independent of the source encoding. Each build output should contain the escaped properties files. This ensures that all `properties` contain only ASCII characters such that they can be consumed by other platforms. -The new standard task should automatically be integrated into the build process and can be reused by the `ui5-server`. +The new standard task should automatically be integrated into the build process and the underlaying processor can be reused by the `ui5-server`. ### Server part From 5e029f968b27fe17b1eb7bd208b956a6f6d2cb5c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20O=C3=9Fwald?= <1410947+matz3@users.noreply.github.com> Date: Tue, 30 Jul 2019 09:42:20 +0200 Subject: [PATCH 13/15] Refine RFC --- rfcs/0007-properties-file-encoding.md | 43 +++++++++++++++++---------- 1 file changed, 27 insertions(+), 16 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 713a492dbe..653dfe6bec 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -13,7 +13,7 @@ ## Summary -Properties files (`*.properties`) should be allowed to use UTF-8 encoding, in addition to ISO-8859-1 (default). +Add possibility to use UTF-8 encoding for properties files (`*.properties`), in addition to the existing ISO-8859-1 encoding (default). 
When building projects or serving files, the output file content should be independent of encodings (plain ASCII) to prevent issues with different encoding expectations/limitations of other tools or servers. ## Motivation @@ -38,12 +38,14 @@ But as there are existing tools and server middleware which explicitly expect th A configuration for the `ui5-builder` tasks and the `ui5-server` should be provided such that the source encoding of `*.properties` files can be specified. -Encodings should be supported according to the [Encoding spec](https://encoding.spec.whatwg.org/). -For more information check out: [Buffers and Character Encodings](https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings) +This configuration is set on project level, so that multiple projects with different encodings function idependently. +Resources create as part of the project contain a reference to the project, which allows to read the expected source encoding for a single resource. + +Altought there are lots of different encodings, the configuration on projects only foresees the two relevant encodings. Supported values are: `UTF-8` and `ISO-8859-1` -The default is `ISO-8859-1` for compatibility reasons (set by the formatters). +The default is `ISO-8859-1` for compatibility reasons. This should be configured within the `ui5.yaml` file. @@ -61,12 +63,10 @@ resources: The resources section within the configuration can be consumed by `ui5-builder` and `ui5-server`. -### Build Task - -The `ui5-builder` should offer a new standard task called `escapeNonAsciiCharacters` which escapes all special characters in unicode using the unicode escape sequence `\uXXXX`. -It should use a processor called `nonAsciiEscaper` which escapes non ASCII characters (characters which are not within the 128 character ASCII range) within a given string. 
-The processor `nonAsciiEscaper` should offer an encoding parameter and a method which provides valid values for this option (`nonAsciiEscaper#getEncodingFromAlias`). +### Processor +A processor called `nonAsciiEscaper` should be added. +It escapes non ASCII characters (characters which are not within the 128 character ASCII range) within a given string with unicode escape sequence `\uXXXX`. Umlaut Example: @@ -77,22 +77,33 @@ Umlaut Example: | `Ü`, `ü` | `\u00dc`, `\u00fc` | | `ß` | `\u00df` | +The processor should offer an encoding parameter which defaults to "utf8" and passes it to the [Node.js Buffer#toString method](https://nodejs.org/api/buffer.html#buffer_buf_tostring_encoding_start_end). +Supported encodings are based on the Node.js implementation: https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings + +A static function (`nonAsciiEscaper.getEncodingFromAlias`) should be added to map valid encodings based on the supported configuration values (`UTF-8`, `ISO-8859-1`) to the Node.js encoding names (`utf8`, `latin1`). + +### Task + +A new task `escapeNonAsciiCharacters` should be added which receives the source encoding and a pattern to glob files within the workspace. +It uses the `nonAsciiEscaper.getEncodingFromAlias` function to map the source encoding (based on the project configuration values) to a valid encoding for the processor. + +### Types + The task operates on `*.properties` files using the processor. The task should run first (before `replaceCopyright`) for all types. + This ensures that the properties files can always be consumed independent of the source encoding. Each build output should contain the escaped properties files. -This ensures that all `properties` contain only ASCII characters such that they can be consumed by other platforms. The new standard task should automatically be integrated into the build process and the underlaying processor can be reused by the `ui5-server`. 
-### Server part - -The `ui5-server` should serve all resources using `UTF-8` [charset header](https://www.w3.org/International/articles/http-charset/index.en). +### Server -The `ui5-server` should offer the capability to re-use the `ui5-builder` processor `nonAsciiEscaper` to escape characters in `*.properties` files. -Due to the pure ASCII representation now, the served content can always be interpreted and properly consumed as UTF-8. +The `ui5-server` should not use special handling of response headers to ensure ISO-8859-1 encoding anymore. +Content-type / charset should be determined for all files by using the "mime-types" module. -Technically, there should be a special treatment in [serveResources](https://github.com/SAP/ui5-server/blob/master/lib/middleware/serveResources.js#L42) middleware which uses the configuration and the processor to serve properties files such that they can be interpreted as `UTF-8` encoded content. +The `serveResources` middleware should call the `nonAsciiEscaper` processor for all `*.properties` files and use the configured encoding of the project where the resource belongs to. This ensures that e.g. files from dependencies are handled properly, as it might have a different encoding than the root project running the server. +In case there is no project or no encoding configuration available, it should default to "ISO-8859-1". 
## How we teach this From 2fb2e7b69c90cc1bd0f7b32d74d3b1d61f48aca5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20O=C3=9Fwald?= <1410947+matz3@users.noreply.github.com> Date: Tue, 30 Jul 2019 09:46:39 +0200 Subject: [PATCH 14/15] Update Types section --- rfcs/0007-properties-file-encoding.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 653dfe6bec..3fa9d49ef6 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -84,18 +84,16 @@ A static function (`nonAsciiEscaper.getEncodingFromAlias`) should be added to ma ### Task -A new task `escapeNonAsciiCharacters` should be added which receives the source encoding and a pattern to glob files within the workspace. +A task `escapeNonAsciiCharacters` should be added which receives the source encoding and a pattern to glob files within the workspace. It uses the `nonAsciiEscaper.getEncodingFromAlias` function to map the source encoding (based on the project configuration values) to a valid encoding for the processor. ### Types -The task operates on `*.properties` files using the processor. -The task should run first (before `replaceCopyright`) for all types. +Both `application` and `library` types should be enhanced. -This ensures that the properties files can always be consumed independent of the source encoding. -Each build output should contain the escaped properties files. +The formatters should take the [configuration](#Configuration) into account and apply the default. -The new standard task should automatically be integrated into the build process and the underlaying processor can be reused by the `ui5-server`. +The builders should execute the `escapeNonAsciiCharacters` task as the very first task and pass the `propertiesFileSourceEncoding` from the project configuration. The pattern should include all properties files (`/**/*.properties`).
### Server From 9a56e354c1f3248634af60f033220947342f14eb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20O=C3=9Fwald?= <1410947+matz3@users.noreply.github.com> Date: Tue, 30 Jul 2019 10:34:46 +0200 Subject: [PATCH 15/15] Fix typos --- rfcs/0007-properties-file-encoding.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rfcs/0007-properties-file-encoding.md b/rfcs/0007-properties-file-encoding.md index 3fa9d49ef6..4f412addcc 100644 --- a/rfcs/0007-properties-file-encoding.md +++ b/rfcs/0007-properties-file-encoding.md @@ -39,9 +39,9 @@ But as there are existing tools and server middleware which explicitly expect th A configuration for the `ui5-builder` tasks and the `ui5-server` should be provided such that the source encoding of `*.properties` files can be specified. This configuration is set on project level, so that multiple projects with different encodings function idependently. -Resources create as part of the project contain a reference to the project, which allows to read the expected source encoding for a single resource. +Resources created as part of the project contain a reference to the project, which allows reading the expected source encoding for a single resource. -Altought there are lots of different encodings, the configuration on projects only foresees the two relevant encodings. +Although there are lots of different encodings, the configuration of projects only foresees two relevant encodings. Supported values are: `UTF-8` and `ISO-8859-1`