`v2.7.3` Returns `u2029` or `u2028` from `JSON.generate` with `script_safe` (`escape_slash`) set to `true` for some UTF-8 characters #715

nvasilevski · 2024-12-03T00:41:54Z

Affected characters: ["ဨ", "ဩ", "〨", "〩", "䀨", "䀩", "倨", "倩", "怨", "怩", "瀨", "瀩", "耨", "耩", "逨", "逩", "ꀨ", "ꀩ", "뀨", "뀩", "쀨", "쀩", "퀨", "퀩", "\uE028", "\uE029", "\uF028", "\uF029"]

Expected behavior (<2.7.3)

::JSON.generate({values:["倩", "瀨"]}, script_safe: true)
=> "{\"values\":[\"倩\",\"瀨\"]}"

Actual behavior (>=2.7.3)

::JSON.generate({values:["倩", "瀨"]}, script_safe: true)
=> "{\"values\":[\"\\u2029\",\"\\u2028\"]}"
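One way to detect the corruption without reading escapes by eye is to round-trip the value through `JSON.parse`. This is only a sketch: `round_trips?` is a hypothetical helper name, not part of the `json` gem, and the result for the affected characters depends on which gem version is installed.

```ruby
require "json"

# Hypothetical helper: true when the values survive generate + parse unchanged.
def round_trips?(values)
  encoded = JSON.generate({ values: values }, script_safe: true)
  JSON.parse(encoded)["values"] == values
end

# Expected to be false only on affected versions, where the escapes
# decode to U+2029 / U+2028 instead of the original characters.
puts round_trips?(["倩", "瀨"])

# Real separators decode back to themselves whether or not they were escaped.
puts round_trips?(["\u2028", "\u2029"])
```

Comparing decoded values rather than the generated string keeps the check independent of how a given version chooses to escape.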

Most likely the cause

https://github.com/ruby/json/pull/629/files#diff-2bb51be932dec14923f6eb515f24b1b593737f0d3f8e76eeecf58cff3052819fR74-R85

Context

We seem to be unintentionally falling into the branch:

                case 3: {
                    unsigned char b2 = ptr[pos + 1];
                    if (RB_UNLIKELY(out_script_safe && b2 == 0x80)) {
                        unsigned char b3 = ptr[pos + 2];
                        if (b3 == 0xA8) {
                            FLUSH_POS(3);
                            fbuffer_append(out_buffer, "\\u2028", 6);
                            break;
                        } else if (b3 == 0xA9) {
                            FLUSH_POS(3);
                            fbuffer_append(out_buffer, "\\u2029", 6);
                            break;

Because

Both 倩 and 瀨 are 3 bytes long in UTF-8: `["倩", "瀨"].map(&:bytesize)` => `[3, 3]`
Their second byte happens to be 0x80: `["倩", "瀨"].map { _1.bytes[1].to_s(16) }` => `["80", "80"]`
Their third bytes happen to be 0xA9 and 0xA8, respectively: `["倩", "瀨"].map { _1.bytes[2].to_s(16) }` => `["a9", "a8"]`
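To make the collision visible, the following sketch prints the full UTF-8 byte sequences of the two real separators next to the two sample characters. `utf8_hex` is a helper name introduced here for illustration.

```ruby
# Hex dump of a character's UTF-8 bytes.
def utf8_hex(char)
  char.bytes.map { |b| format("%02x", b) }
end

["\u2028", "\u2029", "倩", "瀨"].each do |char|
  puts format("U+%04X  %s", char.ord, utf8_hex(char).join(" "))
end
# U+2028  e2 80 a8
# U+2029  e2 80 a9
# U+5029  e5 80 a9
# U+7028  e7 80 a8
```

Only the first byte differs, and that is the one byte the branch above never inspects.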

Possible solution:

Should the condition also check that the first byte equals 0xE2? `["\u2028", "\u2029"].map { _1.bytes.first.to_s(16) }` => `["e2", "e2"]`
Something like:

if (RB_UNLIKELY(out_script_safe && ch == 0xE2 && b2 == 0x80)) {
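The effect of the extra first-byte check can be modeled in Ruby. This is a simplified model of the condition only, not the generator's actual code; both method names are made up for this sketch.

```ruby
# Model of the check as quoted above: only the second and third bytes are inspected.
def buggy_separator?(bytes)
  bytes.size == 3 && bytes[1] == 0x80 && [0xA8, 0xA9].include?(bytes[2])
end

# Model of the proposed check: the first byte must also be 0xE2.
def fixed_separator?(bytes)
  bytes[0] == 0xE2 && buggy_separator?(bytes)
end

["\u2028", "倩", "瀨"].each do |char|
  puts format("U+%04X buggy=%s fixed=%s",
              char.ord, buggy_separator?(char.bytes), fixed_separator?(char.bytes))
end
# U+2028 buggy=true fixed=true
# U+5029 buggy=true fixed=false
# U+7028 buggy=true fixed=false
```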

I'll look into proposing a PR, but I don't mind if someone else is eager to propose a fix.

byroot · 2024-12-03T08:25:32Z

Thanks :/

I released 2.9.0 with this fix.

nvasilevski · 2024-12-03T14:11:10Z

Thanks! For better discoverability I changed the title. Here is the list of other characters that should hit the same bug because their second and third bytes match:

["ဨ", "ဩ", "〨", "〩", "䀨", "䀩", "倨", "倩", "怨", "怩", "瀨", "瀩", "耨", "耩", "逨", "逩", "ꀨ", "ꀩ", "뀨", "뀩", "쀨", "쀩", "퀨", "퀩", "\uE028", "\uE029", "\uF028", "\uF029"]
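A list like this can be derived mechanically: take every 3-byte UTF-8 lead byte except 0xE2, append 0x80 and then 0xA8 or 0xA9, and keep the sequences that are valid UTF-8. The sketch below does that; `affected_characters` is a name chosen here for illustration.

```ruby
# Enumerate 3-byte UTF-8 characters whose second and third bytes collide
# with U+2028 / U+2029 but whose first byte is not 0xE2.
def affected_characters
  (0xE0..0xEF).reject { |b1| b1 == 0xE2 }.flat_map do |b1|
    [0xA8, 0xA9].filter_map do |b3|
      candidate = [b1, 0x80, b3].pack("C*").force_encoding(Encoding::UTF_8)
      # 0xE0 0x80 .. is an overlong encoding, so it is rejected here.
      candidate if candidate.valid_encoding?
    end
  end
end

chars = affected_characters
puts chars.size
# 28
puts chars.first(4).map { |c| format("U+%04X", c.ord) }.join(" ")
# U+1028 U+1029 U+3028 U+3029
```

The final four results are the private-use code points U+E028, U+E029, U+F028 and U+F029, which many fonts render as blank.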

…characters Fix: ruby/json#715 The first byte check was missing. ruby/json@93a7f8717d

byroot mentioned this issue Dec 3, 2024

Fix generate(script_safe: true) to not confuse unrelated characters #716

Merged

byroot closed this as completed in 93a7f87 Dec 3, 2024

byroot closed this as completed in #716 Dec 3, 2024

byroot added a commit to byroot/ruby that referenced this issue Dec 5, 2024

[ruby/json] Fix generate(script_safe: true) to not confuse unrelated …

f9a1def

byroot added a commit to ruby/ruby that referenced this issue Dec 5, 2024

[ruby/json] Fix generate(script_safe: true) to not confuse unrelated …

1510d72
