Both 倩 and 瀨 are 3 bytes long in UTF-8: ["倩", "瀨"].map(&:bytesize) => [3, 3]
The second byte of each happens to be 0x80: ["倩", "瀨"].map { _1.bytes[1].to_s(16) } => ["80", "80"]
The third bytes happen to be 0xA9 and 0xA8: ["倩", "瀨"].map { _1.bytes[2].to_s(16) } => ["a9", "a8"]
Possible solution:
Should the condition also check that the first byte equals 0xE2? ["\u2028", "\u2029"].map { _1.bytes.first.to_s(16) } => ["e2", "e2"]
Something like:
if (RB_UNLIKELY(out_script_safe && ch == 0xE2 && b2 == 0x80)) {
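To make the byte comparison concrete, here is a minimal Ruby sketch of the two conditions. It is a model only, not the gem's C code: the method names `loose_match?` and `strict_match?` are hypothetical, and the "loose" version assumes the current check looks only at the second and third bytes, as the analysis above suggests.

```ruby
# Hypothetical model of the byte check; this is not the gem's actual C code.
LINE_SEPARATORS = ["\u2028", "\u2029"].freeze

# Assumed current behavior: only the second and third bytes are compared.
def loose_match?(char)
  b = char.bytes
  b.size == 3 && b[1] == 0x80 && [0xA8, 0xA9].include?(b[2])
end

# Proposed behavior: the first byte must also be 0xE2.
def strict_match?(char)
  b = char.bytes
  b.size == 3 && b[0] == 0xE2 && b[1] == 0x80 && [0xA8, 0xA9].include?(b[2])
end

# Compare both checks on the real separators and on two affected characters.
(LINE_SEPARATORS + ["倩", "瀨"]).each do |char|
  printf("U+%04X loose=%s strict=%s\n", char.ord, loose_match?(char), strict_match?(char))
end
```

Under these assumptions the loose check accepts all four characters, while the strict check accepts only U+2028 and U+2029.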
I'll look into proposing a PR, but I don't mind if someone else is eager to propose a fix.
nvasilevski changed the title from "v2.7.3 Changes the behavior of JSON.generate with script_safe (escape_slash) set to true for some Japanese characters" to "v2.7.3 Returns u2029 or u2028 from JSON.generate with script_safe (escape_slash) set to true for some UTF-8 characters" on Dec 3, 2024
Thanks! For better discoverability I changed the title. Below is the list of other characters that should hit the same bug, because their second and third bytes match those of U+2028 and U+2029.
Affected characters:
["ဨ", "ဩ", "〨", "〩", "䀨", "䀩", "倨", "倩", "怨", "怩", "瀨", "瀩", "耨", "耩", "逨", "逩", "ꀨ", "ꀩ", "뀨", "뀩", "쀨", "쀩", "퀨", "퀩", "\uE028", "\uE029", "\uF028", "\uF029"]
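A list like this can be reproduced with a short Ruby sketch. It scans every code point that encodes to three bytes in UTF-8 and keeps those whose second byte is 0x80 and third byte is 0xA8 or 0xA9, then drops U+2028 and U+2029 themselves. The variable names are illustrative only.

```ruby
# Code points U+0800..U+FFFF encode to 3 bytes in UTF-8; skip the surrogate range.
three_byte = (0x0800..0xFFFF).reject { |cp| (0xD800..0xDFFF).cover?(cp) }

# Keep characters whose second and third bytes equal those of U+2028 / U+2029.
matching = three_byte.map { |cp| [cp].pack("U") }.select do |char|
  b = char.bytes
  b[1] == 0x80 && [0xA8, 0xA9].include?(b[2])
end

# The two real separators are supposed to be escaped, so exclude them.
affected = matching - ["\u2028", "\u2029"]

puts affected.size
puts affected.map { |char| format("U+%04X", char.ord) }.join(" ")
```

Every match has the form U+n028 or U+n029, because a second byte of 0x80 contributes only zero bits to the code point. The last four matches fall in the Private Use Area, so they may not render as visible glyphs.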
Expected behavior (<2.7.3)
Actual behavior (>=2.7.3)
Most likely the cause
https://github.com/ruby/json/pull/629/files#diff-2bb51be932dec14923f6eb515f24b1b593737f0d3f8e76eeecf58cff3052819fR74-R85
Context
We seem to be unintentionally falling into the branch that escapes U+2028 and U+2029 when script_safe is enabled.