Restore support for improperly encoded strings
Since the `json` gem was initially written for Ruby 1.8,
before strings had encodings, it used to do its own UTF-8
validation directly on bytes and never really considered
the string's declared encoding. So passing an ASCII-8BIT string
worked as long as the bytes were valid UTF-8.

We may want to deprecate this, but we should emit warnings first.
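
The fallback described above can be sketched in isolation. This is a minimal illustration, not the gem's code: `normalize_to_utf8` is a hypothetical helper name, and it raises `ArgumentError` where the gem raises its own `GeneratorError`. It first tries to reinterpret the bytes as UTF-8 and only transcodes from the declared encoding when that fails.

```ruby
# Hypothetical helper (not part of the json gem) mirroring the fallback order
# used in this commit: validate, then reinterpret, then transcode.
def normalize_to_utf8(string)
  if string.encoding == Encoding::UTF_8
    # Already declared UTF-8: the bytes must actually be valid.
    unless string.valid_encoding?
      raise ArgumentError, "source sequence is illegal/malformed utf-8"
    end
    return string
  end

  # Relabel the same bytes as UTF-8 without transcoding them.
  reinterpreted = string.dup.force_encoding(Encoding::UTF_8)
  return reinterpreted if reinterpreted.valid_encoding?

  # The bytes are not valid UTF-8, so transcode from the declared encoding.
  string.encode(Encoding::UTF_8)
end

binary = "€™".b                                           # UTF-8 bytes labelled ASCII-8BIT
latin1 = "\xE9".dup.force_encoding(Encoding::ISO_8859_1)  # "é" in ISO-8859-1

p normalize_to_utf8(binary)
p normalize_to_utf8(latin1)
```

Under this sketch, an ASCII-8BIT string whose bytes are not valid UTF-8 (for example `"\xFF".b`) reaches `String#encode`, which raises `Encoding::UndefinedConversionError`.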
byroot committed Oct 14, 2024
1 parent 513ddea commit 210a6e7
Showing 2 changed files with 44 additions and 2 deletions.
38 changes: 36 additions & 2 deletions lib/json/pure/generator.rb
@@ -337,9 +337,20 @@ def generate(obj)
         # Assumes !@ascii_only, !@script_safe
         if Regexp.method_defined?(:match?)
           private def fast_serialize_string(string, buf) # :nodoc:
-            buf << '"'.freeze
-            string = string.encode(::Encoding::UTF_8) unless string.encoding == ::Encoding::UTF_8
+            if string.encoding == ::Encoding::UTF_8
+              unless string.valid_encoding?
+                raise GeneratorError, "source sequence is illegal/malformed utf-8"
+              end
+            else
+              utf8_string = string.dup.force_encoding(::Encoding::UTF_8)
+              string = if utf8_string.valid_encoding?
+                utf8_string
+              else
+                string.encode(::Encoding::UTF_8)
+              end
+            end
 
+            buf << '"'.freeze
             if /["\\\x0-\x1f]/n.match?(string)
               buf << string.gsub(/["\\\x0-\x1f]/n, MAP)
             else
@@ -350,6 +361,19 @@ def generate(obj)
         else
           # Ruby 2.3 compatibility
           private def fast_serialize_string(string, buf) # :nodoc:
+            if string.encoding == ::Encoding::UTF_8
+              unless string.valid_encoding?
+                raise GeneratorError, "source sequence is illegal/malformed utf-8"
+              end
+            else
+              utf8_string = string.dup.force_encoding(::Encoding::UTF_8)
+              string = if utf8_string.valid_encoding?
+                utf8_string
+              else
+                string.encode(::Encoding::UTF_8)
+              end
+            end
+
             buf << string.to_json(self)
           end
         end
@@ -515,6 +539,16 @@ def to_json(state = nil, *args)
               end
               string = self
             else
+              # Since the `json` gem was initially written for Ruby 1.8,
+              # before strings had encodings, it used to do its own UTF-8
+              # validation directly on bytes and never really considered
+              # the string's declared encoding. So passing an ASCII-8BIT string
+              # worked as long as the bytes were valid UTF-8.
+              # We may want to deprecate this, but we should emit warnings first.
+              utf8_string = dup.force_encoding(::Encoding::UTF_8)
+              if utf8_string.valid_encoding?
+                return utf8_string.to_json(state, *args)
+              end
               string = encode(::Encoding::UTF_8)
             end
             if state.ascii_only?
8 changes: 8 additions & 0 deletions tests/json_generator_test.rb
@@ -445,6 +445,14 @@ def test_invalid_encoding_string
     assert_includes error.message, "source sequence is illegal/malformed utf-8"
   end
 
+  def test_valid_utf8_in_different_encoding
+    utf8_string = "€™"
+    wrong_encoding_string = utf8_string.b
+    # This behavior is historical. Not necessarily desirable.
+    assert_equal utf8_string.to_json, wrong_encoding_string.to_json
+    assert_equal JSON.dump(utf8_string), JSON.dump(wrong_encoding_string)
+  end
+
   if defined?(JSON::Ext::Generator) and RUBY_PLATFORM != "java"
     def test_string_ext_included_calls_super
       included = false
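
As a side note on the fixture used in the new test, the sketch below shows what `String#b` produces, using only core Ruby and no `json` calls. `compare_encodings` is a hypothetical helper introduced for illustration. It suggests why the test compares generated JSON rather than the input strings: the two inputs hold the same bytes but are not `==` until the binary copy is relabelled as UTF-8.

```ruby
# Hypothetical helper for illustration only: compares a UTF-8 string with its
# binary (ASCII-8BIT) copy as returned by String#b.
def compare_encodings(utf8)
  binary = utf8.b
  {
    binary_encoding: binary.encoding,
    same_bytes: utf8.bytes == binary.bytes,
    # String#== is false when non-ASCII content carries incompatible encodings.
    equal_as_is: utf8 == binary,
    # Relabelling the same bytes as UTF-8 makes the strings equal again.
    equal_after_relabel: utf8 == binary.dup.force_encoding(Encoding::UTF_8)
  }
end

p compare_encodings("€™")
```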
