Only test the wrongly encoded string behavior on the C version
Both the pure and Java versions already raise an error in such cases,
so this confirms that we'd rather deprecate and fix the C version.

We shouldn't make the pure or Java versions accept these broken
strings.
byroot committed Oct 14, 2024
1 parent 210a6e7 commit c5a6d80
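
As a minimal sketch of what a "wrongly encoded string" means in the commit message above, the following uses only core Ruby `String` methods and does not call the `json` gem. The variable names are illustrative. It shows that the bytes are valid UTF-8 even though the declared encoding is ASCII-8BIT, and that Ruby's own transcoding refuses such a string. Which exception class the gem itself surfaces is not shown here.

```ruby
# "\u20AC\u2122" is the "€™" string used in the test, written with escapes
# so the example does not depend on the source file's encoding.
utf8_string = "\u20AC\u2122"

# String#b returns a copy with the same bytes, tagged ASCII-8BIT (binary).
wrong_encoding_string = utf8_string.b

same_bytes = utf8_string.bytes == wrong_encoding_string.bytes
declared_encoding = wrong_encoding_string.encoding

# Relabeling the bytes (no conversion) yields a valid UTF-8 string...
relabeled_valid =
  wrong_encoding_string.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# ...but transcoding from ASCII-8BIT to UTF-8 fails on any non-ASCII byte.
transcode_error =
  begin
    wrong_encoding_string.encode(Encoding::UTF_8)
    nil
  rescue EncodingError => e
    e.class
  end

puts same_bytes, declared_encoding, relabeled_valid, transcode_error.inspect
```

The gap between relabeling (`force_encoding`) and transcoding (`encode`) is the whole issue: the historical behavior relabels silently, while a plain `encode` call treats the declared encoding as authoritative.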
Showing 2 changed files with 11 additions and 44 deletions.
38 changes: 2 additions & 36 deletions lib/json/pure/generator.rb
@@ -337,20 +337,9 @@ def generate(obj)
  # Assumes !@ascii_only, !@script_safe
  if Regexp.method_defined?(:match?)
    private def fast_serialize_string(string, buf) # :nodoc:
-     if string.encoding == ::Encoding::UTF_8
-       unless string.valid_encoding?
-         raise GeneratorError, "source sequence is illegal/malformed utf-8"
-       end
-     else
-       utf8_string = string.dup.force_encoding(::Encoding::UTF_8)
-       string = if utf8_string.valid_encoding?
-         utf8_string
-       else
-         string.encode(::Encoding::UTF_8)
-       end
-     end
-
      buf << '"'.freeze
+     string = string.encode(::Encoding::UTF_8) unless string.encoding == ::Encoding::UTF_8
+
      if /["\\\x0-\x1f]/n.match?(string)
        buf << string.gsub(/["\\\x0-\x1f]/n, MAP)
      else
@@ -361,19 +350,6 @@ def generate(obj)
  else
    # Ruby 2.3 compatibility
    private def fast_serialize_string(string, buf) # :nodoc:
-     if string.encoding == ::Encoding::UTF_8
-       unless string.valid_encoding?
-         raise GeneratorError, "source sequence is illegal/malformed utf-8"
-       end
-     else
-       utf8_string = string.dup.force_encoding(::Encoding::UTF_8)
-       string = if utf8_string.valid_encoding?
-         utf8_string
-       else
-         string.encode(::Encoding::UTF_8)
-       end
-     end
-
      buf << string.to_json(self)
    end
  end
@@ -539,16 +515,6 @@ def to_json(state = nil, *args)
    end
    string = self
  else
-   # Since the `json` gem was initially written for Ruby 1.8
-   # before strings had encoding, it used to do its own UTF-8
-   # validation direction on bytes and never really considered
-   # the string declared encoding. So passing a ASCII-8BIT string
-   # worked as long as the bytes were valid UTF-8
-   # We may want to deprecate this, but we should emit warnings first.
-   utf8_string = dup.force_encoding(::Encoding::UTF_8)
-   if utf8_string.valid_encoding?
-     return utf8_string.to_json(state, *args)
-   end
    string = encode(::Encoding::UTF_8)
  end
  if state.ascii_only?
17 changes: 9 additions & 8 deletions tests/json_generator_test.rb
@@ -445,15 +445,16 @@ def test_invalid_encoding_string
    assert_includes error.message, "source sequence is illegal/malformed utf-8"
  end

- def test_valid_utf8_in_different_encoding
-   utf8_string = "€™"
-   wrong_encoding_string = utf8_string.b
-   # This behavior is historical. Not necessary desirable.
-   assert_equal utf8_string.to_json, wrong_encoding_string.to_json
-   assert_equal JSON.dump(utf8_string), JSON.dump(wrong_encoding_string)
- end
-
  if defined?(JSON::Ext::Generator) and RUBY_PLATFORM != "java"
+   def test_valid_utf8_in_different_encoding
+     utf8_string = "€™"
+     wrong_encoding_string = utf8_string.b
+     # This behavior is historical. Not necessary desirable. We should deprecated it.
+     # The pure and java version of the gem already don't behave this way.
+     assert_equal utf8_string.to_json, wrong_encoding_string.to_json
+     assert_equal JSON.dump(utf8_string), JSON.dump(wrong_encoding_string)
+   end
+
    def test_string_ext_included_calls_super
      included = false

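
The two policies at stake in the diff can be compared side by side. This is a rough sketch in plain Ruby, not the gem's code: `lenient_utf8` and `strict_utf8` are hypothetical helpers that mirror, respectively, the fallback removed from `lib/json/pure/generator.rb` and the plain `encode` call that remains for non-UTF-8 strings.

```ruby
# Hypothetical helper: mirrors the removed fallback. If the bytes happen to
# be valid UTF-8, relabel them; otherwise transcode from the declared encoding.
def lenient_utf8(string)
  return string if string.encoding == Encoding::UTF_8

  relabeled = string.dup.force_encoding(Encoding::UTF_8)
  relabeled.valid_encoding? ? relabeled : string.encode(Encoding::UTF_8)
end

# Hypothetical helper: mirrors the remaining behavior. The declared encoding
# is trusted, so non-UTF-8 strings are always transcoded.
def strict_utf8(string)
  string.encoding == Encoding::UTF_8 ? string : string.encode(Encoding::UTF_8)
end

# UTF-8 bytes of "€™" tagged as binary: only the lenient policy accepts them.
binary = "\u20AC\u2122".b
lenient_result = lenient_utf8(binary)
strict_result =
  begin
    strict_utf8(binary)
  rescue EncodingError => e
    e.class
  end

# A genuinely Latin-1 string ("café") is transcoded identically by both.
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)
both_agree = lenient_utf8(latin1) == strict_utf8(latin1)

puts lenient_result, strict_result.inspect, both_agree
```

The policies differ only for strings whose bytes are valid UTF-8 under a different declared encoding, which is the case the test now exercises on the C extension alone.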