Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RESTEasy Reactive shouldn't forcefully append a charset attribute when media-type is application/json #43998

Open
slandelle opened this issue Oct 21, 2024 · 9 comments
Labels
area/rest kind/bug Something isn't working

Comments

@slandelle
Copy link

Describe the bug

When computing a content-type header, RESTEasy Reactive shouldn't forcefully append a charset attribute to all text based media-types, including application/json.

It was not originally the case, this change was introduced by this commit: 60aa2b7

As per the IANA spec for application/json:

Note: No "charset" parameter is defined for this registration.
Adding one really has no effect on compliant recipients.

This can be interpreted in several ways I guess. I interpret it as:

  • Adding a charset attribute is incorrect
  • Recipients are expected to be lenient but that doesn't make it any less incorrect

The reason behind this is that JSON requires either UTF-8, UTF-16 or UTF-32 encoding and parsers are expected to detect the correct encoding based on the first bytes.

Expected behavior

content-type: application/json

Actual behavior

content-type: application/json;charset=UTF-8

How to Reproduce?

https://github.com/quarkusio/quarkus/blob/3.15.1/independent-projects/resteasy-reactive/server/vertx/src/test/java/org/jboss/resteasy/reactive/server/vertx/test/mediatype/CharsetTest.java#L56

Output of uname -a or ver

irrelevant

Output of java -version

irrelevant

Quarkus version or git rev

3.15.1

Build tool (ie. output of mvnw --version or gradlew --version)

irrelevant

Additional information

Note: this broke our Keycloak client when we upgraded because it was strict. We fix it and made it lenient but I still think this is also a server side issue and charset shouldn't be there.

@slandelle slandelle added the kind/bug Something isn't working label Oct 21, 2024
Copy link

quarkus-bot bot commented Oct 21, 2024

/cc @FroMage (rest), @geoand (rest), @stuartwdouglas (rest)

@FroMage
Copy link
Member

FroMage commented Oct 22, 2024

Perhaps it should not forcibly add that charset indeed, but is it causing issues?

Interestingly https://www.ietf.org/rfc/rfc4627.html#section-3 says it can indeed only be unicode, I never realised that. Good to know. Good that we enforce it to be utf8 (regardless of whether or not we set the charset).

Also interestingly, it could be utf8 or utf-16 or utf-32, in which case we need an extra https://www.w3.org/Protocols/rfc1341/5_Content-Transfer-Encoding.html header set to binary as explained at https://www.ietf.org/rfc/rfc4627.html#section-6. Good thing we don't support that 😅

@slandelle
Copy link
Author

Hey @FroMage

Perhaps it should not forcibly add that charset indeed

That's my opinion.

but is it causing issues?

It makes Quarkus not compatible with code like if (content-type.equals("application/json")).
In our case, it broke our Keycloak client while upgrading from a 1 year old version (our bad, we made our client lenient).

So you're left with 2 choices for the producer side:

  1. application/json;charset=UTF-8: you're favoring broken recipients that expect charset to be present, hence the incorrect value
  2. application/json: you're favoring broken recipients that expect charset to NOT be present, hence the correct value

IMO, even if the second type of recipients are not correct as per the spec, I expect them to be way more common because application/json is both the correct value and the most frequent form.

@slandelle
Copy link
Author

... or utf-16 or utf-32, in which case we need an extra https://www.w3.org/Protocols/rfc1341/5_Content-Transfer-Encoding.html header set to binary as explained at https://www.ietf.org/rfc/rfc4627.html#section-6.

Oh, I wasn't aware of that. And I fail to see the rationale as the encoding can be inferred from the first 4 bytes, as described here: https://www.ietf.org/rfc/rfc4627.html#section-3. You can check what Jackson does: you don't need to pass a Charset when parsing in InputStream, it just infers it. 🤷

@FroMage
Copy link
Member

FroMage commented Oct 22, 2024

Seems OK to me to not add a charset for json, given all this. Depending on whether @geoand agrees.

@geoand
Copy link
Contributor

geoand commented Oct 23, 2024

We originally added it because of #25295

@slandelle
Copy link
Author

@geoand Indeed. The thing is that you can't handle all media-types the same way. What we did in Gatling is treating application/json as a special case. Better good than perfect and cover obscure cases.

@geoand
Copy link
Contributor

geoand commented Oct 23, 2024

Right, that does seem to make sense and I actually think were doing something similar, I see:

if (MediaTypeHelper.isTextLike(mediaType)) {
            effectiveCharset = originalCharset;
            if (effectiveCharset == null) {
                effectiveCharset = StandardCharsets.UTF_8.name();
            }
        } 

where MediaTypeHelper.isTextLike(mediaType) is:

    public static boolean isTextLike(MediaType mediaType) {
        String type = mediaType.getType();
        String subtype = mediaType.getSubtype();
        return (type.equals("application") && (subtype.contains("json") || subtype.contains("xml") || subtype.contains("yaml")))
                || type.equals("text");
    }

So I guess we just need to not use MediaTypeHelper.isTextLike(mediaType) but something like shouldAddCharset(MediaType) which would only apply for json.

@FroMage WDYT?

@FroMage
Copy link
Member

FroMage commented Oct 23, 2024

LGTM

geoand added a commit to geoand/quarkus that referenced this issue Oct 30, 2024
osandum added a commit to bruun-rasmussen/short-url that referenced this issue Feb 20, 2025
Using handheld Gson.toJson() here instead of relying on Jax-RS 
negotiation because of quarkusio/quarkus#43998 
(as Chrome JavaScript fetch() seems to be troubled by the non-compliant 
';charset=UTF-8' extension added to Content-Type: application/json by 
Quarkus)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/rest kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants