Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RESTEasy adds wrong charset to response content type #25295

Closed
sithmein opened this issue May 2, 2022 · 7 comments
Closed

RESTEasy adds wrong charset to response content type #25295

sithmein opened this issue May 2, 2022 · 7 comments
Labels
kind/bug Something isn't working
Milestone

Comments

@sithmein
Copy link

sithmein commented May 2, 2022

Describe the bug

Consider the following code:

@Path("/")
public class CharsetDuplicated {
	@Provider
	@Produces(MediaType.APPLICATION_XML)
	public static class GenericXMLSerializer implements MessageBodyWriter<Object> {
	    @Override
	    public long getSize(final Object value, final Class<?> type, final Type genericType,
	        final Annotation [] annotations, final MediaType mediaType) {
	        return -1;
	    }

	    @Override
	    public boolean isWriteable(final Class<?> type, final Type genericType, final Annotation [] annotations,
	        final MediaType mediaType) {
	        return true;
	    }

	    @Override
	    public void writeTo(final Object value, final Class<?> type, final Type genericType,
	        final Annotation[] annotations, final MediaType mediaType,
	        final MultivaluedMap<String, Object> httpHeaders, final OutputStream entityStream)
	        throws IOException {
	        
    		httpHeaders.add(HttpHeaders.CONTENT_TYPE, mediaType + ";charset=ISO-8859-1");
		entityStream.write("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><foo>äöü</foo>"
					.getBytes(StandardCharsets.ISO_8859_1));
	    }
	}

    @POST
    @Path("charset-duplicated")
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces({MediaType.APPLICATION_XML, MediaType.APPLICATION_JSON})
	public Response consumeJson() {
    	var map = Map.of("key", "value");
	    return Response.ok(map).build();
	}

    @POST
    @Path("charset-duplicated")
    @Consumes(MediaType.MULTIPART_FORM_DATA)
    @Produces({MediaType.APPLICATION_XML, MediaType.APPLICATION_JSON})
	public Response consumeMultipart() {
    	var map = Map.of("key", "value");
	    return Response.ok(map).build();
	}
}

In contrast to #25263 this time a media type is passed into the message body writer, but it contains charset=UTF-8. Which can be wrong if the message body writer creates output using a different charset. If the message body writer sets the charset explicitly the content type contains two conflicting charset specifications.

Expected behavior

The media type passed into the message body writer must not contain a charset because only the writer can tell what charset it produces.

Actual behavior

A UTF-8 charset is incorrectly assumed for every response.

How to Reproduce?

Run quarkus.bugs.CharsetDuplicatedTest from https://github.com/sithmein/bugs-in-quarkus.

Output of uname -a or ver

No response

Output of java -version

17

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.8.2.Final

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@sithmein sithmein added the kind/bug Something isn't working label May 2, 2022
@sithmein
Copy link
Author

sithmein commented May 2, 2022

In our code this bug only started occurring when we switched to reactive RESTEasy. However, in the minimal reproducer it is also broken for classic RESTEasy.

@geoand
Copy link
Contributor

geoand commented May 3, 2022

I don't consider this a bug TBH.

If you need to set the content type, you can certainly do something like:

httpHeaders.putSingle(HttpHeaders.CONTENT_TYPE, mediaType.getType() + "/" + mediaType.getSubtype() + ";charset=ISO-8859-1");

@geoand geoand closed this as completed May 3, 2022
@geoand geoand added the triage/invalid This doesn't seem right label May 3, 2022
@sithmein
Copy link
Author

sithmein commented May 3, 2022

Please have a careful look at my example. Even if I don't explicitly set the content type it will be wrong! The message body writer uses ISO-8859-1 as encoding but still the HTTP Content-Type header claims it's UTF-8 which is simply wrong. It must not be the responsibility of the message body writer to fix a wrong charset. Why do you assume that every message body writer will create UTF-8 in the first place?

@sithmein
Copy link
Author

sithmein commented May 4, 2022

I reworked my example so it does not use a message body writer any more and produces images. Still the returned content type has a charset=UTF-8 which doesn't make any sense for non-textual data. See quarkus.bugs.UnexpectedCharsetTest in https://github.com/sithmein/bugs-in-quarkus.
I admit this is really an edge case but it is still wrong behaviour.

geoand added a commit to geoand/quarkus that referenced this issue May 5, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295
geoand added a commit to geoand/quarkus that referenced this issue May 5, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295
@geoand
Copy link
Contributor

geoand commented May 5, 2022

Understood, thanks

geoand added a commit to geoand/quarkus that referenced this issue May 10, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295
geoand added a commit to geoand/quarkus that referenced this issue May 16, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295
geoand added a commit to geoand/quarkus that referenced this issue May 16, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295
geoand added a commit to geoand/quarkus that referenced this issue May 16, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295
geoand added a commit to geoand/quarkus that referenced this issue May 16, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295
geoand added a commit to geoand/quarkus that referenced this issue May 16, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295
gsmet added a commit that referenced this issue May 16, 2022
Harmonize the usage of charset in the Content-Type headers for RESTEasy Reactive
gsmet pushed a commit to gsmet/quarkus that referenced this issue May 16, 2022
…sy Reactive

The idea is that we know only append it for specific response media types.
Before this change, the charset was added depending on how the return type
of the Resource method

Relates to: quarkusio#25295

(cherry picked from commit 60aa2b7)
@sithmein
Copy link
Author

sithmein commented Jun 30, 2022

According to quarkus-qe/quarkus-test-suite#655 this issue has been fixed but it's still labelled with "invalid". If it's indeed fixed maybe someone can remove that label (and add a milestone).

@geoand geoand removed the triage/invalid This doesn't seem right label Jun 30, 2022
@geoand geoand added this to the 2.9.1.Final milestone Jun 30, 2022
@geoand
Copy link
Contributor

geoand commented Jun 30, 2022

Good point, done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants