UTF-8 encoding problem with MultiPart/RestEasy #10323

Closed
Kondamon opened this issue Jun 27, 2020 · 19 comments · Fixed by #11893

@Kondamon

Describe the bug
I am developing a REST API that consumes multipart/form-data (via @MultipartForm), but I'm having problems with the charset given in @Consumes. When I send parameters containing German characters to the resource, the characters are not decoded properly. This is also visible when inspecting the properties in the VS Code debugger.

Some of the characters that cause the problem:

  • ä, Ä, ü, Ü, ö, Ö, ß

Instead, I receive:

  • ��

Expected behavior

  • All umlauts and other non-ASCII characters are received correctly decoded as UTF-8

To Reproduce
Using httpie:

http -f POST localhost:8080/general content="Test-Ä" [email protected]

Response:

HTTP/1.1 200 OK
Content-Length: 11
Content-Type: text/plain;charset=UTF-8

Test-��

Resource:

import org.jboss.resteasy.annotations.providers.multipart.MultipartForm;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

public class FeedbackResource {

    @POST
    @Path("/general")
    @Produces(MediaType.TEXT_PLAIN)
    @Consumes(MediaType.MULTIPART_FORM_DATA+";charset=UTF-8")
    public String postForm(@MultipartForm final FeedbackBody feedback) {
        return feedback.content;
    }
}

Model:

package org.acme;

import org.jboss.resteasy.annotations.providers.multipart.PartType;

import javax.ws.rs.FormParam;
import javax.ws.rs.core.MediaType;

public class FeedbackBody {

    private byte[] file;
    public byte[] getFile() {
        return file;
    }

    @FormParam("file")
    @PartType(MediaType.APPLICATION_OCTET_STREAM)
    public void setFile(byte[] file) {
        this.file = file;
    }

    @FormParam("fileName")
    @PartType(MediaType.TEXT_PLAIN)
    public String fileName;

    @FormParam("content")
    @PartType(MediaType.TEXT_PLAIN+";charset=UTF-8")
    public String content;
}

Configuration

# Add your application.properties here, if applicable.
<properties>
    <compiler-plugin.version>3.8.1</compiler-plugin.version>
    <maven.compiler.parameters>true</maven.compiler.parameters>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <quarkus-plugin.version>1.5.1.Final</quarkus-plugin.version>
    <quarkus.platform.artifact-id>quarkus-universe-bom</quarkus.platform.artifact-id>
    <quarkus.platform.group-id>io.quarkus</quarkus.platform.group-id>
    <quarkus.platform.version>1.5.1.Final</quarkus.platform.version>
    <surefire-plugin.version>2.22.1</surefire-plugin.version>
</properties>

Environment (please complete the following information):

  • Output of uname -a or ver:
    Darwin Mac.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
  • Output of java -version:
    openjdk version "11.0.6" 2020-01-14
    OpenJDK Runtime Environment GraalVM CE 19.3.1 (build 11.0.6+9-jvmci-19.3-b07)
    OpenJDK 64-Bit Server VM GraalVM CE 19.3.1 (build 11.0.6+9-jvmci-19.3-b07, mixed mode, sharing)
  • Quarkus version or git rev:
    1.5.1.Final
  • Build tool (ie. output of mvnw --version or gradlew --version):
    Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
    Maven home: /Users/.m2/wrapper/dists/apache-maven-3.6.3-bin/1iopthnavndlasol9gbrbg6bf2/apache-maven-3.6.3
    Java version: 11.0.6, vendor: Oracle Corporation, runtime: /Users/christian/.sdkman/candidates/java/19.3.1.r11-grl
    Default locale: en_GB, platform encoding: UTF-8
    OS name: "mac os x", version: "10.15.5", arch: "x86_64", family: "mac"
Kondamon added the kind/bug (Something isn't working) label on Jun 27, 2020
@ejba
Contributor

ejba commented Jun 28, 2020

Hi @Kondamon,

I created a reproducer according to your description, and I can confirm the behavior. However, HTTPie does not specify a Content-Type for the content part. According to the RESTEasy documentation, when a part's content type is not specified, the default charset, us-ascii, is used. Stepping through with the debugger, this appears to be the origin of the problem (feel free to verify it yourself). I tried different ways to override this default, but without success.

Is it possible to enable resteasy.add.charset? It would solve the problem.

Oops, it's enabled by default.
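
To make the us-ascii default concrete, here is a minimal plain-JDK sketch of what happens when the UTF-8 bytes of Test-Ä are decoded as US-ASCII. It does not involve RESTEasy, and the class and method names are hypothetical. Each of the two bytes that encode Ä becomes a replacement character (U+FFFD), which takes three bytes when written back as UTF-8. The echoed string is therefore 11 bytes long, which is consistent with the Content-Length: 11 in the original report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the decoding mismatch.
final class CharsetMismatchDemo {

    // Encodes the text as UTF-8 (as the client does) and decodes the
    // resulting bytes with the given charset (as the server would).
    static String decodeAs(String original, Charset charset) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, charset);
    }

    public static void main(String[] args) {
        // "\u00C4" is the letter A with diaeresis, written as an escape
        // so the example does not depend on the source file encoding.
        String wrong = decodeAs("Test-\u00C4", StandardCharsets.US_ASCII);
        String right = decodeAs("Test-\u00C4", StandardCharsets.UTF_8);

        // Size of each string when sent back as a UTF-8 response body.
        System.out.println(wrong.getBytes(StandardCharsets.UTF_8).length); // prints 11
        System.out.println(right.getBytes(StandardCharsets.UTF_8).length); // prints 7
    }
}
```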

@ejba
Contributor

ejba commented Jun 29, 2020

Here are some screenshots to better illustrate what I am trying to explain.

[Screenshot: extract-part]

[Screenshot: add-part-list]

@Kondamon
Author

Hi @ejba! Thank you for your effort! I tried sending the form with Postman using these settings, but without success: the behavior is still the same, even though the content type is given with charset UTF-8 here.

[Screenshot 2020-06-29 at 21 12 47]

@ejba
Contributor

ejba commented Jun 29, 2020

I tried again, this time with curl, and obtained the result you expected.

~ % curl -v -X POST -F "content=Test-Ä;type=text/plain;charset=utf-8" "http://localhost:8080/hello"
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /hello HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 189
> Content-Type: multipart/form-data; boundary=------------------------bed3af2091f63039
> 
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Content-Length: 7
< Content-Type: text/plain;charset=UTF-8
< 
* Connection #0 to host localhost left intact
Test-Ä* Closing connection 0

So this is not a bug; it is a matter of specifying the content type of the individual part.
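
For clients other than curl, the following is a rough plain-JDK sketch of what such a part looks like on the wire: a multipart/form-data part that carries its own Content-Type header with a charset parameter. The class name, method name, and boundary value are hypothetical, and a real client library would normally build this for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only shows the shape of a single part.
final class MultipartTextPart {

    private static final String CRLF = "\r\n";

    // Serializes one text field as a multipart/form-data part whose own
    // Content-Type header names the charset used to encode the value.
    static byte[] encode(String boundary, String name, String value, Charset charset) {
        String headers = "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name + "\"" + CRLF
                + "Content-Type: text/plain; charset=" + charset.name() + CRLF
                + CRLF;
        byte[] head = headers.getBytes(StandardCharsets.US_ASCII);
        byte[] body = value.getBytes(charset);
        byte[] tail = CRLF.getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(body, 0, body.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] part = encode("demo-boundary", "content", "Test-\u00C4", StandardCharsets.UTF_8);
        System.out.print(new String(part, StandardCharsets.UTF_8));
        // A complete body ends with the closing delimiter.
        System.out.println("--demo-boundary--");
    }
}
```

A browser form or HTTPie upload typically omits the per-part Content-Type line for plain text fields, which is why the server falls back to its default charset in those cases.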

@Kondamon
Author

Kondamon commented Jun 30, 2020

Thank you very much! It seems that only curl lets you specify the content type of an individual part. I have added mime-type="type=text/plain;charset=utf-8" to the form upload, and everything works fine now!

@ghost

ghost commented Jun 30, 2020

> Thank you very much! It seems like only curl supports to specify the content type. I have added the mime-type="type=text/plain;charset=utf-8" in the form upload and everything works fine now!

I can't see how you've solved this issue. öööööaaa always returns �����aaa with multipart.

@Path("/multi")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public class MultiResource {

    @POST
    @Produces(MediaType.TEXT_PLAIN)
    public String message(@MultipartForm MultiBody multiBody) {
        return multiBody.message;
    }
}

public class MultiBody {

    @FormParam("message")
    @PartType(MediaType.TEXT_PLAIN)
    public String message;
}

Request:

POST /multi HTTP/1.1
Host: localhost:8080
Accept: */*
Accept-Language: de,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: multipart/form-data; boundary=---------------------------197791880114771219241910375257
Content-Length: 186
DNT: 1
Connection: keep-alive
Accept-Charset: utf-8
Pragma: no-cache
Cache-Control: no-cache

Response:

Content-Length: 53
Content-Type: text/plain;charset=UTF-8
����������������AAAAA

@ejba
Contributor

ejba commented Jul 1, 2020

@batraz90 Could you show us how you are performing the request?

@ghost

ghost commented Jul 1, 2020

> @batraz90 Could you show us how are you performing the request?

Actually it's a simple HTML form:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
</head>
<body>
    <form action="http://localhost:8080/multi" enctype="multipart/form-data" method="POST">
        <input type="text" name="message">
        <button type="submit"></button>
    </form>
</body>
</html>

but I haven't been able to make it work with curl or any other REST client either.

@Kondamon
Author

Kondamon commented Jul 1, 2020

I didn't use an HTML form, but you need to specify the content type of your message input. I guess it could be done like this: <form action="http://localhost:8080/multi" enctype="multipart/form-data" method="POST" accept-charset="utf-8">.

@ghost

ghost commented Jul 1, 2020

> I didn't use a html form. But you need to specify the content-type of your message input. I guess, it could be done with this code <form action="http://localhost:8080/multi" enctype="multipart/form-data" method="POST" accept-charset="utf-8">.

I've tried that. Doesn't work.

@ejba
Contributor

ejba commented Jul 1, 2020

I tried two ways of specifying a charset in an HTML form, and neither worked. Here's a reproducer.

@Kondamon
Author

Kondamon commented Jul 1, 2020

I used a framework that follows RFC 2388 and RFC 2045. Each body part of the multipart form has its own headers. To set UTF-8 just for the content field from above, the part has to look like this (see bodyParts[2]):

[Screenshot 2020-07-01 at 23 01 12]

@ghost

ghost commented Jul 1, 2020

Which means it doesn't work with a regular multipart/form-data form?

@ejba
Contributor

ejba commented Jul 2, 2020

@batraz90 the problem is that the browser does not set a Content-Type for each body part, as @Kondamon did with the Alamofire library. There is a thread on the quarkus-dev mailing list discussing whether to make the default charset configurable for parts that do not specify one, or to switch to a better default such as UTF-8.

gsmet added the kind/question (Further information is requested) label and removed the kind/bug (Something isn't working) label on Jul 2, 2020
@evialle

evialle commented Aug 22, 2020

I solved this issue by adding a request filter to my code:

package fr.vstudios.leclick.front;

import org.jboss.resteasy.plugins.providers.multipart.InputPart;

import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.ext.Provider;


@Provider
public class CharsetInterceptorFilter implements ContainerRequestFilter {

    @Override
    public void filter(ContainerRequestContext context) {
        context.setProperty(InputPart.DEFAULT_CHARSET_PROPERTY, "UTF-8");
    }
}
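
As a rough illustration of why setting a default charset helps, the fallback can be sketched in plain Java as follows. This is not RESTEasy's actual implementation, and the class and method names are hypothetical. A part that declares its own charset is decoded with it, and only a part without one uses the configured default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch of per-part charset resolution with a fallback.
final class PartCharsetResolver {

    // Returns the charset named in a part's Content-Type header, or the
    // fallback when the header is absent or has no charset parameter.
    static Charset resolve(String partContentType, Charset fallback) {
        if (partContentType == null) {
            return fallback;
        }
        for (String param : partContentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    // Decodes the raw bytes of a part using the resolved charset.
    static String decode(byte[] body, String partContentType, Charset fallback) {
        return new String(body, resolve(partContentType, fallback));
    }

    public static void main(String[] args) {
        byte[] body = "Test-\u00C4".getBytes(StandardCharsets.UTF_8);

        // Browser-style part without a Content-Type: the fallback decides.
        String asciiDefault = decode(body, null, StandardCharsets.US_ASCII);
        String utf8Default = decode(body, null, StandardCharsets.UTF_8);
        // curl-style part with an explicit charset: the fallback is ignored.
        String explicit = decode(body, "text/plain;charset=utf-8", StandardCharsets.US_ASCII);

        System.out.println(asciiDefault.equals("Test-\u00C4")); // prints false
        System.out.println(utf8Default.equals("Test-\u00C4"));  // prints true
        System.out.println(explicit.equals("Test-\u00C4"));     // prints true
    }
}
```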

@ejba
Contributor

ejba commented Aug 22, 2020

I was trying to do the same, but in the Vert.x extension, for instance. My idea was to make RESTEasy's DEFAULT_CHARSET_PROPERTY configurable through a Quarkus config property instead, so that nobody has to create a filter for this purpose.

@gsmet Do you think that is possible?

@gsmet
Member

gsmet commented Aug 22, 2020

I think we could hardcode it to UTF-8 and see if people want to make it configurable. UTF-8 is certainly a better default than the current situation.

gsmet self-assigned this on Aug 24, 2020
gsmet added this to the 1.8.0 - master milestone on Aug 24, 2020
gsmet reopened this on Aug 24, 2020
gsmet modified the milestones: 1.8.0.CR1, 1.9.0 - master on Sep 2, 2020
@gsmet
Member

gsmet commented Sep 14, 2020

We just merged a new quarkus-resteasy-multipart extension to master that fixes this issue. You just need to use it in place of the provider dependency.

Default charset will be UTF-8 but you can tweak it if needed.

I will backport this to 1.8.1.Final, which I should release on September 30th.

gsmet removed this from the 1.9.0 - master milestone on Sep 15, 2020
@danielFesenmeyer

This still seems to be an issue with version 2.15.0.Final. I had to use the CharsetInterceptorFilter mentioned above.
