
Updating from quarkus 2.9.2.Final (Java 17) to 3.17.2 (Java 21) heap memory issues #45122

Open
TiagoFerreirahub opened this issue Dec 13, 2024 · 13 comments
Labels
area/rest kind/question Further information is requested

Comments

@TiagoFerreirahub

Describe the bug

After the following changes:

  • Changing the Quarkus version from 2.9.2.Final to 3.17.2;
  • Changing the dependency "quarkus-resteasy" to "quarkus-rest-jackson"

A REST client that was consuming a ~350MB JSON response stopped working and now throws java.lang.OutOfMemoryError: Java heap space.

The same heap memory configuration was used:

  • -Xms100m
  • -Xmx300m

To make the new Quarkus version and dependencies work, I needed to raise -Xmx to 2g.

I was not able to upload all the zip files, but I do have the whole setup ready for testing; I can send it if needed.

Expected behavior

Client-works.zip
In version 3.17.2 of Quarkus, with the new dependencies and Java version, it should be possible to consume the JSON body with a similar heap configuration. (A working client on Quarkus 2.9.2.Final is in the attached zip file.)

Actual behavior

After changing the Quarkus version to 3.17.2, Java to version 21, and the dependency "quarkus-resteasy" to "quarkus-rest-jackson", it is not possible to consume the same JSON response body with the same heap memory configuration. (Xmx had to be raised from 300m to 2g.)

How to Reproduce?

  1. Create a Quarkus project with version 2.9.2.Final, Java 17, and the "quarkus-resteasy" and "quarkus-rest-client-jackson" dependencies (check the zip file "Client-works.zip") and run it with -Xms100m and -Xmx300m.
  2. Create another Quarkus client similar to the previous one, but with version 3.17.2, Java 21, and the "quarkus-rest" and "quarkus-rest-client-jackson" dependencies, and run it with -Xms100m and -Xmx300m.
  3. Create a mock server that returns a ~350MB JSON response.
  4. Make the same request from each client. The client on Quarkus 2.9.2.Final should read the response successfully, while the client on Quarkus 3.17.2 should take a long time and eventually throw java.lang.OutOfMemoryError: Java heap space.
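The shape of the reproducer can be sketched with the JDK alone (no Quarkus involved; sizes and names here are illustrative, not taken from the attached projects): a mock server streams a large JSON-like response and the client buffers the entire body on the heap, which is what a non-streaming REST client return type forces.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Stdlib-only sketch of the repro idea: a mock server streams a large
// response and the client buffers the whole body in memory at once.
public class LargeJsonRepro {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        byte[] chunk = "{\"v\":1},".repeat(1024).getBytes(); // ~8 KB per chunk
        int chunks = 128; // ~1 MB total here; raise this to stress the heap
        server.createContext("/data", exchange -> {
            // payload shape is only illustrative, not strictly valid JSON
            exchange.sendResponseHeaders(200, (long) chunk.length * chunks + 2);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write('[');
                for (int i = 0; i < chunks; i++) os.write(chunk);
                os.write(']');
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpResponse<byte[]> resp = client.send(
                    HttpRequest.newBuilder(URI.create(
                            "http://localhost:" + server.getAddress().getPort() + "/data")).build(),
                    HttpResponse.BodyHandlers.ofByteArray()); // entire body on the heap
            System.out.println(resp.body().length);
        } finally {
            server.stop(0);
        }
    }
}
```

Running this with a small -Xmx and a large enough `chunks` value produces the same class of OOM, independent of any framework.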

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

  • I ran all the tests on Docker containers with the specified heap configurations.
@TiagoFerreirahub TiagoFerreirahub added the kind/bug Something isn't working label Dec 13, 2024
@geoand
Contributor

geoand commented Dec 16, 2024

Thanks a lot for reporting.

Changing dependency "quarkus-resteasy" to "quarkus-rest-jackson"

This is not an equivalent change. The apples-to-apples comparison would be from quarkus-resteasy to quarkus-rest.

After changing the quarkus version to 3.17.2, java to version 21, and changing dependency "quarkus-resteasy" to "quarkus-rest-jackson", it is not possible to consume the same json body response with the same heap memory configurations

This is a lot of changes. I propose making the changes in isolated steps and seeing what the impact of each one is.


quarkus-bot bot commented Dec 16, 2024

/cc @FroMage (rest), @stuartwdouglas (rest)

@geoand geoand added kind/question Further information is requested and removed kind/bug Something isn't working labels Dec 16, 2024
@TiagoFerreirahub
Author

I got a chance to perform some isolated changes:

  • The Java version does not matter; I tested Quarkus 3.17.2 with Java 17 and 21, and both gave me the OOM issue described here.
  • I also tested using quarkus-rest instead of quarkus-rest-jackson to be as similar as possible, and it also did not help.
  • The only thing that seems to have an impact is the Quarkus version. (2.9.2.Final works and 3.17.2 does not when processing a ~350MB JSON response.)

Maybe I'm missing something here, but I'm out of ideas right now for more tests I could do on my side.

@gsmet
Member

gsmet commented Dec 17, 2024

The next step is to get a heap dump from the OOM and have a look at it with either VisualVM or Eclipse MAT.
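For anyone following along, the standard HotSpot flags below make the JVM write a dump automatically at the moment of the OOM, which is what VisualVM or Eclipse MAT need as input (the dump path and jar name are illustrative):

```shell
java -Xms100m -Xmx300m \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/tmp/client-oom.hprof \
  -jar target/quarkus-app/quarkus-run.jar
```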

Also, bisecting the version might be useful as it's a huge gap. I would try 2.16, then 3.0, then 3.8, then 3.15, and refine depending on the results.

I can help, but setting things up can be a bit time-consuming and I wouldn't like to shoot in the dark. So if you could provide an easy way to reproduce it with all the pieces available, that would already be awesome.

I see you have it ready so if you can send me a link at <github handle>@redhat.com, I could have a closer look.

@geoand
Contributor

geoand commented Dec 17, 2024

then 3.0, then 3.8,

I would add 3.2 here also

@TiagoFerreirahub
Author

TiagoFerreirahub commented Dec 19, 2024

Hi there, I looked into the heap dump of the real app where this issue first appeared, but I'm not very proficient at reading heap dumps, so I could not pinpoint the issue. Would you be able to have a look at it?

@gsmet I sent you an email with both a mock server that generates the response and the client that requests it, which I'm using to test these scenarios.

I tested with Java 17 only, migrating from Quarkus 2.9.2.Final -> 2.16 -> 3.0 -> 3.2 -> 3.8, and when I got to 3.15 it broke (after I changed the dependency from quarkus-resteasy to quarkus-rest).

If you think it is necessary I can send you a heap dump, but I think you can generate that easily with the projects, @gsmet.

Tell me if I can perform any more tests to rule out other options. I'm open to suggestions, and thank you for your help.

@TiagoFerreirahub
Author

Were you able to replicate it, @gsmet? Is there any news on this topic?

@gsmet
Member

gsmet commented Jan 10, 2025

Sorry, I completely missed the ping (and the email...). I'll try to have a look later today once I'm done with my current task!

@gsmet
Member

gsmet commented Jan 10, 2025

I'll send you a follow up by email.

@gsmet
Member

gsmet commented Jan 10, 2025

I was able to reproduce it, and it's even worse: I end up with a gazillion log lines like these:

2025-01-10 16:13:51,330 ERROR [io.qua.ver.cor.run.VertxCoreRecorder] (vert.x-eventloop-thread-10) Uncaught exception received by Vert.x: java.lang.OutOfMemoryError: Java heap space
2025-01-10 16:13:51,339 ERROR [io.qua.ver.cor.run.VertxCoreRecorder] (vert.x-eventloop-thread-10) Uncaught exception received by Vert.x: java.lang.OutOfMemoryError: Java heap space
2025-01-10 16:13:51,348 ERROR [io.qua.ver.cor.run.VertxCoreRecorder] (vert.x-eventloop-thread-10) Uncaught exception received by Vert.x: java.lang.OutOfMemoryError: Java heap space
[... the same line repeats every ~10 ms, many more times ...]

I'll dig more and report back.

@gsmet
Member

gsmet commented Jan 10, 2025

The initial stack trace was:

2025-01-10 16:12:24,304 ERROR [io.qua.ver.cor.run.VertxCoreRecorder] (vert.x-eventloop-thread-10) Uncaught exception received by Vert.x: java.lang.OutOfMemoryError: Java heap space
	at io.netty.util.internal.PlatformDependent.allocateUninitializedArray(PlatformDependent.java:325)
	at io.netty.buffer.UnpooledUnsafeHeapByteBuf.allocateArray(UnpooledUnsafeHeapByteBuf.java:39)
	at io.netty.buffer.UnpooledHeapByteBuf.capacity(UnpooledHeapByteBuf.java:133)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
	at io.vertx.core.buffer.impl.BufferImpl.appendBuffer(BufferImpl.java:268)
	at io.vertx.core.http.impl.HttpEventHandler.handleChunk(HttpEventHandler.java:54)
	at io.vertx.core.http.impl.HttpClientResponseImpl.handleChunk(HttpClientResponseImpl.java:239)
	at io.vertx.core.http.impl.HttpClientRequestBase$$Lambda/0x00007fcc942b6e08.handle(Unknown Source)
	at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.lambda$new$0(Http1xClientConnection.java:429)
	at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl$$Lambda/0x00007fcc942acdd0.handle(Unknown Source)
	at io.vertx.core.streams.impl.InboundBuffer.handleEvent(InboundBuffer.java:279)
	at io.vertx.core.streams.impl.InboundBuffer.write(InboundBuffer.java:157)
	at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.handleChunk(Http1xClientConnection.java:705)
	at io.vertx.core.http.impl.Http1xClientConnection$$Lambda/0x00007fcc942b9da8.handle(Unknown Source)
	at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:313)
	at io.vertx.core.impl.DuplicatedContext.execute(DuplicatedContext.java:159)
	at io.vertx.core.http.impl.Http1xClientConnection.handleResponseChunk(Http1xClientConnection.java:888)
	at io.vertx.core.http.impl.Http1xClientConnection.handleHttpMessage(Http1xClientConnection.java:807)
	at io.vertx.core.http.impl.Http1xClientConnection.handleMessage(Http1xClientConnection.java:774)
	at io.vertx.core.net.impl.ConnectionBase.read(ConnectionBase.java:159)
	at io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:153)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:333)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:455)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)

and then it gets crazy.
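The ensureWritable frames at the top of that trace are the accumulation path: the whole body is appended chunk by chunk into one growable heap buffer, and each capacity growth allocates a new backing array and copies the old one, so both arrays are briefly live at once. A stdlib-only sketch of that transient cost, assuming a simple power-of-two doubling policy (Netty's actual growth policy differs in detail):

```java
// Sketch of why accumulating a body into one growable buffer needs
// transient extra heap: each growth allocates a new array and copies
// the old one, so at the final growth both arrays are live at once.
public class BufferGrowthCost {
    public static void main(String[] args) {
        int payload = 350 * 1024 * 1024; // ~350 MB, as in the report
        long capacity = 64, peak = 0;
        for (long size = 0; size < payload; ) {
            long chunk = Math.min(8192, payload - size); // ~8 KB network chunks
            if (size + chunk > capacity) {        // grow: allocate new + copy old
                long newCap = capacity;
                while (newCap < size + chunk) newCap *= 2;
                peak = Math.max(peak, capacity + newCap); // old + new both live
                capacity = newCap;
            }
            size += chunk;
        }
        System.out.println(peak / (1024 * 1024)); // transient MB at worst growth
    }
}
```

Under these assumptions the worst single growth step briefly holds 768 MB for a 350 MB payload, before even counting the second full copy made for responseEntityStream.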

@gsmet
Member

gsmet commented Jan 13, 2025

The first thing I noticed when I started working on this is that ClientJacksonMessageBodyReader is stateful: it keeps a reference to the RestClientRequestContext, which in turn keeps a reference to the Vert.x response (containing the whole file as a Buffer) and also holds another copy of the bytes in responseEntityStream. So you have two copies of the file bytes in there.

We keep the context (and the duplicate copy of the bytes) in memory until the next request, which might replace the ClientJacksonMessageBodyReader context with something smaller.

Note that the statefulness is also problematic from a concurrency point of view for the resolution of the ObjectMapper, so we have a bug here.

There's more to it though; here is a summary of my findings:

  • ClientJacksonMessageBodyReader shouldn't be stateful. Fixing this issue will fix the concurrency issue and make sure we free the memory without waiting for the next request to clear the context.
  • I think we should see if we can reduce how much information RestClientRequestContext retains. Keeping the original Vert.x response with its copy of the bytes, plus another copy of the bytes, looks problematic.
  • Could we avoid copying the whole byte[] from the Buffer to responseEntityStream (see ClientSendRequestHandler)? From what I can see, we have a ByteBufInputStream that could maybe be used, but I'm not sure how that plays with streaming. Also, apparently the Buffer#getByteBuf() method is going away in Vert.x 5. I'm not sure whether it has been replaced... but we are already using it in other parts of the code, so it will end up being an issue anyway.
  • And there's another issue that I think is related to how Netty handles its memory: even with the dirty patch below, which improves the situation by avoiding too many copies (I use the patch to show this problem because it's a lot more visible with the heap usage reduced, but you can experience it without the patch too), with 600 MB of heap I end up with an OOM when the app has consumed a bit less than 400 MB. I suspect something in Netty is keeping some native memory around, but I'm not sure how to check this theory. If so, we basically need close to 3 times the size of the file as the heap limit, which seems dubious. What worries me a lot, tbh, is that it makes things close to impossible to monitor: you have no idea whether you have memory free or whether Netty is using 200 MB that is not accounted for (again, if my theory proves correct).
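The statefulness point above can be illustrated with a stdlib-only sketch (class and field names are hypothetical stand-ins, not the real Quarkus types): a single shared reader instance with a mutable context field both races under concurrent requests and pins the last response's bytes until the next request overwrites the field.

```java
import java.util.concurrent.CountDownLatch;

// Sketch of the hazard: one shared "reader" instance with a mutable
// context field. Two concurrent callers race on the field, and the
// losing caller's response bytes stay reachable via the singleton.
public class StatefulReaderHazard {
    static class Context {
        final byte[] responseBytes;
        Context(int size) { responseBytes = new byte[size]; }
    }
    // Mirrors the stateful-provider shape: one instance shared by all requests.
    static class StatefulReader {
        private Context context; // retained after readFrom() returns
        void handle(Context c) { context = c; }
        int readFrom() { return context.responseBytes.length; }
    }
    public static void main(String[] args) throws Exception {
        StatefulReader reader = new StatefulReader();
        CountDownLatch done = new CountDownLatch(2);
        Thread t1 = new Thread(() -> { reader.handle(new Context(100)); done.countDown(); });
        Thread t2 = new Thread(() -> { reader.handle(new Context(200)); done.countDown(); });
        t1.start(); t2.start(); done.await(); t1.join(); t2.join();
        // Whichever handle() ran last wins: at least one caller now sees the
        // other's context, and that byte[] stays pinned by `reader`.
        System.out.println(reader.readFrom() == 100 || reader.readFrom() == 200);
    }
}
```

Making the reader stateless (passing the context as a method argument instead of storing it in a field) removes both the race and the retention.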

Furthermore, the OOM is handled extremely badly:

  • First exception, 2nd exception
  • Then 8 minutes of a constant stream of 2025-01-10 18:55:20,573 ERROR [io.qua.ver.cor.run.VertxCoreRecorder] (vert.x-eventloop-thread-4) Uncaught exception received by Vert.x: java.lang.OutOfMemoryError: Java heap space
  • Finally actual client error

I'm not sure we can do anything about it, but it doesn't seem ideal.

To illustrate the problem, here are a few elements. What's in the dump when you get an OOM with the original code (see the two full copies of the bytes):

[Image: heap dump showing two full copies of the response bytes]

The memory usage when I get an OOM (with the very dirty patch below) - you can see we are far from using all the heap when we get the OOM:

[Image: memory usage graph at the moment of the OOM, well below the heap limit]

A very dirty patch that improves the situation a bit (but is definitely not something we want to apply). I include it here so that you can quickly see the affected areas of the code:

diff --git a/extensions/resteasy-reactive/rest-client-jackson/runtime/src/main/java/io/quarkus/rest/client/reactive/jackson/runtime/serialisers/ClientJacksonMessageBodyReader.java b/extensions/resteasy-reactive/rest-client-jackson/runtime/src/main/java/io/quarkus/rest/client/reactive/jackson/runtime/serialisers/ClientJacksonMessageBodyReader.java
index fd4169f64e0..4f1b7b07635 100644
--- a/extensions/resteasy-reactive/rest-client-jackson/runtime/src/main/java/io/quarkus/rest/client/reactive/jackson/runtime/serialisers/ClientJacksonMessageBodyReader.java
+++ b/extensions/resteasy-reactive/rest-client-jackson/runtime/src/main/java/io/quarkus/rest/client/reactive/jackson/runtime/serialisers/ClientJacksonMessageBodyReader.java
@@ -1,14 +1,11 @@
 package io.quarkus.rest.client.reactive.jackson.runtime.serialisers;
 
-import static io.quarkus.rest.client.reactive.jackson.runtime.serialisers.JacksonUtil.getObjectMapperFromContext;
-
 import java.io.IOException;
 import java.io.InputStream;
 import java.lang.annotation.Annotation;
 import java.lang.reflect.Type;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
-import java.util.function.Function;
 
 import jakarta.inject.Inject;
 import jakarta.ws.rs.WebApplicationException;
@@ -32,7 +29,6 @@ public class ClientJacksonMessageBodyReader extends JacksonBasicMessageBodyReade
     private static final Logger log = Logger.getLogger(ClientJacksonMessageBodyReader.class);
 
     private final ConcurrentMap<ObjectMapper, ObjectReader> objectReaderMap = new ConcurrentHashMap<>();
-    private RestClientRequestContext context;
 
     @Inject
     public ClientJacksonMessageBodyReader(ObjectMapper mapper) {
@@ -58,20 +54,10 @@ public Object readFrom(Class<Object> type, Type genericType, Annotation[] annota
 
     @Override
     public void handle(RestClientRequestContext requestContext) {
-        this.context = requestContext;
+        //this.context = requestContext;
     }
 
     private ObjectReader getEffectiveReader(MediaType responseMediaType) {
-        ObjectMapper effectiveMapper = getObjectMapperFromContext(responseMediaType, context);
-        if (effectiveMapper == null) {
-            return getEffectiveReader();
-        }
-
-        return objectReaderMap.computeIfAbsent(effectiveMapper, new Function<>() {
-            @Override
-            public ObjectReader apply(ObjectMapper objectMapper) {
-                return objectMapper.reader();
-            }
-        });
+        return getEffectiveReader();
     }
 }
diff --git a/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/ClientSendRequestHandler.java b/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/ClientSendRequestHandler.java
index ea066ae352e..5ba6fd50d5d 100644
--- a/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/ClientSendRequestHandler.java
+++ b/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/ClientSendRequestHandler.java
@@ -1,6 +1,5 @@
 package org.jboss.resteasy.reactive.client.handlers;
 
-import java.io.ByteArrayInputStream;
 import java.io.File;
 import java.io.IOException;
 import java.io.InputStream;
@@ -41,6 +40,7 @@
 import org.jboss.resteasy.reactive.common.core.Serialisers;
 import org.jboss.resteasy.reactive.common.util.MultivaluedTreeMap;
 
+import io.netty.buffer.ByteBufInputStream;
 import io.netty.handler.codec.DecoderException;
 import io.netty.handler.codec.http.HttpHeaderValues;
 import io.netty.handler.codec.http.LastHttpContent;
@@ -361,7 +361,7 @@ public void handle(AsyncResult<Buffer> ar) {
                                         try {
                                             if (buffer.length() > 0) {
                                                 requestContext.setResponseEntityStream(
-                                                        new ByteArrayInputStream(buffer.getBytes()));
+                                                        new ByteBufInputStream(buffer.getByteBuf()));
                                             } else {
                                                 requestContext.setResponseEntityStream(null);
                                             }
diff --git a/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/VertxClientInputStream.java b/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/VertxClientInputStream.java
index 69fa50754a8..b288778becf 100644
--- a/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/VertxClientInputStream.java
+++ b/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/VertxClientInputStream.java
@@ -24,17 +24,14 @@ class VertxClientInputStream extends InputStream {
     private boolean closed;
     private boolean finished;
     private ByteBuf pooled;
-    private final RestClientRequestContext vertxResteasyReactiveRequestContext;
 
     public VertxClientInputStream(HttpClientResponse response, long timeout,
             RestClientRequestContext vertxResteasyReactiveRequestContext) {
-        this.vertxResteasyReactiveRequestContext = vertxResteasyReactiveRequestContext;
         this.exchange = new VertxBlockingInput(response, timeout);
     }
 
     public VertxClientInputStream(HttpClientResponse request, long timeout, ByteBuf existing,
             RestClientRequestContext vertxResteasyReactiveRequestContext) {
-        this.vertxResteasyReactiveRequestContext = vertxResteasyReactiveRequestContext;
         this.exchange = new VertxBlockingInput(request, timeout);
         this.pooled = existing;
     }
diff --git a/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/impl/RestClientRequestContext.java b/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/impl/RestClientRequestContext.java
index bdd37464772..480f3b89086 100644
--- a/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/impl/RestClientRequestContext.java
+++ b/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/impl/RestClientRequestContext.java
@@ -87,10 +87,6 @@ public class RestClientRequestContext extends AbstractResteasyReactiveContext<Re
     private final ClientRestHandler[] abortHandlerChainWithoutResponseFilters;
 
     private final boolean disableContextualErrorMessages;
-    /**
-     * Only initialised once we get the response
-     */
-    private HttpClientResponse vertxClientResponse;
     // Changed by the request filter
     Map<String, Object> properties;
     private HttpClientRequest httpClientRequest;
@@ -235,7 +231,6 @@ public void initialiseResponse(HttpClientResponse vertxResponse) {
         for (String i : vertxHeaders.names()) {
             headers.addAll(i, vertxHeaders.getAll(i));
         }
-        this.vertxClientResponse = vertxResponse;
         responseStatus = vertxResponse.statusCode();
         responseReasonPhrase = vertxResponse.statusMessage();
         responseHeaders = headers;
@@ -310,7 +305,7 @@ public CompletableFuture<ResponseImpl> getResult() {
     }
 
     public HttpClientResponse getVertxClientResponse() {
-        return vertxClientResponse;
+        return null;
     }
 
     @Override

@geoand
Contributor

geoand commented Jan 13, 2025

Thanks for the thorough analysis!

ClientJacksonMessageBodyReader shouldn't be stateful. Fixing this issue will fix the concurrency issue and make sure we free the memory without waiting for the next request that will clear the context.

This would be addressed by #45518

As for the rest of the points, I'll have to take a closer look in order to comment.

gsmet added a commit to gsmet/quarkus that referenced this issue Jan 13, 2025
This is related to quarkusio#45122 and reduces the number of copies of the content we keep in memory.
gsmet added a commit to gsmet/quarkus that referenced this issue Jan 14, 2025
This is related to quarkusio#45122 and reduces the number of copies of the content we keep in memory.

(cherry picked from commit 09edb56)