Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RunnerClassLoader - Some classes are loaded multiple times #42874

Open
gsmet opened this issue Aug 29, 2024 · 25 comments
Open

RunnerClassLoader - Some classes are loaded multiple times #42874

gsmet opened this issue Aug 29, 2024 · 25 comments
Labels
area/classloader kind/bug Something isn't working

Comments

@gsmet
Copy link
Member

gsmet commented Aug 29, 2024

I think there is still some issue with the new RunnerClassLoader implementation from #42139.

I have already mentioned in some other work that I was under the impression something was odd with class loading stuff being a bit more present than in 3.13.
I was experimenting with another patch today and just trying to start the very simple Quarkus GitHub Bot app in prod mode.

Again I saw the ClassLoader being a bit more present than in 3.13.

I did something quite simple, I applied this patch:

diff --git a/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/RunnerClassLoader.java b/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/RunnerClassLoader.java
index 071d77ffef1..4f6fdb1895f 100644
--- a/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/RunnerClassLoader.java
+++ b/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/RunnerClassLoader.java
@@ -88,6 +88,9 @@ public Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundExce
         if (loaded != null) {
             return loaded;
         }
+
+        System.out.println("Loading class: " + name);
+
         final ClassLoadingResource[] resources;
         if (packageName == null) {
             resources = resourceDirectoryMap.get("");

While most classes are loaded once, I have several classes loaded multiple times, up to 16 times for some of them:

     16 Loading class: io.vertx.core.http.impl.HttpUtils
     16 Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
     16 Loading class: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
     16 Loading class: io.netty.util.NetUtil
      8 Loading class: io.vertx.core.net.impl.SocketAddressImpl
      6 Loading class: io.vertx.core.http.impl.HttpServerImpl
      5 Loading class: io.vertx.core.net.impl.TCPServerBase

(the rest of the classes are loaded only once)

Reverting #42139 brings us back to normal AFAICS.

It doesn't sound like a desired behavior to me?

@gsmet gsmet added the kind/bug Something isn't working label Aug 29, 2024
@gsmet
Copy link
Member Author

gsmet commented Aug 29, 2024

/cc @geoand @mariofusco @franz1981 @Sanne for your awareness

@geoand
Copy link
Contributor

geoand commented Aug 29, 2024

It definitely warrants a thorough investigation

@gsmet
Copy link
Member Author

gsmet commented Aug 29, 2024

Note that it's not stable, another run just now gave me:

     16 Loading class: io.vertx.core.http.impl.HttpUtils
     16 Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
     16 Loading class: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
     16 Loading class: io.netty.util.NetUtil
     12 Loading class: io.vertx.core.net.impl.SocketAddressImpl
      8 Loading class: io.vertx.core.net.impl.TCPServerBase
      6 Loading class: io.vertx.core.http.impl.HttpServerImpl

No idea why they are all related to Vert.x/Netty.

@gsmet
Copy link
Member Author

gsmet commented Aug 29, 2024

/cc @cescoffier @maxandersen too

@franz1981
Copy link
Contributor

https://github.com/quarkusio/quarkus/pull/42139/files#diff-a8ea1b8267b00e0c29800f9e3d324797e9b62667b590fe3c3c8ad81f708f44f4R196

I remember that apart from the concurrency of the algorithm, this part has changed as well

@franz1981
Copy link
Contributor

@gsmet the events of class loading happen in burst or are separated from each others?
Because being now concurrent (truly concurrent) maybe N loading are initiated but only one really succeeded...

@gsmet
Copy link
Member Author

gsmet commented Aug 29, 2024

They happen all at once for each class and very close to each other.

See this attached file:
class-loading-events-main.txt

@franz1981
Copy link
Contributor

@gsmet I believe it depends by the number of event loops which initiate the loading by accessing the classes, try printing the current thread too...

@gsmet
Copy link
Member Author

gsmet commented Aug 29, 2024

Here is an extract with the thread name (not all of them but you get the idea):

Loading class: io.vertx.core.http.impl.HttpServerImpl from thread: vert.x-eventloop-thread-0
Loading class: io.vertx.core.http.impl.HttpServerImpl from thread: vert.x-eventloop-thread-1
Loading class: io.vertx.core.http.impl.HttpServerImpl from thread: vert.x-eventloop-thread-2
Loading class: io.vertx.core.net.impl.TCPServerBase from thread: vert.x-eventloop-thread-0
Loading class: io.vertx.core.net.impl.TCPServerBase from thread: vert.x-eventloop-thread-1
Loading class: io.vertx.core.net.impl.TCPServerBase from thread: vert.x-eventloop-thread-2
Loading class: io.vertx.core.net.impl.TCPServerBase from thread: vert.x-eventloop-thread-4
Loading class: io.vertx.core.net.impl.TCPServerBase from thread: vert.x-eventloop-thread-3
Loading class: io.vertx.core.net.impl.TCPServerBase from thread: vert.x-eventloop-thread-5
Loading class: io.vertx.core.net.impl.TCPServerBase from thread: vert.x-eventloop-thread-6
Loading class: io.vertx.core.net.SocketAddress from thread: vert.x-eventloop-thread-0
Loading class: io.vertx.core.spi.metrics.TCPMetrics from thread: vert.x-eventloop-thread-0
Loading class: io.vertx.core.spi.metrics.NetworkMetrics from thread: vert.x-eventloop-thread-0
Loading class: io.vertx.core.net.impl.NetServerImpl from thread: vert.x-eventloop-thread-0
Loading class: io.vertx.core.impl.future.Mapping from thread: main
Loading class: io.vertx.core.impl.future.Operation from thread: main
Loading class: io.vertx.core.impl.future.FutureImpl$4 from thread: main
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-13
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-15
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-0
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-14
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-6
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-5
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-7
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-1
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-8
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-3
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-4
Loading class: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler from thread: vert.x-eventloop-thread-2

@gsmet
Copy link
Member Author

gsmet commented Aug 29, 2024

BTW, what concerns me a bit is that I have seen the following pattern several times in various flame graphs of various applications and always with main in this position so while it could be an artifact of the method I used to get the flamegraph, it still looks a bit odd to me:

Loading a given class in 3.13

Screenshot from 2024-08-29 19-04-52

Loading the same class in main

Screenshot from 2024-08-29 19-04-45

The class loader stuff looks a lot more present in main. And it's not the first time I observe that, I have seen it in different flamegraphs of different applications lately.

@franz1981
Copy link
Contributor

I suggest to capture traces via event=Java_java_lang_ClassLoader_defineClass1

Adding the threads option, so you will see the threads which performed the class loading and count the number of samples.

If they match, the actual number of class loading event is the same.

I have the gut feeling (to verify) that we are not observing multiple class loading but just a racing between threads while printing that we initiated a class load event, with a single winner performing it, but I need to look at the code to make sure of it.

@Sanne
Copy link
Member

Sanne commented Aug 29, 2024

I believe it's a consequence of the new design: it's not using locks, rather letting it fail (and ignore) on duplicate class definition.

Not ideal, but a tradeoff which was chosen intentionally.

@gsmet
Copy link
Member Author

gsmet commented Aug 29, 2024

But why are we paying this tradeoff when not using virtual threads?

I can understand we want a new design for this particular use case but why make the standard case worse?

@gsmet
Copy link
Member Author

gsmet commented Aug 29, 2024

I have the gut feeling (to verify) that we are not observing multiple class loading but just a racing between threads while printing that we initiated a class load event, with a single winner performing it, but I need to look at the code to make sure of it.

Sorry, when I said the classes are loaded several times, I don't mean that we define them several times. What I'm saying is that we will load the content of the class several times, and if it has been defined already it will fail in defineClass and we will load the class again with findLoadedClass.

Basically what we are doing 16 times is to read the class bytes from the ClassLoadingResource.

That being said, I'm not sure it explains what I have seen in my flame graphs over and over these past few weeks.
I have other things to do atm and if you have a strong opinion of how this particular additional pressure (if it's actually real) should be chased (the ones from the flamegraphs), please do.

Maybe it's a false alarm but again, I have seen it consistently for the past weeks and always in main compared to previous versions.

@mariofusco
Copy link
Contributor

I agree with @Sanne's and @franz1981's analysis: the behaviour found by @gsmet is admittedly not ideal but it is a trade-off that is the direct consequence of having flagged this class loader as parallel capable plus the fully non-blocking design required to support virtual threads. This is also empirically demonstrated by the fact that the maximum number of parallel class loading in Guillaume experiments is 16 which is probably also the number of cores of his machine.

But why are we paying this tradeoff when not using virtual threads?

This is a fair observation and indeed one of the first implementation that I attempted had 2 different execution paths depending on the nature of the thread accessing the class loader. I abandoned that idea for a few reasons:

  1. it made the algorithm even more complex and hard to debug than how it already is
  2. it required to keep in the class loader all the locks and data structures necessary for both cases thus making the class loader itself heavier and more memory demanding
  3. (more important) it cannot work if both native and virtual threads try to access concurrently to this class loader

The only other fix (or hack) that could allow us to mitigate this problem that comes me to mind is putting together a short list of the classes having the highest probability of triggering this problem and preload them when the class loader is created. I have the feeling (but correct me if I'm wrong) that they are always more or less the same, depending on the quarkus/vertx/netty stack and not by the specific test case used by @gsmet to investigate this issue. I admit that this is an ugly workaround (and feels a bit like cheating) but this class loader is already specifically designed for quarkus and far from being a general purpose one, so if we could leverage our knowledge of our architecture why shouldn't we?

@franz1981
Copy link
Contributor

@franz1981
Copy link
Contributor

franz1981 commented Aug 30, 2024

I think instead this "problem" could be solved actually: https://github.com/mariofusco/quarkus/blob/0c479cecdbd6625edc9146bc958b483886f378d4/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/RunnerClassLoader.java#L111C24-L111C35

here the defineClass can fail with linkage error due to concurrently defining a classs; this is where we can decide to have virtual threads to fail and normal threads to synchronize vs the resource data (byte[]? maybe) before making definePackage and defineClass to happen. This will save the exception to occur assuming the 2 threads are using the same byte[] (which is likely, given that we use reference counting for this reason!).
And perf-wise should still be "decent" because the concurrency level and granularity will be enormously better than what we had with a non-parallel class laoder.

wdyt @gsmet @Sanne @mariofusco ?

@mariofusco
Copy link
Contributor

@mariofusco https://github.com/quarkusio/quarkus/pull/42139/files#diff-f20997e90aae1d2fe046b512defb1a3e14215b7a834177d43369542bae99a7f6R104 it shows that we still have separate paths for these....

True, that's on a lower level than the point where @gsmet was counting the number of class loading requests and more internal though. Moreover it doesn't require the use of any read/write lock as the original implementation did.

@gsmet
Copy link
Member Author

gsmet commented Aug 30, 2024

I think we need to be careful not making things slower by trying to detect a narrow case. Maybe we can live with this additional loading of resources, I don't know.

@franz1981 were you able to reproduce the CL activity that looks more intense in main compared to 3.13 that I was showing in the flame graph?

I was observing that with https://github.com/quarkusio/quarkus-github-bot but I have seen the same with other apps too. It might just be a matter of sampling but I have seen it three times now always with main vs 3.13.

@franz1981
Copy link
Contributor

I think we need to be careful not making things slower by trying to detect a narrow case. Maybe we can live with this additional loading of resources, I don't know.

we don't load but define class more times, failing due to linking error

@franz1981 were you able to reproduce the CL activity that looks more intense in main compared to 3.13 that I was showing in the flame graph?

If you detected it because of the number of cores of your machine, is not needed - it is by design that would happen - including throwing exception to fallback on the existing defined class

The thing is that the solution I've proposed at #42874 (comment) can be prone to deadlock(s) - @dmlloyd can help me here, actually.

Another solution instead is to leverage the existing algorithm which distinguish the thread which has loaded the resource data the first time from who is concurrently doing it.

@gsmet
Copy link
Member Author

gsmet commented Aug 30, 2024

we don't load but define class more times, failing due to linking error

I said loading resources, we are reading the byte[] multiple times.

If you detected it because of the number of cores of your machine, is not needed - it is by design that would happen - including throwing exception to fallback on the existing defined class

No, the flamegraphs that I posted above are about loading some Jackson classes, they are not affected by this very issue which only happens when starting the Vert.x HTTP server on the multiple event loop threads.

@dmlloyd
Copy link
Member

dmlloyd commented Aug 30, 2024

I think instead this "problem" could be solved actually: https://github.com/mariofusco/quarkus/blob/0c479cecdbd6625edc9146bc958b483886f378d4/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/RunnerClassLoader.java#L111C24-L111C35

here the defineClass can fail with linkage error due to concurrently defining a classs; this is where we can decide to have virtual threads to fail and normal threads to synchronize vs the resource data (byte[]? maybe) before making definePackage and defineClass to happen. This will save the exception to occur assuming the 2 threads are using the same byte[] (which is likely, given that we use reference counting for this reason!). And perf-wise should still be "decent" because the concurrency level and granularity will be enormously better than what we had with a non-parallel class laoder.

wdyt @gsmet @Sanne @mariofusco ?

If you're talking about rejecting class loading from virtual threads to avoid pinning, I think this is not necessarily a bad idea, but it will require us to preload some classes, won't it?

@franz1981
Copy link
Contributor

I said loading resources, we are reading the byte[] multiple times.

@gsmet sorry I've misread it - this shouldn't really happen because ref counting should handle, if the resources are not yet closed - to share the existing byte[] (eventually waiting till it is made available) , see https://github.com/quarkusio/quarkus/blob/main/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/JarFileReference.java#L91

If this is not happening, something is wrong with the algorithm (@mariofusco ) or the cache where we store it was evicted.
But given that you said it was happening in batch I suppose is the former.

@franz1981
Copy link
Contributor

franz1981 commented Aug 30, 2024

@gsmet Please try add the System.out at:

System.out.println(Thread.currentThread() + " is defining " + name + " using the data: " + System.identityHashCode(data))

and at

as

System.out.println(Thread.currentThread() + " has failed to define " + name)

If, as I would expect, the data's identity hashcode is the same, I would expect 15/16 concurrent define to fail with linkage error.
If the data identity hashcode would be different, instead, I need to think more what could have happened.

Regardless this issue, as agreed with @Sanne some time ago, I think we need some custom JFR events here to emit what the class loader is doing (can I delegate this to you @mariofusco ?).
These events, together with the existing JFR class loading/defining events, should be enough to both write regression tests and eventually collect them for perf purposes.

@franz1981
Copy link
Contributor

franz1981 commented Sep 4, 2024

My proposal here @mariofusco :

  • the thread which has to join by waiting the jar resource has a better chance to recheck by find class if the class it wants is already there, or
  • we can use a single value cache for the last defined class in a jar resource (with success, without linkage errors!) - and use the same strategy of the previous point, but checking first if the cached last defined class name matches the one to define; if is a match just prepend the define attempt with a find class

This should avoid the storm of linkage errors/repeated byte[] reads on the same jar res.
Clearly this is a problem with laptop machines and big servers, where the concurrent behaviour is more evident and possible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/classloader kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants