-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RunnerClassLoader - Some classes are loaded multiple times #42874
Comments
/cc @geoand @mariofusco @franz1981 @Sanne for your awareness |
It definitely warrants a thorough investigation |
Note that it's not stable, another run just now gave me:
No idea why they are all related to Vert.x/Netty. |
/cc @cescoffier @maxandersen too |
I remember that apart from the concurrency of the algorithm, this part has changed as well |
@gsmet the events of class loading happen in burst or are separated from each others? |
They happen all at once for each class and very close to each other. See this attached file: |
@gsmet I believe it depends by the number of event loops which initiate the loading by accessing the classes, try printing the current thread too... |
Here is an extract with the thread name (not all of them but you get the idea):
|
BTW, what concerns me a bit is that I have seen the following pattern several times in various flame graphs of various applications and always with Loading a given class in 3.13Loading the same class in mainThe class loader stuff looks a lot more present in |
I suggest to capture traces via event=Java_java_lang_ClassLoader_defineClass1 Adding the threads option, so you will see the threads which performed the class loading and count the number of samples. If they match, the actual number of class loading event is the same. I have the gut feeling (to verify) that we are not observing multiple class loading but just a racing between threads while printing that we initiated a class load event, with a single winner performing it, but I need to look at the code to make sure of it. |
I believe it's a consequence of the new design: it's not using locks, rather letting it fail (and ignore) on duplicate class definition. Not ideal, but a tradeoff which was chosen intentionally. |
But why are we paying this tradeoff when not using virtual threads? I can understand we want a new design for this particular use case but why make the standard case worse? |
Sorry, when I said the classes are loaded several times, I don't mean that we define them several times. What I'm saying is that we will load the content of the class several times, and if it has been defined already it will fail in Basically what we are doing 16 times is to read the class bytes from the That being said, I'm not sure it explains what I have seen in my flame graphs over and over these past few weeks. Maybe it's a false alarm but again, I have seen it consistently for the past weeks and always in |
I agree with @Sanne's and @franz1981's analysis: the behaviour found by @gsmet is admittedly not ideal but it is a trade-off that is the direct consequence of having flagged this class loader as parallel capable plus the fully non-blocking design required to support virtual threads. This is also empirically demonstrated by the fact that the maximum number of parallel class loading in Guillaume experiments is 16 which is probably also the number of cores of his machine.
This is a fair observation and indeed one of the first implementation that I attempted had 2 different execution paths depending on the nature of the thread accessing the class loader. I abandoned that idea for a few reasons:
The only other fix (or hack) that could allow us to mitigate this problem that comes me to mind is putting together a short list of the classes having the highest probability of triggering this problem and preload them when the class loader is created. I have the feeling (but correct me if I'm wrong) that they are always more or less the same, depending on the quarkus/vertx/netty stack and not by the specific test case used by @gsmet to investigate this issue. I admit that this is an ugly workaround (and feels a bit like cheating) but this class loader is already specifically designed for quarkus and far from being a general purpose one, so if we could leverage our knowledge of our architecture why shouldn't we? |
@mariofusco https://github.com/quarkusio/quarkus/pull/42139/files#diff-f20997e90aae1d2fe046b512defb1a3e14215b7a834177d43369542bae99a7f6R104 it shows that we still have separate paths for these.... |
I think instead this "problem" could be solved actually: https://github.com/mariofusco/quarkus/blob/0c479cecdbd6625edc9146bc958b483886f378d4/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/RunnerClassLoader.java#L111C24-L111C35 here the wdyt @gsmet @Sanne @mariofusco ? |
True, that's on a lower level than the point where @gsmet was counting the number of class loading requests and more internal though. Moreover it doesn't require the use of any read/write lock as the original implementation did. |
I think we need to be careful not making things slower by trying to detect a narrow case. Maybe we can live with this additional loading of resources, I don't know. @franz1981 were you able to reproduce the CL activity that looks more intense in I was observing that with https://github.com/quarkusio/quarkus-github-bot but I have seen the same with other apps too. It might just be a matter of sampling but I have seen it three times now always with |
we don't load but define class more times, failing due to linking error
If you detected it because of the number of cores of your machine, is not needed - it is by design that would happen - including throwing exception to fallback on the existing defined class The thing is that the solution I've proposed at #42874 (comment) can be prone to deadlock(s) - @dmlloyd can help me here, actually. Another solution instead is to leverage the existing algorithm which distinguish the thread which has loaded the resource data the first time from who is concurrently doing it. |
I said
No, the flamegraphs that I posted above are about loading some Jackson classes, they are not affected by this very issue which only happens when starting the Vert.x HTTP server on the multiple event loop threads. |
If you're talking about rejecting class loading from virtual threads to avoid pinning, I think this is not necessarily a bad idea, but it will require us to preload some classes, won't it? |
@gsmet sorry I've misread it - this shouldn't really happen because ref counting should handle, if the resources are not yet closed - to share the existing byte[] (eventually waiting till it is made available) , see https://github.com/quarkusio/quarkus/blob/main/independent-projects/bootstrap/runner/src/main/java/io/quarkus/bootstrap/runner/JarFileReference.java#L91 If this is not happening, something is wrong with the algorithm (@mariofusco ) or the cache where we store it was evicted. |
@gsmet Please try add the System.out at: Line 110 in 708a762
and at Line 154 in 708a762
as
If, as I would expect, the data's identity hashcode is the same, I would expect 15/16 concurrent define to fail with linkage error. Regardless this issue, as agreed with @Sanne some time ago, I think we need some custom JFR events here to emit what the class loader is doing (can I delegate this to you @mariofusco ?). |
My proposal here @mariofusco :
This should avoid the storm of linkage errors/repeated byte[] reads on the same jar res. |
I think there is still some issue with the new
RunnerClassLoader
implementation from #42139.I have already mentioned in some other work that I was under the impression something was odd with class loading stuff being a bit more present than in 3.13.
I was experimenting with another patch today and just trying to start the very simple Quarkus GitHub Bot app in prod mode.
Again I saw the ClassLoader being a bit more present than in 3.13.
I did something quite simple, I applied this patch:
While most classes are loaded once, I have several classes loaded multiple times, up to 16 times for some of them:
(the rest of the classes are loaded only once)
Reverting #42139 brings us back to normal AFAICS.
It doesn't sound like a desired behavior to me?
The text was updated successfully, but these errors were encountered: