ArC: use a separate lock for each bean in generated ContextInstances #37982
Conversation
...projects/arc/processor/src/main/java/io/quarkus/arc/processor/ContextInstancesGenerator.java
LGTM, just wondering how [un]likely it is that we reach the method bytecode size limit. Have you checked? |
Nope. I only used your https://github.com/Ladicek/arc-crazybeans with 500 beans. |
I would expect 500 beans to be fine, but wouldn't be sure about 5000 beans :-) |
I'm still on vacation, but... do we really need any lock? |
Hm, so 1500 beans is OK but somewhere above this number I hit the method bytecode size limit. I can try to reduce the size of the relevant methods but honestly, I'd rather document to use quarkus.arc.optimize-contexts=false for this kind of "microservice" ;-). |
IIUC, we need to guarantee (at least for normal scopes) that if a contextual instance is needed, exactly one is created, even if multiple threads ask for it at the "same" time. |
It's not only about visibility. We need to make sure that exactly one bean instance is created. |
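For orientation, the kind of per-bean double-checked pattern the PR title refers to might look roughly like this; this is purely illustrative, not the actual generated bytecode, and the field/method names (and the use of java.util.function.Supplier) are made up:

// one lock per bean field, so creating instance "1" never blocks creation of instance "2"
private final Object lock1 = new Object();
private volatile Object instance1;

Object computeIfAbsent1(Supplier<Object> supplier) {
    Object val = this.instance1;
    if (val == null) {
        synchronized (this.lock1) {
            val = this.instance1;
            if (val == null) {
                val = supplier.get(); // the supplier runs at most once, even under contention
                this.instance1 = val;
            }
        }
    }
    return val;
}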
@mkouba which translates into N getAndSet/compareAndSet/getVolatile operations depending on the case, saving a MonitorExit plus volatile store per instance (I am checking now whether it really is a volatile store, as I fear). Additionally it means saving memory too, because you can have N static final atomic reference field updaters, saving any additional lock instance, plus a generic util method which just needs to receive the atomic reference as a param and perform the expected pattern. Additionally, by doing it right, we would preserve the same behaviour as CHM, whereas now we perform a lock/unlock while reading instead of being lock-free. |
Does the allocation happen in the critical protected region, or do we assume we can produce more than one instance but let just one win and be set? E.g.:
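A minimal sketch of that second option (allocate optimistically, let one compareAndSet win), assuming an AtomicReference slot per bean and a java.util.function.Supplier; the names are illustrative, not the generated code:

private final AtomicReference<Object> instance1 = new AtomicReference<>();

Object get1(Supplier<Object> supplier) {
    Object existing = instance1.get();
    if (existing != null) {
        return existing;
    }
    // allocation happens outside any critical region; several threads may reach this point
    Object created = supplier.get();
    if (instance1.compareAndSet(null, created)) {
        return created; // we won; only the CAS was needed to publish the instance
    }
    return instance1.get(); // another thread won; its instance is returned, ours is discarded
}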
In this way you still have a single volatile op |
I cannot say I completely understand that "perf gibberish" above :D but |
In #37982 (comment) I have written a potential protocol in 2 phases (claim + commit value) which allows lazy allocation of beans. |
If I understand you correctly, the |
@Ladicek if the guarantee is to return whatever is already set or to set it ourselves, yep, the loop is the appropriate behaviour. Consider that in case the caller thread is virtual we can still accept using Thread.yield while waiting, which would work fine. Given that the allocation is not supposed to perform I/O (I hope!! Correct me please if I am wrong) it shouldn't be a big deal. In case of I/O we can still wait 50us on each attempt (via LockSupport::parkNanos, which doesn't care about interruption); that's the default Linux timer slack in ns (the minimum granularity of sleep). |
So if I understand you correctly, the idea is roughly:

Object value = this.1; // volatile read
if (value != null && value != SENTINEL) {
    return value;
}
if (CAS(this.1, null, SENTINEL)) {
    value = instanceSupplier.get();
    this.1 = value; // volatile write
    return value;
} else {
    while (value == null || value == SENTINEL) {
        Thread.onSpinWait();
        value = this.1; // volatile read
    }
    return value;
}

This could even be written as a generic utility method. I guess that's doable, but we'd have to make sure that the SENTINEL doesn't stay in place forever if the supplier throws. I'm not sure how the bulk-processing methods (forEach() and clear()) would work. Which leaves me wondering about spinning. I didn't look at what the suppliers do. I'll check. |
Yep, the only suggestion is to replace the volatile store with a lazySet/setRelease, because it doesn't need any StoreLoad barrier and still works fine: it should be translated as X = supplier.get(); followed by StoreStore + LoadStore barriers and the store. The 2 barriers are a no-op on x86 and are mostly compiler barriers which don't cost anything. If we don't do it, we risk having the same exact performance hit as lock/unlock. Regarding OOM or thrown exceptions: what we need to do while spin waiting is up to us, but it's a good practice to assume it could fail, unless we can guarantee the supplier never fails, which would make the provided code OK (apart from the volatile store point explained above). |
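A minimal sketch of that release-store publication, assuming a VarHandle over the instance field (one possible reading of the suggestion, not code from this PR):

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class Holder {
    private volatile Object value;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(Holder.class, "value", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(Object created) {
        // setRelease (== lazySet): StoreStore + LoadStore before the write, no StoreLoad,
        // so the writer avoids the full volatile-store fence
        VALUE.setRelease(this, created);
    }
}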
OK so the supplier does 2 things: it creates the bean instance (which may run user code, e.g. a producer method or a @PostConstruct callback) and wraps it in a ContextInstanceHandle.
The second step is very cheap, and the first one is usually also fairly cheap, but in case of synthetic beans, the create function is basically arbitrary user code. |
@Ladicek given that the provided util method is written just once, I think adding a local counter for the number of spins and then forcing a LockSupport::parkNanos(50000L) once it is exhausted is the safer choice. It would save burning too much CPU while waiting for the value to become available. Let's say that we are optimizing 2 possible but still degenerate use cases: a slow supplier, and many threads asking for the same not-yet-created instance.
They are kind of related, because being slower increases the chance of concurrent attempts, but I would be surprised to see this as the common scenario. |
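A sketch of such a bounded wait helper, assuming java.util.concurrent.atomic.AtomicReference and java.util.concurrent.locks.LockSupport, with an arbitrary spin limit (SENTINEL again marks a claimed-but-not-yet-committed slot):

static Object awaitValue(AtomicReference<Object> ref, Object sentinel) {
    int spins = 0;
    Object value = ref.get();
    while (value == null || value == sentinel) {
        if (++spins < 64) {
            Thread.onSpinWait();            // brief busy wait first
        } else {
            LockSupport.parkNanos(50_000L); // ~50us back-off, the default Linux timer slack
        }
        value = ref.get();
    }
    return value;
}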
Also in case of #37958 it calls an external service... |
So I'm not against experimenting with the lock-less approach described above. But: |
@mkouba Do you already have a micro benchmark with a non-trivial but still realistic number of beans? Just a hint: to make such a microbenchmark more realistic, add a configurable parameter |
That's in |
I don't think so. A |
Ouch, I didn't realize that. In that case, I'm against spinning, as we basically allow running arbitrary user code there. |
@mkouba we just got a second report of this causing issues, should we make the default quarkus.arc.optimize-contexts=false in 3.6? |
I haven't had the time to fully delve into the details here, but it seems it might be better to apply the fix suggested here and then continue iterating on it based on what Francesco and Ladislav wrote. My understanding was that the current fix is not inherently wrong, just perhaps not optimal. Or did I miss something? |
@manovotn That is my understanding as well ;-). At least we know that it fixes #37958 and #38040. My simple benchmarks do not show a significant regression either.
@gsmet I'd rather get this fix in and backport to 3.6 if needed. Of course, I'd like to continue with investigation in this area. First, we need some more benchmarks... |
@mkouba oh I think I wasn't very clear: I completely agree we should pursue the fix. My only worry is that it might be too complex to backport, which is why I proposed to switch the default in 3.6. But maybe it's better to push the fix to 3.6 to have some more bake time before the LTS. Let me know what you prefer. |
- fixes quarkusio#37958 and quarkusio#38040
- use a separate lock for each bean in the generated ContextInstances
- replace ContextInstances#forEach() and ContextInstances#clear() with ContextInstances#removeEach()
- optimize the generated ContextInstances to significantly reduce the size of the generated bytecode
@gsmet Would it help if we prepare a separate branch for 3.6? |
LGTM, just wondering if this thing fell through the cracks:
I can try to reduce the size of the relevant methods but honestly, I'd rather document to use quarkus.arc.optimize-contexts=false for this kind of "microservice" ;-).
It didn't, but I cannot find a good place in the docs. I mean, currently we don't generate the docs for the |
I tried to go through this mainly by comparing the generated ContextInstances class before and after this code change and I think it looks good.
Given that it fixes the two reported scenarios, I'd get this merged and discuss further optimizations in separate PRs/issues.
@mkouba I will try to backport it once it's merged and will let you know if we need a specific branch. No need to do the work if it's not required. |
There is a |
Ah, yes so we will need a separate branch. FWIW, I would be a bit cautious about using Java 17 features in fixes for now. |
@gsmet FYI https://github.com/mkouba/quarkus/tree/issue-37982-36. It contains 2 commits (from this PR and #37529) |
✔️ The latest workflow run for the pull request has completed successfully. It should be safe to merge provided you have a look at the other checks in the summary. |
@mkouba can you create a PR so that I get it on my radar and we can get CI to run? Thanks! |
This has been backported as part of #38069 . |
Just got this silly idea this morning: how about we disable this optimization automatically in case there's more than, say, 1000 beans in the application? It could be per context, but I think globally would be fine too. |
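Illustratively, such a threshold check might look like this (the names are made up; this is not the actual build step):

// hypothetical sketch of the threshold check
boolean optimizeContexts = arcConfig.optimizeContexts && beanCount <= 1000;
if (!optimizeContexts) {
    // keep the default ContextInstances implementation instead of generating a specialized one
}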
That is actually a good idea. And 1000 is a nice number. Hm, maybe we could change the type of the property to string and do something similar to |
That makes sense to me. |
That's great idea! +1 |
For the record: #38121 |
I'd like to stress that I would still recommend users to avoid things like calling external services in the @PostConstruct/@PreDestroy callbacks. Very often it's not clear when exactly a bean will be instantiated, and since the API is blocking, an undesired behavior may occur.
CC @franz1981 - I've replaced the forEach() method with removeEach(), which is more useful for the request context destruction. It would be great if you could run some perf benchmarks so that we can be sure that this PR does not cause any significant perf regression. Although I expect a perf drop because we must use a separate lock for each bean...
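A rough usage sketch, assuming removeEach() hands each stored handle to a consumer (the actual generated code may differ):

// hypothetical: destroy every contextual instance while removing it in a single pass,
// instead of forEach() followed by clear()
contextInstances.removeEach(handle -> handle.destroy());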