
ArC: use a separate lock for each bean in generated ContextInstances #37982

Merged: 1 commit merged into quarkusio:main on Jan 5, 2024

Conversation

mkouba (Contributor) commented Jan 2, 2024

I'd like to stress that I would still recommend that users avoid things like calling external services in @PostConstruct/@PreDestroy callbacks. Very often it's not clear exactly when a bean will be instantiated, and since the API is blocking, undesired behavior may occur.
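For illustration, a hypothetical bean showing the kind of callback this warns about (all names here are made up):

import jakarta.annotation.PostConstruct;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class GreetingService {

    @Inject
    RemoteConfigClient client; // hypothetical blocking client for an external service

    private volatile String greeting;

    @PostConstruct
    void init() {
        // Runs during bean instantiation, whose exact timing is hard to predict;
        // a slow or failing remote call blocks whichever thread happens to
        // trigger the first access to this bean.
        this.greeting = client.fetchGreeting();
    }
}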

CC @franz1981 - I've replaced the forEach() method with removeEach(), which is more useful for request context destruction. It would be great if you could run some perf benchmarks so that we can be sure this PR does not cause any significant perf regression. That said, I do expect a perf drop because we must use a separate lock for each bean...

quarkus-bot added the area/arc label (Issue related to ARC, dependency injection) on Jan 2, 2024
mkouba marked this pull request as ready for review on January 3, 2024, 11:26
mkouba requested a review from Ladicek on January 3, 2024, 11:27
Ladicek (Contributor) commented Jan 3, 2024

LGTM, just wondering how [un]likely it is that we reach the method bytecode size limit. Have you checked?

mkouba (Contributor, Author) commented Jan 3, 2024

LGTM, just wondering how [un]likely it is that we reach the method bytecode size limit. Have you checked?

Nope. I only used your https://github.com/Ladicek/arc-crazybeans with 500 @ApplicationScoped beans to test the generated ContextInstances...

Ladicek (Contributor) commented Jan 3, 2024

I would expect 500 beans to be fine, but wouldn't be sure about 5000 beans :-)

franz1981 (Contributor) commented Jan 3, 2024

I'm still on vacation, but... do we really need any lock?
Couldn't we use an atomic reference with setRelease/getAcquire?
In theory we only care about visibility, because we haven't defined any expected behaviour for concurrent updates... this would solve any potential problem, improve performance, and keep the visibility guarantees working (at the expense of atomicity, which I still don't see why we should enforce).

mkouba (Contributor, Author) commented Jan 3, 2024

I would expect 500 beans to be fine, but wouldn't be sure about 5000 beans :-)

Hm, so 1500 beans is OK, but somewhere above that number I get an org.objectweb.asm.MethodTooLargeException. I believe we would have hit the same exception even before this PR, though.

I can try to reduce the size of the relevant methods, but honestly, I'd rather document that users should use quarkus.arc.optimize-contexts=false for this kind of "microservice" ;-).

Ladicek (Contributor) commented Jan 3, 2024

IIUC, we need to guarantee (at least for normal scopes) that if a contextual instance is needed, exactly one is created, even if multiple threads ask for it at the "same" time.

mkouba (Contributor, Author) commented Jan 3, 2024

I'm still on vacation, but... do we really need any lock? Couldn't we use an atomic reference with setRelease/getAcquire? In theory we only care about visibility, because we haven't defined any expected behaviour for concurrent updates... this would solve any potential problem, improve performance, and keep the visibility guarantees working (at the expense of atomicity, which I still don't see why we should enforce).

It's not only about visibility. We need to make sure that exactly one bean instance is created.

mkouba added this to the 3.7 - main milestone on Jan 3, 2024
franz1981 (Contributor) commented Jan 3, 2024

@mkouba which translates into N getAndSet/compareAndSet/getVolatile operations depending on the case, saving a MonitorExit volatile store op per instance (I am checking now whether it is a volatile store, as I fear).

It also saves memory, because you can have N static final atomic reference field updaters, avoiding any additional lock instance, plus a generic util method which just needs to take the atomic reference as a parameter and perform the expected pattern. And by doing it right, we would preserve the same behaviour as CHM, whereas right now we perform a lock/unlock while reading instead of being lock-free.
That also means forEach can stay lock-free; no need to lock anything.
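For illustration, a rough sketch of the shape that seems to be described here (the generated class and field names are made up, not the actual generated code):

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// One volatile field per bean; the static final updaters are shared by all
// instances of the generated class, so no per-instance lock objects are needed.
class GeneratedContextInstances {

    volatile Object instance1; // contextual instance of bean 1
    volatile Object instance2; // contextual instance of bean 2

    static final AtomicReferenceFieldUpdater<GeneratedContextInstances, Object> FIELD1 =
            AtomicReferenceFieldUpdater.newUpdater(GeneratedContextInstances.class, Object.class, "instance1");
    static final AtomicReferenceFieldUpdater<GeneratedContextInstances, Object> FIELD2 =
            AtomicReferenceFieldUpdater.newUpdater(GeneratedContextInstances.class, Object.class, "instance2");
}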

franz1981 (Contributor) commented Jan 3, 2024

IIUC, we need to guarantee (at least for normal scopes) that if a contextual instance is needed, exactly one is created, even if multiple threads ask for it at the "same" time.

Does the allocation happen in the critical protected region, or do we assume we can produce more than one instance but let just one win and be set?
If the former, we just need two states and a loop/fail-fast, depending on the requirements.

e.g.

  1. getVolatile and, if unset, compareAndSet(none, alloc)
  2. Lose: exit
  3. Win: allocate the bean and putRelease(bean)

This way you still have a single volatile op

mkouba (Contributor, Author) commented Jan 3, 2024

@mkouba which translates into N getAndSet/compareAndSet/getVolatile operations depending on the case, saving a MonitorExit volatile store op per instance (I am checking now whether it is a volatile store, as I fear).

It also saves memory, because you can have N static final atomic reference field updaters, avoiding any additional lock instance, plus a generic util method which just needs to take the atomic reference as a parameter and perform the expected pattern. And by doing it right, we would preserve the same behaviour as CHM, whereas right now we perform a lock/unlock while reading instead of being lock-free.

I cannot say I completely understand that "perf gibberish" above :D but compareAndSet() from AtomicReference and AtomicReferenceFieldUpdater is not usable here because we need to compute the bean instance lazily and only once...

franz1981 (Contributor) commented Jan 3, 2024

In #37982 (comment) I have written a potential two-phase protocol (claim + commit value) which allows lazy allocation of beans.

Ladicek (Contributor) commented Jan 3, 2024

IIUC, we need to guarantee (at least for normal scopes) that if a contextual instance is needed, exactly one is created, even if multiple threads ask for it at the "same" time.

Does the allocation happen in the critical protected region, or do we assume we can produce more than one instance but let just one win and be set? If the former, we just need two states and a loop.

e.g.

1. getVolatile and, if unset, compareAndSet(none, alloc)

2. Lose: exit

3. Win: allocate the bean and putRelease(bean)

If I understand you correctly, the alloc thing is a sentinel that marks "some thread won a CAS and is in the process of creating an instance". This doesn't work, because CAS losers cannot simply exit (returning null I suppose). They must wait in one way or another. Maybe spinning wouldn't be so bad in this situation, but we're calling an external callback to create the instance, so I would be cautious about that.

franz1981 (Contributor) commented Jan 3, 2024

@Ladicek if the guarantee is to return whatever is set, or to set it ourselves, then yep, the loop is the appropriate behaviour. Consider that when the caller thread is virtual we can still accept using Thread.yield while waiting, which would work fine. Given that the allocation is not supposed to perform I/O (I hope!! Please correct me if I am wrong) it shouldn't be a big deal. In case of I/O we can still wait 50µs on each attempt (via LockSupport::parkNanos, which doesn't care about interruption); that's the default Linux timer slack (the minimum granularity of sleep).

Ladicek (Contributor) commented Jan 3, 2024

So if I understand you correctly, computeIfAbsent would look roughly like this:

Object value = this.1; // volatile read
if (value != null && value != SENTINEL) {
    return value;
}
if (CAS(this.1, null, SENTINEL)) {
    value = instanceSupplier.get();
    this.1 = value; // volatile write
    return value;
} else {
    while (value == null || value == SENTINEL) {
        Thread.onSpinWait();
        value = this.1; // volatile read
    }
    return value;
}

This could even be written as a static method in some util class, accepting a VarHandle or AtomicReferenceFieldUpdater; it wouldn't need to be generated with Gizmo.
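For illustration, a minimal sketch of such a util method based on AtomicReferenceFieldUpdater (all names here are made up, and it inherits the pseudocode's assumption that the supplier never throws; see the failure discussion below):

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;
import java.util.function.Supplier;

final class ContextInstanceUtil {

    // Placeholder stored while the winning thread runs the supplier.
    static final Object SENTINEL = new Object();

    static <T> Object computeIfAbsent(AtomicReferenceFieldUpdater<T, Object> field,
            T instances, Supplier<Object> instanceSupplier) {
        Object value = field.get(instances); // volatile read
        if (value != null && value != SENTINEL) {
            return value;
        }
        if (field.compareAndSet(instances, null, SENTINEL)) {
            // We won the claim: only this thread invokes the supplier.
            value = instanceSupplier.get();
            field.set(instances, value); // volatile write publishes the instance
            return value;
        }
        // We lost the claim: spin until the winner publishes the instance.
        while (value == null || value == SENTINEL) {
            Thread.onSpinWait();
            value = field.get(instances); // volatile read
        }
        return value;
    }
}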

I guess that's doable, but we'd have to make sure that getIfPresent and remove handle the presence of SENTINEL correctly.

I'm not sure how the bulk-processing methods (getAllPresent, clear / forEach or removeEach) would work. Clearing probably isn't too big of a deal; if someone accesses a context that is being destroyed, they are doing something wrong. Now, getAllPresent is used to do 2 things: create a Map<Bean<?>, Object> and destroy an instance. The Map cannot possibly be "accurate at the end" in case of concurrent access, but we should easily be able to guarantee "accurate at the beginning", which should be enough. The usage of getAllPresent to destroy an instance is IMHO an oversight and should be reimplemented using remove.

Which leaves me wondering about spinning. I didn't look at what the suppliers do. I'll check.

franz1981 (Contributor) commented Jan 3, 2024

Yep, the only suggestion is to replace the volatile store with a lazySet/setRelease, because it doesn't need any StoreLoad barrier and still works fine; it should be translated as

X = supplier.get();
LoadStore
StoreStore
Store X

The two barriers are a no-op on x86 and are mostly compiler barriers, which cost nothing.
They still guarantee safe publication of the value X even if it is not immutable.

If we don't do it, we risk having the exact same performance hit as lock/unlock.
The beauty of this approach is that we can now perform plain volatile reads (assuming no concurrent allocation; otherwise we have to spin till the value is available).

Regarding OOM or thrown exceptions: what we do while spin-waiting is up to us, but it's good practice to assume the supplier could fail, unless we can guarantee that the supplier's get() never fails, which would make the provided code OK (apart from the volatile store point explained above).
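For illustration, the setRelease variant would only change the publishing store in the sketch above; with AtomicReferenceFieldUpdater the release store is spelled lazySet:

// instead of the full volatile store:
//     field.set(instances, value);
// use a release store, which omits the StoreLoad barrier:
field.lazySet(instances, value); // the pre-VarHandle spelling of setRelease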

Ladicek (Contributor) commented Jan 3, 2024

OK so the supplier does 2 things:

  1. Call Contextual.create()
  2. Create a ContextInstanceHandleImpl

The second step is very cheap, and the first one is usually also fairly cheap, but in case of synthetic beans, Contextual.create() can do anything. So that's a bit worrying.

franz1981 (Contributor) commented Jan 3, 2024

@Ladicek given that the provided util method is written just once, I think adding a local counter for the number of spins, and then forcing a LockSupport::parkNanos(50000L) once it is exhausted, is the safer choice. It would avoid burning too much CPU while waiting for the value to become available.
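For illustration, a minimal sketch of that wait policy (the spin budget is an arbitrary illustrative value):

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

final class SpinThenPark {

    static final int SPIN_BUDGET = 100; // illustrative, would need tuning

    // Spin briefly, then back off with parkNanos so that waiters do not
    // burn CPU while a slow supplier creates the instance.
    static Object awaitValue(AtomicReference<Object> ref, Object sentinel) {
        int spins = 0;
        Object value = ref.get();
        while (value == null || value == sentinel) {
            if (spins++ < SPIN_BUDGET) {
                Thread.onSpinWait();
            } else {
                LockSupport.parkNanos(50_000L); // ~50us, the Linux timer-slack granularity
            }
            value = ref.get();
        }
        return value;
    }
}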

Let's say that we are optimizing for 2 possible but still degenerate use cases:

  • concurrent allocation
  • long allocation

They are somewhat related, since being slower increases the chance of concurrent attempts, but I would be surprised to see this as the common scenario.

mkouba (Contributor, Author) commented Jan 3, 2024

OK so the supplier does 2 things:

1. Call `Contextual.create()`

2. Create a `ContextInstanceHandleImpl`

The second step is very cheap, and the first one is usually also fairly cheap, but in case of synthetic beans, Contextual.create() can do anything. So that's a bit worrying.

Also in case of #37958 it calls an external service...

mkouba (Contributor, Author) commented Jan 3, 2024

So I'm not against experimenting with the lock-less approach described above. But:

  1. It should be discussed in a separate PR.
  2. It's even less readable/maintainable, which is OK since it's an implementation detail of a perf optimization; OTOH it means fewer people will be able to maintain this code.
  3. In any case, I'd like to see some numbers before we make a sacrifice.

franz1981 (Contributor) commented Jan 3, 2024

@mkouba regarding:

In any case, I'd like to see some numbers before we make a sacrifice.

Do you already have a microbenchmark with a non-trivial but still realistic number of beans?
If yes, we can compare this PR as it is vs the existing approach and evaluate the impact first.
What we can extract from this is that we cut the number of volatile operations in half, meaning it should improve over the existing approach or have no impact at all; it is unlikely to make things worse (barring some programming error).

Just a hint: to make such a microbenchmark more realistic, add a configurable parameter work (with values 0, 10, 100) which can be used to call Blackhole.consumeCPU(work) in the benchmark method, to simulate user work between atomic operations (or it could be used inside the supplier too, to emulate longer bean creation, even better).
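For illustration, a hypothetical JMH skeleton along those lines (all names are made up, and a ConcurrentHashMap stands in for the real ContextInstances implementations being compared):

import java.util.concurrent.ConcurrentHashMap;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class ContextInstancesBenchmark {

    // Simulated user work between atomic operations.
    @Param({"0", "10", "100"})
    long work;

    // Stand-in for the generated ContextInstances / the CHM-based fallback.
    final ConcurrentHashMap<Integer, Object> instances = new ConcurrentHashMap<>();

    @Benchmark
    public Object getOrCreate() {
        Blackhole.consumeCPU(work);
        return instances.computeIfAbsent(1, k -> new Object());
    }
}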

Ladicek (Contributor) commented Jan 3, 2024

OK so the supplier does 2 things:

1. Call `Contextual.create()`

2. Create a `ContextInstanceHandleImpl`

The second step is very cheap, and the first one is usually also fairly cheap, but in case of synthetic beans, Contextual.create() can do anything. So that's a bit worrying.

Also in case of #37958 it calls an external service...

That's in @PostConstruct, that should be outside of the code path we're interested in here, right?

mkouba (Contributor, Author) commented Jan 3, 2024

OK so the supplier does 2 things:

1. Call `Contextual.create()`

2. Create a `ContextInstanceHandleImpl`

The second step is very cheap, and the first one is usually also fairly cheap, but in case of synthetic beans, Contextual.create() can do anything. So that's a bit worrying.

Also in case of #37958 it calls an external service...

That's in @PostConstruct, that should be outside of the code path we're interested in here, right?

I don't think so. A @PostConstruct callback is executed within Contextual#create(). @PostConstruct and @AroundConstruct interceptors will be called too.

Ladicek (Contributor) commented Jan 3, 2024

Ouch, I didn't realize that. In that case, I'm against spinning, as we basically allow running arbitrary user code there.

gsmet (Member) commented Jan 4, 2024

@mkouba we just got a second report of this causing issues; should we make the default false for the next 3.6 micro and make further progress in 3.7?

manovotn (Contributor) commented Jan 4, 2024

@mkouba we just got a second report of this causing issues; should we make the default false for the next 3.6 micro and make further progress in 3.7?

I haven't had the time to fully delve into the details here, but it seems it might be better to apply the fix suggested here and then continue iterating on it based on what Francesco and Ladislav wrote. My understanding was that the current fix is not inherently wrong, just perhaps not optimal. Or did I miss something?

mkouba (Contributor, Author) commented Jan 5, 2024

My understanding was that the current fix is not inherently wrong, just perhaps not optimal. Or did I miss something?

@manovotn That is my understanding as well ;-). At least we know that it fixes #37958 and #38040. My simple benchmarks do not show a significant regression either.

@mkouba we just got a second report of this causing issues; should we make the default false for the next 3.6 micro and make further progress in 3.7?

@gsmet I'd rather get this fix in and backport to 3.6 if needed.

Of course, I'd like to continue with investigation in this area. First, we need some more benchmarks...

gsmet (Member) commented Jan 5, 2024

@mkouba oh, I think I wasn't very clear: I completely agree we should pursue the fix. My only worry is that it might be too complex to backport, which is why I proposed switching the default in 3.6.

But maybe it's better to push the fix to 3.6 to have some more bake time before the LTS.

Let me know what you prefer.

- fixes quarkusio#37958 and quarkusio#38040
- use a separate lock for each bean in the generated ContextInstances
- replace ContextInstances#forEach() and ContextInstances#clear() with
  ContextInstances#removeEach()
- optimize the generated ContextInstances to significantly reduce the
  size of the generated bytecode
mkouba (Contributor, Author) commented Jan 5, 2024

My only worry is that it might be too complex to backport, which is why I proposed switching the default in 3.6.

@gsmet Would it help if we prepared a separate branch for 3.6?

Ladicek (Contributor) left a comment

LGTM, just wondering if this thing fell through the cracks:

I can try to reduce the size of the relevant methods, but honestly, I'd rather document that users should use quarkus.arc.optimize-contexts=false for this kind of "microservice" ;-).

mkouba (Contributor, Author) commented Jan 5, 2024

LGTM, just wondering if this thing fell through the cracks:

I can try to reduce the size of the relevant methods, but honestly, I'd rather document that users should use quarkus.arc.optimize-contexts=false for this kind of "microservice" ;-).

It didn't, but I cannot find a good place in the docs. I mean, currently we don't generate the docs for the optimize-contexts config property (#36626 (comment)) 🤷

manovotn (Contributor) left a comment

I tried to go through this mainly by comparing the generated ContextInstances class before and after this code change and I think it looks good.
Given that it fixes the two reported scenarios, I'd get this merged and discuss further optimizations in separate PRs/issues.

mkouba added the triage/waiting-for-ci label (Ready to merge when CI successfully finishes) on Jan 5, 2024
gsmet (Member) commented Jan 5, 2024

@mkouba I will try to backport it once it's merged and will let you know if we need a specific branch. No need to do the work if it's not required.

mkouba (Contributor, Author) commented Jan 5, 2024

@mkouba I will try to backport it once it's merged and will let you know if we need a specific branch. No need to do the work if it's not required.

There is a record in this PR, so I'm pretty sure it won't be that easy :-(

gsmet (Member) commented Jan 5, 2024

Ah, yes, so we will need a separate branch. FWIW, I would be a bit cautious about using Java 17 features in fixes for now.

mkouba (Contributor, Author) commented Jan 5, 2024

@gsmet FYI https://github.com/mkouba/quarkus/tree/issue-37982-36. It contains 2 commits (one from this PR and one from #37529).

quarkus-bot commented Jan 5, 2024

✔️ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

mkouba merged commit 6406376 into quarkusio:main on Jan 5, 2024
quarkus-bot added the kind/bugfix label and removed the triage/waiting-for-ci label on Jan 5, 2024
gsmet (Member) commented Jan 5, 2024

@mkouba can you create a PR so that I get it on my radar and we can get CI to run?

Thanks!

gsmet (Member) commented Jan 9, 2024

This has been backported as part of #38069.

Ladicek (Contributor) commented Jan 10, 2024

LGTM, just wondering if this thing fell through the cracks:

I can try to reduce the size of the relevant methods, but honestly, I'd rather document that users should use quarkus.arc.optimize-contexts=false for this kind of "microservice" ;-).

It didn't, but I cannot find a good place in the docs. I mean, currently we don't generate the docs for the optimize-contexts config property (#36626 (comment)) 🤷

Just got this silly idea this morning: how about we disable this optimization automatically in case there are more than, say, 1000 beans in the application? It could be per context, but I think globally would be fine too.

mkouba (Contributor, Author) commented Jan 10, 2024

LGTM, just wondering if this thing fell through the cracks:

I can try to reduce the size of the relevant methods, but honestly, I'd rather document that users should use quarkus.arc.optimize-contexts=false for this kind of "microservice" ;-).

It didn't, but I cannot find a good place in the docs. I mean, currently we don't generate the docs for the optimize-contexts config property (#36626 (comment)) 🤷

Just got this silly idea this morning: how about we disable this optimization automatically in case there are more than, say, 1000 beans in the application? It could be per context, but I think globally would be fine too.

That is actually a good idea. And 1000 is a nice number. Hm, maybe we could change the type of the property to a string and do something similar to ArcConfig.removeUnusedBeans. The set of supported values would be true, false, and auto, where auto disables the optimization if more than 1000 beans are found. WDYT?
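For illustration, the proposed configuration might then look like this in application.properties (auto is the value being proposed here; it did not exist at the time of this comment):

# true = always generate optimized ContextInstances, false = never,
# auto = disable the optimization when more than ~1000 beans are found
quarkus.arc.optimize-contexts=auto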

Ladicek (Contributor) commented Jan 10, 2024

That makes sense to me.

manovotn (Contributor) commented Jan 10, 2024

Just got this silly idea this morning: how about we disable this optimization automatically in case there are more than, say, 1000 beans in the application? It could be per context, but I think globally would be fine too.

That's a great idea! +1

mkouba (Contributor, Author) commented Jan 10, 2024

For the record: #38121

Labels: area/arc (Issue related to ARC, dependency injection), kind/bugfix