RFC: Optimize AbstractFuture.setFuture #3370

njhill · 2019-01-29T01:40:01Z

Chained/recursive async ListenableFuture transformations (e.g. via Futures.transformAsync/catchingAsync/scheduleAsync/... or SettableFuture.setFuture) currently cause indefinite growth of live objects, with intermediate futures not eligible for collection until the entire chain/tree is completed, even if/when already set asynchronously. This PR is a proof-of-concept experiment to rework the setFuture mechanics so that such chains are collapsed on the fly where possible, and makes some related optimizations along the way. It needs some further refinement but should be fully functional.

High-level overview

The approach is motivated by the observation that chains (or more generally trees) of AbstractFutures, previously linked via setFuture calls, form a disjoint group whose members share a common fate. Although a tree structure is implied by the order in which the futures were combined, only identification of the root has any significance w.r.t. future program state. In other words the group can be considered as a set with no structure apart from the existence of a unique "unset" member. And since listeners and waiters added to any of the futures in the group are interested only in the common outcome, there's no need for them to remain tied to their "local" future instance.

If the future overrides afterDone() then it must remain reachable, i.e. there is no hope of unpinning it. Unfortunately this applies to most of the internal AbstractFuture subclasses used to implement the Futures methods, but it turns out they can be refactored to use a (newly introduced) weaker hook named afterEarlyCancellation(). This gets called if and only if the future is cancelled prior to being set asynchronously.

AbstractFutures which make use of setFuture() but don't override afterDone() are considered to be "passively completable" - they can be manipulated such that from the outside they appear to behave/complete as normal, but no callback references need to be retained to them once they are set asynchronously. This is achieved by relocating/redirecting their Listeners and Waiters elsewhere.

Passive completability is currently signalled by overriding a new protected requiresAfterDoneCallback() method to return false (it returns true by default). There may be a better approach like a marker interface or new abstract AbstractFuture subclass or even automatic determination via reflection, though the cost of the latter may be prohibitive.

Internal changes

The private SetFuture class currently serves as an immutable link to the next future, pointing to both the source and target, and is also registered as a listener on the target. This has been changed to serve a different purpose, now a mutable, shared placeholder set as the value for multiple AbstractFutures belonging to the same async-set group. It still has two fields - a fixed (final) pointer to an arbitrary "delegate" member of the group (chosen when first established), and a volatile pointer to the group's current unset future*.

A new future "joins" the group when an existing member is successfully passed to its setFuture method or is passed itself as the argument to the setFuture method of the group's current unset target. In either case the group's existing SetFuture instance becomes the value of the newly joined AbstractFuture. If the new future is passively completable, its listener stack is moved to the group's delegate future (replaced by a new MOVED Listener sentinel), which is guaranteed to be explicitly completed. Waiters in this scenario are released, will recheck their owning future's value finding it is now a SetFuture, and then just block on SF.delegate.get().

A new private Pending class represents the same unset state as null but is set as the value of the group's single unset (innermost) member and holds a pointer to that group's SetFuture object. When setFuture is called on this unset future the Pending value is unwrapped, replaced with the SetFuture object it contains. If trusted, the new future's value is set to the same Pending object (it moves down the chain). The whole operation involves no new allocation if the futures are passively completable.

A listener is typically no longer created when passively completable futures are set asynchronously, however the design requires that the assigned "delegate" future for the group is explicitly completed whether or not it is passively completable. This is easily arranged since when the innermost future is finally (non-asynchronously) set, this delegate can be found immediately via the Pending object which the terminal value replaces. If the innermost future is async-set to a non-trusted future, the Pending object is discarded and the contained SetFuture is added as a listener to the target (it continues to implement Runnable for this reason).

Before:

After:

Completion "callbacks" are still required for non-passively completable async-set futures (apart from those which happen to be a SetFuture delegate). A new private Completer class is used for this purpose - a simple Runnable pointing back to the future in question.

As well as benefiting allocations and object retention, this approach reduces the total number of volatile writes and CAS operations required to process a given chain of futures.

(*) For clarity the above explanation ignores the special case of joining two existing async-set groups that occurs if the unset member of a non-singleton async-set group is set to a future already belonging to a different (necessarily disjoint) non-singleton async-set group. Logically the outcome is just a single async-set group formed by their union, whose unset member is that of the second one. But special handling is required in the implementation since we now have two SetFutures, neither of which can be discarded: one of them is "retired" and added as a listener to the other, which means that its target value will no longer be updated when the aggregate async-set group's target changes. However the future which the target is set to will either be updated or (if there are futher group compositions) lead transitively via a chain of SetFutures to one which is updated. Thus the new impl can still result in chains of async futures but they should be much rarer and shorter, since each link in such a chain must have resulted from a setFuture composition of non-singleton async-set groups of futures.

Futures.addCallback()

A small modification is also included to avoid listeners added via Futures.addCallback() continuing to hold a ref to their target future once it has been async-set - the Futures$CallbackListener.future field is changed to be non-final (safe since it's published via CAS to the listeners stack), and is updated to the group's delegate future at the time the listener is redirected or relocated by AbstractFuture.

Results

The following exaggerated example takes more than 2 minutes to fail with an OutOfMemoryError after ~14 million iterations when run with the current version and -Xms1g -Xmx1g. With this PR and -Xms32m -Xmx32m it completes 100 million iterations successfully in less than 4 seconds:

public static void main(String[] args) {
  int reps = 100_000_000;
  SettableFuture<String> orig = SettableFuture.create(), prev = orig;
  for (int i = 0; i < reps; i++) {
    SettableFuture<String> curr = SettableFuture.create();
    prev.setFuture(curr);
    prev = curr;
  }
  prev.set("done"); // prev represents the 'innermost' future
  assert orig.isDone();
}

There is negligible if any change to the existing caliper benchmark numbers (both runtime and allocations), but a new benchmark for testing performance of setFuture and subsequent completion based on the above snippet shows considerable improvement:

Impl (median values)	Alloc objects	Alloc bytes	Runtime nanos
Current	3.00	72.00	98.09
Current Trusted	3.00	72.00	94.99
New	3.00	72.00	82.38
New Trusted+Passive	1.03	24.68	34.76

Behaviour differences

Cancellation of async-set groups of futures is less racy / more consistent, with a guarantee that either all or none of them will be cancelled. This ensures above assertions about shared fate hold true, which isn't technically the case in the current impl since cancellation completions propagate along chains in one direction and all other kinds of completion in the other.
toString() output no longer includes intermediate async-set futures. A justification could be that the omitted information is merely "historical" and doesn't pertain to the future's current state or later outcome.
Cancel-with-interruption semantics while still "consistent" differ a bit from the current rules. A simple way to describe the new behaviour is to say that asynchronously-set futures cannot themselves be interrupted. Calling cancel(true) on them will still cancel the future(s) to which their fate is now bound (and any whose fate depends on them), but only the innermost (unset) future will have its wasInterrupted flag set and interruptTask() method invoked (if it's an AbstractFuture). This behaviour aligns nicely with the existing Futures internal impls and feels reasonable given that it arguably doesn't make sense to "interrupt" a future which is already set asynchronously since it's now essentially acting as a proxy for the future it was async-set to. It's also the case that although now corrected to be consistent, the current interruption handling is necessarily somewhat arbitrary because there's no way to determine the interrupted state of a third-party cancelled future. Java's CompletableFutures ignore mayInterruptIfRunning entirely.

Further simplifications considered

Initialize the internal value field to UNSET, defined as a static constant new Pending(null) and have null represent itself. This elminiates the NULL constant and simplifies various logic (checking for unset can be just value instanceof Pending rather than value == null | value instanceof Pending), but negatively impacted construction speed, at least on x86_64 linux as measured by the caliper benchmarks.
Replace the SetFuture class by the delegate future itself, with the delegate's value set to the group's current target ListenableFuture. Use a simple wrapper to differentiate terminal values which are ListenableFutures themselves (assumed to be less common). This turned out to also negatively impact general performance due to the fact that instanceof SomeInterfaceWithManyImpls (e.g. ListenableFuture) is greater than an order of magitude more expensive than instanceof SomeFinalClass (e.g. SetFuture).
Instead of the new afterEarlyCancellation hook, add one called afterSet which is called once, after the future is first set (either finally or async). This seems clearer/cleaner API-wise but at least in the case of the internal futures impls, some of the required logic is specific to the early cancellation case (for example this is the only one requiring possible propagation to the delegate) so fewer conditional checks end up being needed.

I have not tried this yet but expect further optimization is possible to Listener handling, in particular changing Listener to an interface or abstract class which the "internal" callbacks can extend directly to eliminate some allocation and indirection, and bypass the public addListener method when moving listeners between TrustedFutures.

njhill · 2019-02-01T23:07:30Z

@cpovirk any thoughts on the value of this kind of change?

btw the build failure is unrelated and affecting all branches - caused by travis moving to jdk11.0.2 which appears to have an incompatibility with maven-javadoc-plugin (at least until next version 3.1.0 is released).

cpovirk · 2019-02-04T17:26:33Z

Thanks. @lukesandberg, author of most of AbstractFuture, may have some thoughts. If not, I will set myself some reminders so that this doesn't fall off my radar.

Optimize handling of chained futures w.r.t. both CPU and allocations; allow GC of intermediate futures

AbstractFutureBenchmarks "old" impl changed to match the "current" impl (pre setFuture optimizations)

njhill · 2019-03-13T23:38:57Z

@lukesandberg @cpovirk curious if you had a chance to take a look at this?

cpovirk · 2019-03-20T15:26:57Z

Not yet, sorry :( Various non-Guava stuff has been eating my time even more than usual lately.

njhill · 2019-04-03T19:48:10Z

@cpovirk @lukesandberg I realize that the size of the change is likely off-putting, but I wonder if you could at least read just the overview, results and possibly "behaviour differences" sections above to determine if this might even be of interest. If the answer is no then there's no need to look at the code!

googlebot added the cla: yes label Jan 29, 2019

njhill force-pushed the future-rework branch 2 times, most recently from adafd98 to dd43c90 Compare January 29, 2019 04:26

cgdecker requested a review from cpovirk January 29, 2019 20:09

cgdecker added the status=triaged label Jan 29, 2019

njhill added 2 commits March 13, 2019 15:29

Proof of concept: Rework AbstractFuture.setFuture mechanics

9cdbc10

Optimize handling of chained futures w.r.t. both CPU and allocations; allow GC of intermediate futures

Add timeSetFutures_Normal benchmark; update OldAbstractFuture

6078d52

AbstractFutureBenchmarks "old" impl changed to match the "current" impl (pre setFuture optimizations)

njhill force-pushed the future-rework branch from 3a30d70 to 6078d52 Compare March 13, 2019 22:32

This was referenced May 17, 2019

Upgrade Maven Javadoc Plugin to 3.1.0 #3476

Closed

Upgrade Maven Javadoc Plugin from 3.0.1 to 3.1.0 #3478

Closed

netdpb added P3 no SLO package=concurrent type=performance Related to performance labels Oct 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Optimize AbstractFuture.setFuture #3370

RFC: Optimize AbstractFuture.setFuture #3370

njhill commented Jan 29, 2019

njhill commented Feb 1, 2019

cpovirk commented Feb 4, 2019

njhill commented Mar 13, 2019

cpovirk commented Mar 20, 2019

njhill commented Apr 3, 2019

RFC: Optimize AbstractFuture.setFuture #3370

Are you sure you want to change the base?

RFC: Optimize AbstractFuture.setFuture #3370

Conversation

njhill commented Jan 29, 2019

High-level overview

Internal changes

Futures.addCallback()

Results

Behaviour differences

Further simplifications considered

njhill commented Feb 1, 2019

cpovirk commented Feb 4, 2019

njhill commented Mar 13, 2019

cpovirk commented Mar 20, 2019

njhill commented Apr 3, 2019