-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Execution-despatch optimization (reducing heap allocations) #271
Comments
@markrendle Thanks for the really detailed engagement! No criticism taken - the biggest challenge with Polly is simply getting through all the ideas we have. Fair to say that the main focus since App-vNext took over has been new features ... there are gains that can be made in perf for sure (aware of things that it would be good to squash out/change from the inherited codebase), so thanks for the reminder - and offer of help! Quick response today - more very soon. Briefly, I think we should consider this both internally (where Polly may unwantedly use closures internally that we can eliminate); and externally (where executing a user delegate that wants input parameters currently requires those params to be passed in by closure ... eliminating that). Had you been focusing more on one or the other, or both? On the public API in particular, I want to reflect on and surface a few options for how that could look. |
@reisenberger Cool cool cool. I was considering both aspects, the internal and external code. For the user-facing API, it should be as simple as offering overloads of the various methods that take delegates with additional parameters, and arguments for those parameters. My only slight concern there is the interaction with the existing overloads that take a Context and CancellationToken, but my brief sandboxing suggests it should be reasonably straightfoward and the C# compiler will hit the right overloads. The biggest issue there is you get to the point where you might want to code-gen the generic parameter overloads, from On the internal code I think it's more of a hunt-and-pick process, once the basic library code is in place. Like I say, I'm very happy to pitch in PRs if issues are opened with up-for-grabs labels. |
(triage of open issues) This is not forgotten. TL;DR related work is happening as part of/tied in with wider changes to Polly.
For ref for anyone interested in performance: we benchmarked Polly throughput at Polly v5.1 at around a million operations/second for all policy types. (IE, using a Policy adds ~1 microsecond to your call.) We can continue to use those benchmarks and post stuff in the #performance channel on slack as things evolve. |
Add structs which hold an executable action, and its input parameters, for all Policy executions. Allows us to remove the allocations generated by closures, in all Polly executions. See App-vNext#271
Here's a similar example I came across. I believe this would be better served as a public static RetryPolicy<TResult> WaitAndRetry<TResult>(this PolicyBuilder<TResult> policyBuilder, ...)
{
Action<DelegateResult<TResult>, TimeSpan, int, Context> doNothing = (_, __, ___, ____) => { };
return policyBuilder.WaitAndRetry(retryCount, sleepDurationProvider, doNothing);
} Should be: public static RetryPolicy<TResult> WaitAndRetry<TResult>(this PolicyBuilder<TResult> policyBuilder, ...)
{
return policyBuilder.WaitAndRetry(retryCount, sleepDurationProvider, DoNothing);
}
private static void DoNothing(…) { } |
@grant-d Thank you for the extra suggestion about allocations here. Benchmarking identified that in fact the opposite applied: a Thanks again for everything you've contributed to Polly! (including the wait-and-retry helpers) |
PR #748 delivers on the main recommendations of this issue, and will be released as part of Polly v8.0.0 Allocations per execution reduce as follows: (stats from BenchmarkDotNet) Sync executions
*The 72 bytes represent the Async executionsAsync executions inevitably carry lightly greater allocations:
Significant savings have nonetheless been achieved. Polly uses async/await-elision internally wherever possible to reduce the extra
These statistics represent allocations attributable to execution despatch, calculated with LatenciesWith these changes order-of-magnitude of execution latency has been maintained at ~1microsecond (one-millionth-second) extra latency per Polly policy applied to an execution, for all policy types. |
Polly v8.0.0 will also add new execute overloads allowing passing strongly-typed input objects to delegates to be executed, without using a closure: void Execute<T1>(Action<Context, CancellationToken, T1> action, Context context, CancellationToken cancellationToken, T1 input1)
void Execute<T1, T2>(Action<Context, CancellationToken, T1, T2> action, Context context, CancellationToken cancellationToken, T1 input1, T2 input2)
TResult Execute<T1, TResult>(Func<Context, CancellationToken, T1, TResult> func, Context context, CancellationToken cancellationToken, T1 input1)
TResult Execute<T1, T2, TResult>(Func<Context, CancellationToken, T1, T2, TResult> func, Context context, CancellationToken cancellationToken, T1 input1, T2 input2)
Task ExecuteAsync<T1>(Func<Context, CancellationToken, bool, T1, Task> action, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1)
Task ExecuteAsync<T1, T2>(Func<Context, CancellationToken, bool, T1, T2, Task> action, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1, T2 input2)
Task<TResult> ExecuteAsync<T1, TResult>(Func<Context, CancellationToken, bool, T1, Task<TResult>> func, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1)
Task<TResult> ExecuteAsync<T1, T2, TResult>(Func<Context, CancellationToken, bool, T1, T2, Task<TResult>> func, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1, T2 input2) (and similar |
Breaking changes: Custom policies hosted outside Polly will need to make a couple of small adjustments (method naming, change of execution-despatch type) to dovetail with the new execution despatch. Details to be summarised in full for the v8.0.0 release. |
with the PR merged, should this be closed? or is there more work to be done? and can i help? |
I think this is still open pending Polly v8 (whenever that happens - it's pretty much on-ice during Dylan's absence). |
seems other issues in the v8 milestone are closed when fixed https://github.com/App-vNext/Polly/issues?q=is%3Aclosed+milestone%3Av8.0.0 |
Hey folks, this is addressed in V8 where the
@martincostello , I think we can close this one. cc @SimonCropp |
Yep, I think agree - v8 should resolve this scenario. |
Hi. I've been looking at the internal source code and thinking about optimization, in the spirit of new .NET low-allocations and everything. This is not intended as criticism (Polly is awesome), just a suggestion and an offer to contribute some work.
Anyway.
The way Polly currently handles work-to-be-executed is, at the simplest level, as a
Func<TResult>
. Any state necessary for the operation must therefore be provided as closures, incurring heap allocations. There's a move in some of the Core projects towards instead passing arguments along with delegates to avoid these closure allocations, so I thought I'd have a go at that in the context of Polly. So, for example, instead ofyou would have
Except it turns out that's a bit of a bugger, because in the internal code you have to multiply all the internal engines' Execute and ExecuteAndCapture by however many overloads you have (
Func<T1,TResult>
,Func<T1,T2,TResult>
, etc), and that's not fun for anybody.So then I thought about wrapping the delegate and its argument(s) in a struct which implemented a single, consistent interface, like this:
Then all the Execute and ExecuteAndCapture calls can just take an instance of that interface:
Except, that also causes allocations because the struct gets boxed to the
IPollyAction<TResult>
interface.But if you change the
Execute
method to this:then the generic method acts on the struct and no boxing, and hence no GC allocation, occurs. The only downside is that the call to this method has to explicitly specify the generic type parameters because C#'s overload inference doesn't recurse, but it's not too bad and will be hidden inside the library code.
I threw together a very quick BenchmarkDotNet comparing this approach to the current closure one:
Using the struct wrapper eliminates heap allocations and cuts execution time by >50% just for this simple example. Polly currently creates multiple closures per-call.
Question is, does this matter enough, and are the maintainers interested in this level of optimization? If it does and they are, I'm happy to spike something more representative, like redoing RetryPolicy to use this approach, to enable further discussion.
The text was updated successfully, but these errors were encountered: