Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[Discuss] Avoid IEnumeration allocation (List+Dictionary) #4468

Closed
wants to merge 1 commit into from

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Apr 21, 2016

(For discussion)

1M enumerations from 1M objects at 40 MBytes to 5 objects at 1,224 bytes
Current impl takes 846ms and has 12 Gen0 collections (x3.25 slower than struct)
This impl takes 714ms and has 0 GC collections +15% faster than current (x2.7 slower than struct)
(Struct enumerator takes 260ms and has 0 GC collections)

The common IEnumerator interface foreach code

IList<int> list = new List<int>();
foreach (var value in list)
{
    // ...
}

Turns into something like

IEnumerable<int> enum = list.GetEnumerator();
try
{
    while (enum.MoveNext())
    {
        var value = enum.Current;
        // ...
    }
}
finally
{
    enum.Dispose();
}

Which boxes the struct enumerator to the interface and causes allocation.

This change introduces ObjectPool<T>; closely based on ArrayPool<T> - as an internal type (so as not to expose new api).

List<T> and Dictionary<TKey, TValue> use this shared pool for class enumerators which contain the struct enumerator. This allows no boxing/allocation (beyond the initial pool allocations); when the interface is used for List and Dictionary enumeration.

The enumerator is returned to the pool on the call to Dispose and the struct enumerator set to default(Enumerator).

Interface dispatch will still be slower than using the stuct enumerator however.

Resolves dotnet/roslyn#4446 (Measurements below)

/cc @rynowak, @stephentoub, @davidfowl, @jaredpar, @JonHanna

@benaadams
Copy link
Member Author

Allocations for https://gist.github.com/benaadams/d05000ae4b4935aa3e3b39ce47713a6d

var list = new List<int>();
var ilist = (IList<int>)list;

for (var i = 0; i < 100; i++)
{
    list.Add(i);
}

for (var i = 0; i < 1000000; i++)
{
    foreach(var value in ilist)
    {
        ;
    }
}

Before 1M objects at 40 MBytes

before allocs

After 5 objects at 1,224 bytes

after allocs

_objects = new T[maxPooled];
}

/// <summary>Takes an object from the pool. If the pool is empty, returns a new object.</summary>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the new object created?

Copy link
Member Author

@benaadams benaadams Apr 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its just new T(); I'm investigating something more generic with a lamda

...but so many angle brackets...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at line 68; obj ?? new T();

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, missed the ?? new T(). Costly due to the use of Activator.CreateInstance.

@mattwarren
Copy link

As the potential cost of new T() has come up a few times, here's a BenchmarkDotNet benchmark (full benchmark code):

BenchmarkDotNet=v0.9.4.0
OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, ProcessorCount=8
Frequency=2630751 ticks, Resolution=380.1196 ns, Timer=TSC
HostCLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE
JitModules=clrjit-v4.6.100.0

Type=Program  Mode=Throughput  Runtime=Clr LaunchCount=1  
Method Median StdDev Scaled
ComplexClassAlternative 86.8452 ns 6.4257 ns 13.17
ComplexClassRegular 8.8911 ns 0.0960 ns 1.35
SimpleClassAlternative 82.9110 ns 0.5299 ns 12.57
SimpleClassRegular 6.5960 ns 0.3710 ns 1.00

Note: Regular is new SomeClass() and Alternative is new T()

So there is a cost, but it's roughly 80 nanoseconds (I'm not sure whether that's a problem or not?)

@benaadams
Copy link
Member Author

@mattwarren and taking a parameter Func<T> which is () => new SomeClass()?

@mattwarren
Copy link

mattwarren commented Apr 21, 2016

@benaadams Not sure I'm following, could you update the gist (or leave a comment on it) with the code you want to test and I'll run it for you?

@mattwarren
Copy link

Here's the updated results, Func<T> which is () => new SomeClass() is much quicker:

BenchmarkDotNet=v0.9.4.0
OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, ProcessorCount=8
Frequency=2630751 ticks, Resolution=380.1196 ns, Timer=TSC
HostCLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE
JitModules=clrjit-v4.6.100.0

Type=Program  Mode=Throughput  Runtime=Clr  
LaunchCount=1  
Method Median StdDev Scaled
ComplexClassAlternative 84.5888 ns 0.6949 ns 13.12
ComplexClassFunc 20.4752 ns 0.0950 ns 3.18
ComplexClassRegular 8.8755 ns 0.1359 ns 1.38
SimpleClassAlternative 82.2324 ns 0.3058 ns 12.75
SimpleClassFunc 18.3899 ns 0.1852 ns 2.85
SimpleClassRegular 6.4488 ns 0.0486 ns 1.00

@benaadams
Copy link
Member Author

Changed the pool to take a Func<T> and Func<TState, T>, TState in rent.
Changed PooledIEnumerator to be a generic type rather than defining a type in List and Dictionary so its only the two methods in each that change. (Though they do end up having a horrible number of angle brackets)

Measuring perf now

return enumerator.Initalize(this);
var enumerator = ObjectPool<PooledIEnumerator<T, Enumerator, List<T>>>.Shared.Rent(
() => new PooledIEnumerator<T, Enumerator, List<T>>());
return enumerator.Initalize((list) => new Enumerator(list), this);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly complicated. enumerator._enumerator = new Enumerator(list) would do the job.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, good point!

@benaadams
Copy link
Member Author

benaadams commented Apr 21, 2016

Current impl takes 846ms and has 12 Gen0 collections (x3.25 slower than struct)
This impl takes 714ms and has 0 GC collections +15% faster than current (x2.7 slower than struct)
(Struct enumerator takes 260ms and has 0 GC collections)

@benaadams benaadams force-pushed the ienumerator-allocs branch from 1dc1cd2 to bed59a7 Compare April 21, 2016 15:58
@benaadams
Copy link
Member Author

benaadams commented Apr 21, 2016

Main issue remaining so far, is no one trusts people to stop using objects post-Dispose; and a finalizer would be too slow. Which is a problem for pooling.

@benaadams
Copy link
Member Author

Closing this as seen better solutions (e.g. @jaredpar's enumerator)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce IEnumerator allocation via interface
8 participants