Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Alternative System.Lazy implementation #8963

Merged
merged 17 commits into from
Mar 4, 2017

Conversation

manofstick
Copy link

@manofstick manofstick commented Jan 17, 2017

2017-21-01: I have modified the implementation a little, so updated the text to reflect the changes
2017-28-01: Updated text to represent current state of implementation

The current version of System.Lazy isn't particularly performant. @mrange had some analysis in an article called On the cost of being lazy.

Ultimately this stemmed back to FSharp's implementations of Seq.init(Infinite)? which wrapped each call to IEnumerator<T>.MoveNext() in a lazy that was only evaluated on Current. (Which is one of the issues that I have address when composing Seqs in my (incomplete) PR Seq Composer - although it still remains in there under some circumstances)

This version:

  • creates less objects for ExecutionAndPublish (2 objects created, vs 3), None (1 vs 2) & PublicationOnly (1 vs 2)
  • Faster access after initialization (just a null check, no casting)
  • Faster creation step - just one virtual call away
  • Arguably a more straight-forward implementation that is easier to follow as logic is partitioned.
  • Much less generics exposure, so as to improve JIT compilation

Still to do:

  • Add and merge your extra tests to for Lazy<T> to corefx repo. (System.Lazy regression tests corefx#15577)
  • Make sure that all corefx tests for Lazy<T> are passing
  • Incorporate any feedback/bug fixes
  • Merge to CoreCLR repo
  • Port the change to CoreRT repo
  • External (not @manofstick) validation of threading correctness
  • Solution for how to deal with Mode on complete item for Debug object (I suggest Nullable, if that is acceptable)

@dnfclas
Copy link

dnfclas commented Jan 17, 2017

Hi @manofstick, I'm your friendly neighborhood .NET Foundation Pull Request Bot (You can call me DNFBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla2.dotnetfoundation.org.

TTYL, DNFBOT;

@dnfclas
Copy link

dnfclas commented Jan 17, 2017

@manofstick, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, DNFBOT;

@mrange
Copy link

mrange commented Jan 17, 2017

@manofstick - I have updated the charts to include your updated Lazy<'T> class.

https://gist.github.com/mrange/aa82d33e94ad76d0f33ed86e704d7492


private static class CreateInstance
{
private static T construct()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment: Construct?

/// </summary>
private static object GetObjectFromMode(LazyThreadSafetyMode mode)
/// <param name="factory">The object factory used to create the underlying object</param>
/// <param name="forceMemoryBarrier">true when called with ExecutionAndPublication, false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: No parameter named forceMemoryBarrier?

private T CreateValuePublicationOnly(Func<T> factory, PublicationOnly comparand)
{
var possibleValue = factory();
lock (comparand)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find a bit confusing that the lock for PublicationOnly happens inside the Create* method but for ExecutionAndPublication the caller holds the lock. Is there a reason the lock can't be moved into CreateValueExecutionAndPublication?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with the asymmetry with the lock, but I need to call TakeFactory to null out the factory object in the case of ExecutionAndPublication. Anyway I have a plan to remove the lock from PublicationOnly, so that should make it better!

/// <returns>The underlying object</returns>
private T CreateValuePublicationOnly(Func<T> factory, PublicationOnly comparand)
{
var possibleValue = factory();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping that PublicationOnly only could get away for with an Interlock.CompareAndExchange over a full lock. What are the reasons a full lock is needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've shamed me into it! I've thought of a cunning plan... soon, soon...


// Fall through to the slow path.
return LazyInitValue();
var implementation = m_implementation;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you think it's worth to add aggressive inlining to the performance critical property getters?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... that ones above my pay grade :-) I'm guessing that the JIT is probably inlining the calls to Value anyway? Maybe you could check?

protected Func<T> TakeFactory()
{
var factory = Factory;
if (!ReferenceEquals(CreateInstance.Factory, factory))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this special handling needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TakeFactory is required to handle re-enterency. In the case of recursive initialization it throws an exception. Needed to replicate existing functionality.

m_valueFactory = ALREADY_INVOKED_SENTINEL;
if (factory == null)
throw new InvalidOperationException(Environment.GetResourceString("Lazy_Value_RecursiveCallsToValue"));
Factory = null;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right in that this handling is only correct if only one thread is working or we have execution and publication lock?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's on the money.


public override T Value
{
get { return Owner.CreateValuePublicationOnly(Factory, this); }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No TakeFactory here I guess because it's not safe for concurrent calls? How is reentrancy detection handled for PublicationOnly

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't. It hangs! But same functionality as previous implementation.

var possibleValue = factory();
lock (comparand)
{
if (ReferenceEquals(m_implementation, comparand))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we are comparing against a comparand is that during evaluation of the lazy value we might get an exception? That is captured in an object that is referenced by m_implementation. If we had no need to capture exceptions we could have compared against null?

[NonSerialized]
private object m_threadSafeObj;
// m_implementation, a volatile reference, is set to null after m_value has been set
private volatile ILazyItem m_implementation;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the current implementation either holds a lock when updating m_implementation or we have no protection ==> Lazy<'T> can't be used concurrently then the only place where volatile matters is in the .ctor to ensure all writes to ILazyItem objects are visible to other threaads when they read it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure of your comment there? (Or is it a comment that is meant to be a comment? even then I don't think it reads very well?)

The volatile matters in all places where m_implementation is accessed. i.e. it is the gatekeeper to ensure that m_value has a valid value before it is accessed. As reference type writes are atomic, we're guaranteed that we'll either have a valid ILazyItem implementation (from the .ctor) or null, after m_value has been written. (I'm about to make that a little more complex...)

@@ -50,63 +42,22 @@ internal static class LazyHelpers
[ComVisible(false)]
[DebuggerTypeProxy(typeof(System_LazyDebugView<>))]
[DebuggerDisplay("ThreadSafetyMode={Mode}, IsValueCreated={IsValueCreated}, IsValueFaulted={IsValueFaulted}, Value={ValueForDebugDisplay}")]
public class Lazy<T>
public class Lazy<T> :
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @manofstick, I have checked the code and added a few observations/questions. Overall I see no obvious weakness. You said there was no test-coverage for Lazy<'T> today. Can that be added (damn difficult for these kinds of classes but at least to cover the functionality)?

@mrange
Copy link

mrange commented Jan 17, 2017

Hi @manofstick, I have checked the code and added a few observations/questions. Overall I see no obvious weakness. You said there was no test-coverage for Lazy<'T> today. Can that be added (damn difficult for these kinds of classes but at least to cover the functionality)?

@mrange
Copy link

mrange commented Jan 17, 2017

I think the reason the Lazy<'T> implementation above performs worse for PublicationOnly then Lazzzy<'T> is because of the publication only lock.

@mrange
Copy link

mrange commented Jan 17, 2017

From my performance testing the read overhead of a Lazy<'T> object that is initialized is 3x lower with the new implementation. That is pretty good!

@stephentoub
Copy link
Member

I've not reviewed these changes yet, but there are functional tests for Lazy; as with most of the types in mscorlib, the full set of tests is in corefx, e.g.
https://github.com/dotnet/corefx/blob/master/src/System.Runtime/tests/System/LazyTests.cs

This involves a little bit of trickery. In avoiding the lock we need to
go from m_implementation as PublicationOnly to null in a single step. We
can't do this though without another object, so we reuse out Lazy
object, which basically just spins its wheels until the value is ready.
This shouldn't be a very common scenerio (i.e. hitting that code path)
but it is possible.
@manofstick
Copy link
Author

@mrange

OK; with a bit of trickery I restored the CompareExchange in PublicaitonOnly without having to create a new object. In my testing with lots of threads this performed very well. Can you update your numbers to see if you get the previous performance?

@manofstick
Copy link
Author

Hiya @stephentoub !

Are you happy for me to dump all of my tests into the existing LazyTest file?

@stephentoub
Copy link
Member

Are you happy for me to dump all of my tests into the existing LazyTest file?

Extra tests would be excellent, thanks! I'd ask, though, that you de-dup with the existing ones so that we're only adding new tests for behaviors that don't already have them.

Threading bug.

TakeFactory() would crash on second call that was banked up behind the
lock.
Storing factory in Lazy object, and passing Lazy as a parameter to
ILazyItem functions. This means that None & PublicationOnly now need no
creation of secondary object.
@manofstick
Copy link
Author

@mrange sorry dude, I've modified things again, but I think I'm now at absolute minimum. I think None and PublicationOnly should show some improvements, as they no longer create any other objects other than the Lazy object itself (and as per previous comment, PublicationOnly's use of CompareExchange was restored)

@stephentoub hopefully I'll get a bit of chance to review tests tests within the next week (and hopefully no world war will begin within this time... hmmm...) Anyway, I think I'm pretty happy with the whole thing now, I've minimized object creation and all paths seem pretty optimal. I do hope serialization doesn't prove the show stopper...

@jkotas
Copy link
Member

jkotas commented Jan 21, 2017

Before this optimization, there are 3 generic types that will be created per T: Lazy<T>, Boxed<T>, LazyInternalExceptionHolder<T>. The last one does not actually have to be generic - it can be moved out to LazyHelpers.

After this throughput optimization, there are 9 generic types that will be created per T: Lazy<T>, ILazyItem, CreateInstance<T>, LazyException, None, ExecutionAndPublication, PublicationOnly, PublicationOnlyWaiter, Boxed. And they are more expensive than the earlier ones because of they tend to implement interfaces.

The generic types take time and memory to create too, and if T is valuetype there is code JITed per T too. It would interesting to look at whether the ~5x+ increase in startup cost and footprint per T pays off in typical programs. For example, if I run https://github.com/aspnet/MusicStore app, how many different Lazy types are there and how many times is each one created?

@manofstick
Copy link
Author

Hi @jkotas

I must admit I am ignorant in the ways of the JIT. but is it really creating all those types, of are they just potentially created (dare I say it, lazily created)? And if they are created eagerly is that just because they are inner classes, in which case they could be moved out and just the each other with internal scoped members?

I.e. following the happy path of creation would be, 'Lazy', 'ILazyItem', 'ExecutionAndPublication' and that's it. So still just 3. That would be most uses.

You should need 4 for 'PublicationOnly' and 3 for ’None'.

In the case of serialisation of exceptions an extra one each is required, but they are heavy operations anyway.

Anyway, let the knowledgeable people speak, I'll move back to my corner of ignorance!

@jkotas
Copy link
Member

jkotas commented Jan 22, 2017

because they are inner classes

Inner class relationship does not matter for performance. What matters is how the implementation is interconnected. For example, execution of private T CreateValue(LazyThreadSafetyMode mode) will load LazyException<T> type too because of it is referenced within the method. It is even worse problem with AOT compilation because of the AOT compilers cannot typically tell which types referenced in the implementation are the useful ones and so they have to compile all of them. You end up paying for it in the AOT binary size and compilation time.

The idea about making the value a direct field and use auxiliary state object is good, ie this is the best part of the change:

     public T Value
     {
            get
            {
                var implementation = m_implementation;
                if (implementation == null)
                    return m_value;

However the number of extra generic plumbing types makes it hard to tell whether it is obvious good trade-off for programs out there. Could you please try to refactor the implementation to avoid the generic types, and use non-generic types instead? The existing implementation is not perfect in this regard either - you should be able to beat it on the startup performance (=number of generic types) as well. Feel free to bring up any issues you may run into.

I have created a small perf test that creates a lot of different Lazy instantiations that you can use to measure: https://gist.github.com/jkotas/9c492e62c59cb5e7220daeed587d254a . On my machines, it gives 3953ms without your changes, 5734ms with your current changes. You can compile it directly against corelib for convenience:

cd \coreclr\bin\Product\Windows_NT.x64.Release
csc /noconfig /nostdlib /r:System.Private.CoreLib.dll /o+ test.cs)

Thank you for your work on this so far!

Fixing startup performance concerns
@manofstick
Copy link
Author

Howdy @jkotas,

OK, well after a little playing around I have found that the main performance issue that raised its head in regards to your concerns was not so much the creation of generics, but rather that a Func<T> was being created for the default constructor factory.

Anyway, I have made some modification to remove this, and the numbers appear a fair bit better to me. I ran some tests with 32 & 64-bit and class MyG<T, U> and struct MyG<T, U> (where I cut the depth by 1) and the new version wins some and loses some.

Let me know what you think.

@jkotas
Copy link
Member

jkotas commented Jan 22, 2017

CreateInstance<T> and Func<T>

the new version wins some and loses some

Could you please still share some data about where it wins and loses?

I still believe it would be beneficial to make all the plumbing types non-generic. I think you may need to add extra cast from Object to Func<T> to make that work, but that should cost next to nothing.

@manofstick
Copy link
Author

@jkotas

I don't think it's as simple as that. Maybe with a completely different implementation, but here I need to be able to call back into the 'Lazy' object from the ILazyItem implementation functions so they therefore need to be generic.

Anyway, I'll have a think. Not sure if I'll get much time to play this week, but I'll try to do a quick run to print timings from your testb at some stage.

}
}

/// <summary>
/// Gets a value indicating whether this instance may be used concurrently from multiple threads.
/// </summary>
internal LazyThreadSafetyMode Mode
internal LazyThreadSafetyMode? Mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: All these forwarders can be written as one-liners:

internal LazyThreadSafetyMode? Mode => LazyHelper.GetMode(_state);
internal bool IsValueFaulted => LazyHelper.GetIsValueFaulted(_state);
public bool IsValueCreated => _state == null;

@manofstick manofstick mentioned this pull request Feb 28, 2017
99 tasks
Race condition was introduced when state wasn't passed to LazyGetValue.

Also just made LazyGetValue into void CreateValue, as it was internally
recursively calling back to Value, so I think this is easier to grok
threading implications (i.e. why it wasn't just returning _value.)
@manofstick
Copy link
Author

@jkotas / @stephentoub

OK, well I think that things are good to progress.

There had been a race condition introduced when I have removed the state from being passed to LazyGetValue, but I believe we're all good again now.

I'm getting "GitHub pull request #8963 of commit 77d3a7f, has merge conflicts." so I think that's why the build is failing? Anyway, all over to you guys.

@jkotas
Copy link
Member

jkotas commented Mar 1, 2017

@manofstick Could you please merge current CoreCLR master and resolve the conflicts?

@@ -322,147 +528,16 @@ public T Value
{
get
{
Boxed boxed = null;
if (m_boxed != null )
while (true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a while loop here will make the Value property to be a bad inlining candidate - bad for steady state perf. Could you please change CreateValue to return the value instead so that it can be return (_state == null) ? _value : CreateValue();

Copy link
Member

@jkotas jkotas Mar 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or it can be - without changing CreateValue to return the value:

if (_state != null)
     CreateValue();
return _value;

This one is slightly better because it has smaller IL, and thus it will make it better inlining candidate.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't just return _value without check if _state (volatile semantics). I'm not sure I can guarantee that it would always be valid (although I'm sure it would be in tests, but that's not the same thing is it 😀) I'll return it to how it was.

@@ -18,6 +18,7 @@
using System.Runtime;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not need to add using System.Security.Permissions back

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas

Yeah, sorry about that I'm bashing around trying to control git (I'm used to mercurial at work, and I must say, beyond rudimentary commands git drives me insane!)

Anyway, I'll get there...

@jkotas jkotas merged commit 2453adf into dotnet:master Mar 4, 2017
@jkotas
Copy link
Member

jkotas commented Mar 4, 2017

Very well done - thanks a lot!

I will take care of porting this to CoreRT repo for you.

@manofstick
Copy link
Author

Sorry @jkotas I just noticed that in my shameful use of git merging I splatted over your previous task which removed the ComVisible attribute. Can you just remove it; or do you want me to create a new PR?

@jkotas
Copy link
Member

jkotas commented Mar 4, 2017

Create a new PR for it please

@jkotas
Copy link
Member

jkotas commented Mar 4, 2017

I have just moved Lazy.cs to a different directory, so make sure to sync to latest master before creating the PR.

@manofstick
Copy link
Author

@jkotas do you never sleep/watch a movie/go to pub/etc :-) :-)

Anyway, I'll do it tomorrow. Have a good weekend!

@manofstick manofstick deleted the manofstick-lazy branch March 5, 2017 02:39
@jkotas
Copy link
Member

jkotas commented Mar 5, 2017

I went to the pub twice this week, and watched iMax movie today :-)

Thanks for submitting the PR.

@manofstick
Copy link
Author

@jkotas obviously you're just superhuman then! :-) No worries for PR; anyway, I can pack up my bags and return to the F# playground, I've left it rotting for a couple of months with an alternative linq-to-objects (F#'s Seq module) implementation. I just need to be a superhuman too to find a few weeks of time...

jorive pushed a commit to guhuro/coreclr that referenced this pull request May 4, 2017
* Remove use of lock for PublicationOnly

This involves a little bit of trickery. In avoiding the lock we need to
go from m_implementation as PublicationOnly to null in a single step. We
can't do this though without another object, so we reuse out Lazy
object, which basically just spins its wheels until the value is ready.
This shouldn't be a very common scenerio (i.e. hitting that code path)
but it is possible.

* Minimize additional object creation

Storing factory in Lazy object, and passing Lazy as a parameter to
ILazyItem functions. This means that None & PublicationOnly now need no
creation of secondary object.

* Remove Func<T> for default constructor invoking

Fixing startup performance concerns

* Moved non-generic functionality out of Lazy

...and into a helper class

* Expression-bodied functions
@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* System.Lazy regression tests

This is a prelude to an alternative Lazy implementation in a PR at
dotnet/coreclr#8963


Commit migrated from dotnet/corefx@029f4c4
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Remove use of lock for PublicationOnly

This involves a little bit of trickery. In avoiding the lock we need to
go from m_implementation as PublicationOnly to null in a single step. We
can't do this though without another object, so we reuse out Lazy
object, which basically just spins its wheels until the value is ready.
This shouldn't be a very common scenerio (i.e. hitting that code path)
but it is possible.

* Minimize additional object creation

Storing factory in Lazy object, and passing Lazy as a parameter to
ILazyItem functions. This means that None & PublicationOnly now need no
creation of secondary object.

* Remove Func<T> for default constructor invoking

Fixing startup performance concerns

* Moved non-generic functionality out of Lazy

...and into a helper class

* Expression-bodied functions


Commit migrated from dotnet/coreclr@2453adf
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants