
Third optimization batch — map fusion #95

Merged
merged 7 commits into from
Dec 14, 2017

Conversation

alexandru
Member

@alexandru alexandru commented Dec 9, 2017

Third PR addressing performance issues. Previous ones are #90 and #91.

Also addresses issue #92, but without the proposed laws.


In order to avoid stack-overflow errors, repeated .map calls trigger a .flatMap every 128 calls.

The rationale is this: from my research, the default stack size ranges from 320 KB on 32-bit operating systems to 1 MB on 64-bit operating systems. From my own tests, a stack frame triggered by `Function1` chaining consumes approximately 32 bytes, so 128 fused map calls will consume at least 32 bytes * 128 = 4 KB (on top of whatever else the user is doing on that stack).
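To illustrate the mechanism, here is a minimal sketch of map fusion with a depth counter. This is a hypothetical encoding, not cats-effect's actual `IO` internals; the names `Expr`, `Pure` and `Mapped` are made up. Repeated `map` calls compose their functions into a single node until the counter hits the limit, at which point a fresh node is started so that evaluation can unwind the stack:

```scala
// Hypothetical sketch of map fusion with a bounded depth counter.
sealed trait Expr[A] {
  def map[B](f: A => B): Expr[B] = this match {
    case Mapped(source, g, depth) if depth < 127 =>
      // Fuse: compose the functions instead of allocating a new layer.
      // Depths 0..127 are reachable, so up to 128 calls fuse together.
      Mapped(source, g.andThen(f), depth + 1)
    case other =>
      // Counter maxed out (or no fusion possible): start a fresh node,
      // which bounds how deep the composed-function call chain can get.
      Mapped(other, f, 0)
  }

  def run: A = this match {
    case Pure(a)           => a
    case Mapped(src, f, _) => f(src.run)
  }
}
final case class Pure[A](a: A) extends Expr[A]
final case class Mapped[S, A](source: Expr[S], f: S => A, depth: Int)
    extends Expr[A]
```

With this scheme, a chain of 10,000 `map` calls produces only about 79 `Mapped` nodes, and evaluating any one node calls through at most 128 composed functions, keeping stack usage within the budget estimated above.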

In a real flatMap chain, which is what happens with IO, performance does end up being dominated by .flatMap calls even with .map calls mixed in (a benchmark proving this will follow); still, the improvements for .map are worthwhile.

The important constraint in this optimization is that degrading .flatMap to make .map faster isn't acceptable; but if we can get a performance boost for fused .map calls without a .flatMap regression, then it's all good.


For benchmarking I've introduced two new benchmarks:

  1. MapCallsBenchmark measures the performance of pure map calls (without being mixed into .flatMap loops)
  2. MapStreamBenchmark measures the performance impact on a real-world use case, showing what to expect for Monix's Iterant and possibly FS2
| Benchmark | 0.6-a581664 | This PR |
| --- | --- | --- |
| MapCallsBenchmark.batch120 | 5552.051 | 14680.810 |
| MapCallsBenchmark.batch30 | 4941.152 | 16196.342 |
| MapCallsBenchmark.one | 6682.469 | 6167.908 |
| MapStreamBenchmark.batch120 | 2259.460 | 3176.587 |
| MapStreamBenchmark.batch30 | 918.054 | 1389.750 |
| MapStreamBenchmark.one | 1418.865 | 1540.653 |

The other benchmarks:

| Benchmark | 0.6-a581664 | This PR |
| --- | --- | --- |
| AttemptBenchmark.errorRaised | 1729.587 | 1750.409 |
| AttemptBenchmark.happyPath | 2373.812 | 2316.602 |
| HandleErrorBenchmark.errorRaised | 1974.071 | 2001.132 |
| HandleErrorBenchmark.happyPath | 3005.539 | 2944.564 |
| DeepBindBenchmark.async | 396.694 | 401.805 |
| DeepBindBenchmark.delay | 6887.681 | 6601.429 |
| DeepBindBenchmark.pure | 7849.842 | 7697.621 |
| ShallowBindBenchmark.async | 90.336 | 83.502 |
| ShallowBindBenchmark.delay | 5355.337 | 5324.081 |
| ShallowBindBenchmark.pure | 6059.119 | 6239.963 |

Here I seem to have a slight regression. I'm not sure if it's a fluke, since the differences are very small; but if it's real, I need to investigate whether I can gain some extra throughput somewhere else (win some, lose some).

@codecov-io

codecov-io commented Dec 9, 2017

Codecov Report

Merging #95 into master will increase coverage by 0.6%.
The diff coverage is 92.3%.

@@            Coverage Diff            @@
##           master      #95     +/-   ##
=========================================
+ Coverage    87.4%   88.01%   +0.6%     
=========================================
  Files          20       20             
  Lines         413      434     +21     
  Branches       35       35             
=========================================
+ Hits          361      382     +21     
  Misses         52       52

@alexandru alexandru changed the title WIP: Third optimization batch — map fusion Third optimization batch — map fusion Dec 9, 2017
@alexandru
Member Author

I might need to improve test coverage.

case Map(source, g, index) =>
  // Allowed to do 32 map operations in sequence before
  // triggering `flatMap` in order to avoid stack overflows
  if (index != 31) Map(source, g.andThen(f), index + 1)


Quick question, out of curiosity: where does the number 32 come from?

Member Author


Right now it's a number pulled out of thin air (well, I've seen it used somewhere else). The default stack size varies depending on platform, AFAIK ranging from 64 KB (32-bit Windows) to 1 MB (64-bit Linux).

And we need a comfortable value to not blow people's stacks.

But now that you've mentioned it, I decided to do some testing and be thorough about it.


Sounds like it deserves a comment in the code telling the reader why this number was chosen.


Thanks for the answer

@alexandru alexandru changed the title Third optimization batch — map fusion WIP: Third optimization batch — map fusion Dec 10, 2017
@alexandru
Member Author

I've got some refactoring ideas and some more testing to do, so I've placed this PR back in the WIP state.

@pakoito

pakoito commented Dec 10, 2017

^^^ No Shame Sundays :D Feel free to review too @alexandru

@alexandru
Member Author

I've updated the description of the PR with the methodology for picking the right maximum number of fused .map calls.

@alexandru
Member Author

Update: added full benchmarking results and current interpretation.

Member

@rossabaker rossabaker left a comment


I like it. A couple quick questions.

private[effect] final val fusionMaxStackDepth =
  Option(System.getProperty("cats.effect.fusionMaxStackDepth", ""))
    .filter(s => s != null && s.nonEmpty)
    .flatMap(s => Try(s.toInt).toOption)
Member


We don't want to depend on a logger, but is it worth it to explain on stderr why we choked?

Member Author


I'd prefer not to do it, since it introduces extra code; but I don't feel strongly, so if there's popular demand, then OK.

What I'm thinking is that people won't modify this parameter unless they are in big trouble and we can have two issues:

  1. given my calculations, the default value seems fine, but we might underestimate stack growth in common usage
  2. we don't control all possible virtual machines; I have no idea, for example, what the default stack size is on Android or on other non-Oracle JVMs

So increasing it won't improve performance except in very narrow use cases. And if people hit the stack limit because of this default, then we probably need to lower the limit in the library itself; the override exists only to let people fix the problem without having to wait for another release.
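For illustration, here is a sketch of how such an override could be parsed. The system property name is the one quoted in the PR, but the default of 127 and the `- 1` adjustment are assumptions drawn from the discussion below (the stored value is one less than the number of fused calls, because the check uses an inequality against the current index):

```scala
import scala.util.Try

// Sketch: read the override from a system property, accept only positive
// integers, and fall back to a built-in default otherwise. Invalid or
// non-positive values are silently ignored.
def fusionMaxStackDepth: Int =
  Option(System.getProperty("cats.effect.fusionMaxStackDepth", ""))
    .filter(_.nonEmpty)
    .flatMap(s => Try(s.toInt).toOption)
    .filter(_ > 0)
    .map(_ - 1)     // stored limit is "fused calls - 1" (inequality check)
    .getOrElse(127) // assumed default, corresponding to 128 fused calls
```

A user would then override it at JVM startup, e.g. with `-Dcats.effect.fusionMaxStackDepth=64` on the command line.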

Member


Okay, I'll buy that.

@@ -93,9 +93,9 @@ sealed abstract class IO[+A] {
     this match {
       case ref @ RaiseError(_) => ref
       case Map(source, g, index) =>
-        // Allowed to do 32 map operations in sequence before
+        // Allowed to do 128 map operations in sequence before
Member


The comment is not true on Scala.js, or sometimes in the presence of the sysprop.

Member Author


Yes, I need to update the comment, thanks for pointing it out.

Member Author


I changed that comment.

@alexandru alexandru changed the title WIP: Third optimization batch — map fusion Third optimization batch — map fusion Dec 11, 2017
/**
 * Establishes the maximum stack depth for `IO#map` operations
 * for JavaScript.
 */
private[effect] final val fusionMaxStackDepth = 32

Isn't the default supposed to be 128?

Member Author


That is the JavaScript version. I've updated the comment in IO.scala to not mention 128. Also, this particular default is now changed to 31, since the actual test uses an inequality against the current index, as a very small performance optimization.

Option(System.getProperty("cats.effect.fusionMaxStackDepth", ""))
  .filter(s => s != null && s.nonEmpty)
  .flatMap(s => Try(s.toInt).toOption)
  .filter(_ > 0)


Just confirming: with this set to 1, which is then reduced to 0, every operation becomes a FlatMap rather than a Map?

Member Author


Yes, that's the intention. Which made me realize that when that counter gets reset, we should use a Map(this, f, 0) instead of a FlatMap(this, f.andThen(pure)).
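The off-by-one bookkeeping discussed in this thread (a stored limit of 31 allowing 32 fused calls, a stored limit of 127 allowing 128, and a stored 0 disabling fusion entirely) can be checked with a small counting sketch. `fusedCallsBeforeReset` is a made-up helper for illustration, not library code:

```scala
// Counting sketch for the `index != limit` check: the node is created at
// index 0 by the first map call, and each further call increments the
// index until it equals the stored limit. So a stored limit of L permits
// L + 1 fused map calls before the reset (and L = 0 means no fusion).
def fusedCallsBeforeReset(storedLimit: Int): Int = {
  var index = 0
  var calls = 1 // the first map call creates the node at index 0
  while (index != storedLimit) {
    index += 1
    calls += 1
  }
  calls
}
```

This matches the earlier comments: the JavaScript default of 31 corresponds to 32 fused operations, and a user-supplied value of 1 (stored as 0) means every subsequent operation breaks the chain.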

@alexandru
Member Author

The PR is ready for merging if you folks agree.

@alexandru
Member Author

Any objections on merging this PR?

@mpilquist mpilquist merged commit edaaf79 into typelevel:master Dec 14, 2017