Third optimization batch — map fusion #95
Conversation
Codecov Report

```
@@            Coverage Diff            @@
##           master      #95    +/-   ##
=========================================
+ Coverage    87.4%   88.01%   +0.6%
=========================================
  Files          20       20
  Lines         413      434     +21
  Branches       35       35
=========================================
+ Hits          361      382     +21
  Misses         52       52
```

I might need to improve test coverage.
```scala
case Map(source, g, index) =>
  // Allowed to do 32 map operations in sequence before
  // triggering `flatMap` in order to avoid stack overflows
  if (index != 31) Map(source, g.andThen(f), index + 1)
```
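For intuition on why fusion must be capped: every `andThen` composition adds one nested call frame when the fused function finally runs. A tiny standalone sketch of that effect (not the actual cats-effect code):

```scala
// Build a function fused from 32 `andThen`-chained increments,
// mimicking what repeated `g.andThen(f)` fusion produces.
// Applying it performs 32 nested calls, i.e. ~32 stack frames,
// which is why the fusion counter caps how many maps get composed.
val fused: Int => Int =
  (1 to 32).foldLeft((x: Int) => x)((g, _) => g.andThen(_ + 1))

println(fused(0)) // prints 32: one increment per fused map
```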
Quick question, out of curiosity: where does the 32 number come from?
Right now it's a number pulled out of thin air (well, I've seen it used somewhere else). The default stack size varies depending on platform, AFAIK ranging from 64 KB (32-bit Windows) to 1 MB (64-bit Linux).
And we need a comfortable value to not blow people's stacks.
But now that you've mentioned it, I decided to do some testing and be thorough about it.
Sounds like it deserves a comment in the code telling the reader why this number was chosen.
Thanks for the answer
I've got some refactoring ideas and some more testing to do, so I've placed this PR in the WIP state again.

^^^ No Shame Sundays :D Feel free to review too @alexandru

I've updated the description of the PR with the methodology for picking the right maximum number of fused `map` operations.

Update: added full benchmarking results and current interpretation.
I like it. A couple quick questions.
```scala
private[effect] final val fusionMaxStackDepth =
  Option(System.getProperty("cats.effect.fusionMaxStackDepth", ""))
    .filter(s => s != null && s.nonEmpty)
    .flatMap(s => Try(s.toInt).toOption)
```
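The parsing is deliberately forgiving: an absent, empty, or malformed property silently falls back to the default. A standalone sketch of the same logic (the helper name and the `31` default are assumptions taken from this thread, not the actual cats-effect source):

```scala
import scala.util.Try

// Mirrors the snippet above: use the sysprop if present and a positive
// integer, otherwise fall back to the hard-coded default.
def readMaxStackDepth(raw: String, default: Int): Int =
  Option(raw)
    .filter(_.nonEmpty)
    .flatMap(s => Try(s.toInt).toOption)
    .filter(_ > 0)
    .getOrElse(default)

println(readMaxStackDepth("128", 31))  // 128
println(readMaxStackDepth("", 31))     // 31 (empty -> default)
println(readMaxStackDepth("oops", 31)) // 31 (malformed -> default)
println(readMaxStackDepth("-5", 31))   // 31 (non-positive -> default)
```

This is why no error is ever surfaced: bad input is indistinguishable from no input, which is the behavior being questioned below.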
We don't want to depend on a logger, but is it worth it to explain on stderr why we choked?
I'd prefer not to do it, since it introduces extra code; but I don't care much, and if it's in popular demand, then OK.
What I'm thinking is that people won't modify this parameter unless they are in big trouble, and we can have two issues:
- given my calculations, the default value seems fine, but we might be underestimating stack growth in common usage
- we don't control all possible virtual machines; I have no idea, for example, what the default stack size is on Android or other non-Oracle JVMs

So increasing it won't improve performance except in very narrow use-cases. And if people hit the stack limit because of this default, then we probably need to lower the limit in the library itself, with the overriding option made available only to empower people to fix it without having to wait for another release.
Okay, I'll buy that.
```diff
@@ -93,9 +93,9 @@ sealed abstract class IO[+A] {
   this match {
     case ref @ RaiseError(_) => ref
     case Map(source, g, index) =>
-      // Allowed to do 32 map operations in sequence before
+      // Allowed to do 128 map operations in sequence before
```
Comment is not true in Scala.js or sometimes in the presence of the sysprop.
Yes, I need to update the comment, thanks for pointing it out.
I changed that comment.
```scala
/**
 * Establishes the maximum stack depth for `IO#map` operations
 * for JavaScript.
 */
private[effect] final val fusionMaxStackDepth = 32
```
Isn't the default supposed to be 128?
That is the JavaScript version. I've updated the comment in IO.scala to not mention 128. Also, this particular default is now changed to 31, since we are using inequality on the current index for the actual test, as a very small performance optimization.
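To make the off-by-one convention concrete: with a 0-based index and the `!=` test, a limit of 31 lets 32 consecutive `map`s fuse into one `Map` node before a `flatMap` boundary is inserted. A toy simulation of just the counter (an assumption about the logic being discussed, not the real implementation):

```scala
// Counts how many flatMap boundaries `n` consecutive maps produce
// under the `if (index != maxDepth)` fusion rule.
def flatMapBoundaries(n: Int, maxDepth: Int): Int = {
  var index = -1     // -1: no Map node wrapping the source yet
  var boundaries = 0
  for (_ <- 1 to n) {
    if (index < 0) index = 0                // first map: Map(src, f, 0)
    else if (index != maxDepth) index += 1  // fuse: g.andThen(f), index + 1
    else { boundaries += 1; index = -1 }    // limit hit: FlatMap boundary
  }
  boundaries
}

println(flatMapBoundaries(32, 31)) // 0: all 32 maps fused into one node
println(flatMapBoundaries(33, 31)) // 1: the 33rd map triggers a flatMap
```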
```scala
Option(System.getProperty("cats.effect.fusionMaxStackDepth", ""))
  .filter(s => s != null && s.nonEmpty)
  .flatMap(s => Try(s.toInt).toOption)
  .filter(_ > 0)
```
Just confirming: what this looks like at 1 (which is then reduced to 0) is that every operation gets flatMapped rather than wrapped in a `Map`.
Yes, that's the intention, which made me realize that when that counter gets reset we should use a `Map(this, f, 0)` instead of a `FlatMap(this, f.andThen(pure))`.
The PR is ready for merging if you folks agree.

Any objections on merging this PR?
Third PR addressing performance issues. Previous ones are #90 and #91.
Also addresses issue #92, but without the proposed laws.
In order to avoid stack-overflow errors, repeated `.map` calls trigger a `.flatMap` every 128 calls.

The rationale is thus: from my research, the default stack size ranges from 320 KB on 32-bit operating systems to 1 MB on 64-bit operating systems. From my own tests, a stack frame triggered by `Function1` chaining consumes approximately 32 bytes. So 128 fused `map` calls will consume `32 bytes * 128 = 4 KB` at a minimum (to which you add whatever else the user is doing on that stack).

In a real `flatMap` chain, which is what happens with `IO`, the performance does end up being dominated by `.flatMap` calls even with `.map` calls mixed in (a benchmark proving this will follow); however, the improvements for `.map` are still good.

The important thing to watch out for in this optimization is that degrading `.flatMap` to make `.map` faster isn't acceptable, but if we can manage a performance boost for fused `.map` calls without a `.flatMap` regression, then it's all good.

For benchmarking I've introduced 2 new benchmarks:
- `MapCallsBenchmark` measures the performance of pure `map` calls (without being mixed within `flatMap` loops)
- `MapStreamBenchmark` measures the performance impact of a real-world use-case, showing what to expect from Monix's `Iterant` and possibly FS2

The other benchmarks: here I seem to have suffered a slight regression. Not sure if this is a fluke or not, since the differences are very small; however, if it is real, I need to investigate whether I can gain some extra throughput from somewhere else (win some, lose some).
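The stack-budget arithmetic above can be sanity-checked directly (the 32-bytes-per-frame figure is the author's own measurement, taken as an assumption here, and 320 KB is the smallest default stack size mentioned):

```scala
// Worst-case stack consumed by a fully fused chain of maps:
// one ~32-byte Function1 frame per fused call.
val bytesPerFrame   = 32
val fusedCalls      = 128
val fusedStackBytes = bytesPerFrame * fusedCalls

println(fusedStackBytes)                        // 4096 bytes = 4 KB
println(fusedStackBytes * 100.0 / (320 * 1024)) // 1.25 (% of a 320 KB stack)
```

So even on the most constrained platform considered, a fully fused chain eats only about 1.25% of the stack, which is the "comfortable value" argument made earlier in the thread.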