
Optimized observeOn/subscribeOn #2603

Closed
wants to merge 4 commits

Conversation

akarnokd (Member) commented Feb 4, 2015

For scalar synchronous values, observeOn and subscribeOn are essentially the same operation.

Benchmark results: (i7 4770k, Java 1.8u31, Windows 7 x64)

gradlew benchmarks "-Pjmh=-f 1 -tu s -bm thrpt -wi 10 -i 10 -r 5 .*ComputationSchedulerPerf.*"

Benchmark                        (size)    1.x Score  Score error   PR  Score  Score error
CompSchedulerPerf.observeOn           1   182479,416    11686,648  210536,292     9760,937  
CompSchedulerPerf.observeOn          10   174911,567    13659,846  179245,342    12725,807  
CompSchedulerPerf.observeOn         100    30997,516      958,587   27000,106      623,127  
CompSchedulerPerf.observeOn        1000     6701,672      314,471    8623,296      703,255  
CompSchedulerPerf.observeOn       10000      927,207       98,571     985,409       32,412  
CompSchedulerPerf.observeOn      100000      111,968        1,176     110,597        2,883  
CompSchedulerPerf.observeOn     1000000       11,790        0,195      11,512        0,146  
CompSchedulerPerf.subscribeOn         1   202557,014    13692,298  204616,231    11272,351  
CompSchedulerPerf.subscribeOn        10   180090,498    14058,995  190338,591     1822,349  
CompSchedulerPerf.subscribeOn       100   113625,737    32012,148  114659,399    50494,246  
CompSchedulerPerf.subscribeOn      1000    35600,359     2975,926   36856,078     1770,347  
CompSchedulerPerf.subscribeOn     10000     4220,548      311,551    3942,293      368,052  
CompSchedulerPerf.subscribeOn    100000      487,187       18,150     472,594       22,474  
CompSchedulerPerf.subscribeOn   1000000       52,191        0,349      50,250        0,510  

Unfortunately, the benchmark results were quite noisy even with more warmup and measurement iterations. I'd say the changes give about +10% for the size = 1 case, but running the same code twice (observeOn 1, subscribeOn 1) gives inconsistent values; I suspect the main cause is GC.

akarnokd (Member, Author) commented Feb 5, 2015

New benchmark results (i7 920, Java 1.8u31, Windows 7 x64)

Benchmark      (size)    Score 1.x            v2     v2 error   Diff %   Diff x
observeOn           1    145464,015   164994,196     3694,358    13,43     1,13
observeOn          10    132737,962   137155,992     1012,799     3,33     1,03
observeOn         100     25681,480    45319,683      514,267    76,47     1,76
observeOn        1000      3422,822    12182,024      460,586   255,91     3,55
observeOn       10000       498,887     1582,275       18,479   217,16     3,17
observeOn      100000        85,889      155,932        2,873    81,55     1,81
observeOn     1000000         8,661       15,959        1,313    84,26     1,84
subscribeOn         1    155442,402   162559,432     1046,801     4,58     1,04
subscribeOn        10    144919,758   152370,842      753,133     5,14     1,05
subscribeOn       100    107249,189   138655,111      836,759    29,28     1,29
subscribeOn      1000     27860,564    57548,020    11424,426   106,56     2,06
subscribeOn     10000      3481,585     9069,788      229,033   160,51     2,60
subscribeOn    100000       362,142      980,154       28,928   170,65     2,70
subscribeOn   1000000        35,890       92,740        1,033   158,40     2,58

There are 2x-3x throughput improvements. I've verified that no exceptions are thrown and that the last emitted value is always size - 1 (all values were delivered). Note that these improvements are due to overhead reduction. With Blackhole.consumeCPU(5), the numbers look like this (a rough sketch of this style of measurement follows the table):

Benchmark      (size)           1.x    1.x error           v2     v2 error
observeOn           1    140296,417      441,165   160466,648    18522,034
observeOn          10    133460,788     2365,310   135523,349     1877,668
observeOn         100     25786,907     1948,696    47624,460      925,317
observeOn        1000      3559,582       81,327    11185,990      988,573
observeOn       10000       552,615       93,469     1370,051      321,379
observeOn      100000        79,281        2,550      141,401        4,252
observeOn     1000000         8,083        0,922       13,593        1,866
subscribeOn         1    155594,824     1630,359   159551,161     2583,546
subscribeOn        10    142827,468     2120,922   146452,153     1323,202
subscribeOn       100     93820,703    21969,555   102171,399    51531,245
subscribeOn      1000     23061,551     2348,392    37658,325     3900,619
subscribeOn     10000      2798,071      144,547     5128,935      331,227
subscribeOn    100000       278,345       11,796      523,751       50,789
subscribeOn   1000000        28,944        1,060       54,256        1,252
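For reference, here is a rough sketch of this style of measurement, assuming the JMH annotations and the RxJava 1.x API; the class name and parameters are illustrative, and the actual ComputationSchedulerPerf benchmark may differ in its details.

import java.util.concurrent.CountDownLatch;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import rx.Observable;
import rx.schedulers.Schedulers;

// Simplified sketch only; not the actual ComputationSchedulerPerf source.
@State(Scope.Thread)
public class ObserveOnPerfSketch {

    @Param({ "1", "1000", "1000000" })
    public int size;

    @Benchmark
    public void observeOnWithPerItemWork() throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(1);
        Observable.range(0, size)
                .observeOn(Schedulers.computation())
                .subscribe(
                        v -> Blackhole.consumeCPU(5),   // burn a few CPU "tokens" per item
                        e -> latch.countDown(),
                        latch::countDown);
        latch.await();                                   // wait for the asynchronous delivery to finish
    }
}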

Most likely, the improvement comes from changing isUnsubscribed to read a volatile field instead of always entering synchronized blocks whose locks cannot stay biased.
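As a minimal sketch of that change (class and field names are illustrative, not the actual RxJava subscription classes):

class SynchronizedSubscription {
    // Before: the hot-path check enters the same monitor that unsubscribe() uses,
    // so once another thread touches the lock it cannot stay biased.
    private boolean unsubscribed;                        // guarded by "this"

    public synchronized boolean isUnsubscribed() { return unsubscribed; }

    public synchronized void unsubscribe() { unsubscribed = true; }
}

class VolatileSubscription {
    // After: the hot-path check is a plain volatile read; no monitor is acquired.
    private volatile boolean unsubscribed;

    public boolean isUnsubscribed() { return unsubscribed; }

    public void unsubscribe() { unsubscribed = true; }
}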

akarnokd changed the title from "Optimized Scalar synchronous value observeOn/subscribeOn" to "Optimized observeOn/subscribeOn" on Feb 5, 2015
daschl (Contributor) commented Feb 5, 2015

@akarnokd great stuff, I'll try it out early next week and report some numbers :)

akarnokd (Member, Author) commented Feb 5, 2015

Now on: i7 4770k, Java 1.8u31, Windows 7 x64

Benchmark      (size)    1.x Score  Score error     PR Score  Score error
observeOn           1   182479,416    11686,648   206216,641     4097,849  
observeOn          10   174911,567    13659,846   183143,285     6557,560  
observeOn         100    30997,516      958,587    71018,915     2110,252  
observeOn        1000     6701,672      314,471    13325,539     1940,749  
observeOn       10000      927,207       98,571     1648,631      250,874  
observeOn      100000      111,968        1,176      200,704        7,900  
observeOn     1000000       11,790        0,195       19,285        0,630  
subscribeOn         1   202557,014    13692,298   205876,054     9574,248  
subscribeOn        10   180090,498    14058,995   194964,365     7460,966  
subscribeOn       100   113625,737    32012,148   149761,195   104683,280  
subscribeOn      1000    35600,359     2975,926    87861,668    17543,179  
subscribeOn     10000     4220,548      311,551    13208,973     2794,952  
subscribeOn    100000      487,187       18,150     1695,555      226,512  
subscribeOn   1000000       52,191        0,349      180,335       30,122

benjchristensen (Member) commented:

I see these improvements ... still reviewing the code:

Benchmark                                   (size)   Mode   Samples          1.x        2603    
r.s.ComputationSchedulerPerf.observeOn           1  thrpt         5   104110.926  115707.286
r.s.ComputationSchedulerPerf.observeOn          10  thrpt         5   100723.402  105825.148
r.s.ComputationSchedulerPerf.observeOn         100  thrpt         5    24609.763   65571.461
r.s.ComputationSchedulerPerf.observeOn        1000  thrpt         5     3212.434   13020.027
r.s.ComputationSchedulerPerf.observeOn        2000  thrpt         5     2057.133    
r.s.ComputationSchedulerPerf.observeOn        3000  thrpt         5     1807.644    
r.s.ComputationSchedulerPerf.observeOn        4000  thrpt         5     2000.223    
r.s.ComputationSchedulerPerf.observeOn       10000  thrpt         5      955.002    1555.493  
r.s.ComputationSchedulerPerf.observeOn      100000  thrpt         5       96.628     160.218  
r.s.ComputationSchedulerPerf.observeOn     1000000  thrpt         5        9.508      16.559  
r.s.ComputationSchedulerPerf.subscribeOn         1  thrpt         5   114212.000  118885.516  
r.s.ComputationSchedulerPerf.subscribeOn        10  thrpt         5   112376.809  112270.024  
r.s.ComputationSchedulerPerf.subscribeOn       100  thrpt         5    88433.002  104240.739  
r.s.ComputationSchedulerPerf.subscribeOn      1000  thrpt         5    31503.640   64446.984  
r.s.ComputationSchedulerPerf.subscribeOn      2000  thrpt         5    16292.135    
r.s.ComputationSchedulerPerf.subscribeOn      3000  thrpt         5    11372.297    
r.s.ComputationSchedulerPerf.subscribeOn      4000  thrpt         5     9927.774    
r.s.ComputationSchedulerPerf.subscribeOn     10000  thrpt         5     3932.988    8200.048
r.s.ComputationSchedulerPerf.subscribeOn    100000  thrpt         5      437.626    1439.069
r.s.ComputationSchedulerPerf.subscribeOn   1000000  thrpt         5       43.104     146.385


public abstract class ObjectPool<T> {
    private Queue<T> pool;
    private final int maxSize;

-   private Scheduler.Worker schedulerWorker;
+   private final ScheduledExecutorService schedulerWorker;
akarnokd (Member, Author) commented on the diff:

We should combine all periodic maintenance-related threads into a single service: ObjectPool, IO scheduler's cleanup pool, JDK 6 computation scheduler's purge.
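One possible shape for such a shared service, purely as a sketch (the class name and API here are hypothetical, not existing RxJava code):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one shared daemon thread for all periodic maintenance
// tasks (pool eviction, worker cleanup, scheduler purge) instead of one
// thread per concern.
enum MaintenanceScheduler {
    INSTANCE;

    private final ScheduledExecutorService service =
        Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "RxMaintenanceScheduler");
                t.setDaemon(true);
                return t;
            }
        });

    public ScheduledFuture<?> schedulePeriodically(Runnable task, long period, TimeUnit unit) {
        return service.scheduleAtFixedRate(task, period, period, unit);
    }
}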

benjchristensen (Member) replied:

How does having an extra thread doing this work affect memory, caches, etc., compared to keeping it on the existing event loops?

I ask because we accidentally ran extra threads in a test with Netty+RxJava and it decreased throughput by 40%. That was a far higher workload, because metrics were being captured on the other threads, but it makes me less certain about choices that put work on other threads.

final Subscriber<? super T> child = this.child;
final NotificationLite<T> on = this.on;
@SuppressWarnings("rawtypes")
final AtomicLongFieldUpdater<ObserveOnSubscriber> counter = COUNTER_UPDATER;
benjchristensen (Member) commented on the diff:

Curious, what benefit does assigning these references give? Why use counter instead of COUNTER_UPDATER directly?

Same for the other assignments here.

akarnokd (Member, Author) replied:

Due to the volatile access in the loop, the JIT is not allowed to hoist these fields into registers (a field read can't be moved before a volatile read), so without the local copies they would be re-read on every iteration.
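A simplified sketch of the pattern, with illustrative names rather than the actual ObserveOnSubscriber fields:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

import rx.functions.Action1;

// Illustrative wip/drain loop; not the actual operator code.
class DrainLoopSketch<T> {
    final Queue<T> queue = new ConcurrentLinkedQueue<T>();
    final AtomicLong counter = new AtomicLong();

    void onNext(T value, Action1<? super T> child) {
        queue.offer(value);
        if (counter.getAndIncrement() == 0) {
            drain(child);
        }
    }

    void drain(Action1<? super T> child) {
        // Copy the instance fields into locals once. The volatile read/write in
        // counter.decrementAndGet() below acts as a barrier, so the JIT cannot keep
        // re-reads of this.queue in a register across iterations; locals can stay there.
        final Queue<T> q = queue;
        final AtomicLong c = counter;
        do {
            T v;
            while ((v = q.poll()) != null) {
                child.call(v);
            }
        } while (c.decrementAndGet() > 0);   // volatile read-modify-write each pass
    }
}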

akarnokd (Member, Author) commented:

@benjchristensen what is the verdict on this PR?

benjchristensen (Member) commented:

I think the only open question is the discussion about exposing EventLoopScheduler publicly, which I don't think should happen.

akarnokd (Member, Author) commented:

The optimization is built on direct access, which is not possible if the class is package-private. I could create a Schedulers.scheduleSingle(Scheduler, Action0) that calls schedule directly, and since they are both in the same package, there would be no need to expose EventLoopScheduler.
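A hypothetical sketch of such a helper (this method does not exist in RxJava; only the generic path is shown, and a real version living in the same package could instead hand the action straight to the package-private EventLoopScheduler without creating a per-action worker):

import rx.Scheduler;
import rx.Subscription;
import rx.functions.Action0;

// Hypothetical helper, sketched for illustration only -- not existing RxJava API.
final class ScheduleSingleSketch {

    static Subscription scheduleSingle(Scheduler scheduler, final Action0 action) {
        // Generic fallback: one worker per action, released when the action is done.
        final Scheduler.Worker worker = scheduler.createWorker();
        worker.schedule(new Action0() {
            @Override
            public void call() {
                try {
                    action.call();
                } finally {
                    worker.unsubscribe();
                }
            }
        });
        return worker;
    }
}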

benjchristensen (Member) commented:

So why don't you just move it to the rx.internal packages as you suggested before?

benjchristensen (Member) commented:

Can you please rebase this, and move that file to rx.internal so it is not made part of the public API?

akarnokd (Member, Author) commented:

Okay.
