
various optimizations #28

Merged
merged 1 commit on Apr 18, 2016

Conversation

gaboraranyossy
Contributor

Various performance optimizations following the guidelines of Norman Maurer. Check this out.

The changes are:

  • using PooledByteBufAllocator
  • using native epoll on Linux instead of select
  • dispatching writes to the event loop thread to avoid contention
  • using a single boss thread and multiple worker threads in Server

This version improves latency by about 10% in my use case. :-)
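As a rough sketch (illustrative only, not the actual diff in this PR), the first, second, and fourth points map onto Netty's bootstrap API roughly like this, falling back to NIO where native epoll is unavailable:

```scala
import io.netty.bootstrap.ServerBootstrap
import io.netty.buffer.PooledByteBufAllocator
import io.netty.channel.ChannelOption
import io.netty.channel.epoll.{Epoll, EpollEventLoopGroup, EpollServerSocketChannel}
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.nio.NioServerSocketChannel

// A single boss thread accepts connections; the worker group handles I/O.
val bossGroup   = if (Epoll.isAvailable) new EpollEventLoopGroup(1) else new NioEventLoopGroup(1)
val workerGroup = if (Epoll.isAvailable) new EpollEventLoopGroup()  else new NioEventLoopGroup()

val bootstrap = new ServerBootstrap()
  .group(bossGroup, workerGroup)
  .channel(
    if (Epoll.isAvailable) classOf[EpollServerSocketChannel]
    else classOf[NioServerSocketChannel])
  // Pooled allocators reuse buffer memory instead of allocating per message.
  .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
  .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
// childHandler and bind are elided here.
```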

@djspiewak
Contributor

Well someone has been paying attention to QCon. ;-) It sounds very interesting! I'll take a quick look later today.

As a sidebar, do we actually need multiple worker threads in Server since we're not doing any work on them other than enqueuing?

@gaboraranyossy
Contributor Author

I basically converted the boss threads to workers. :-) You are probably right and we don't need more workers at all. I will experiment with this tomorrow.

In theory those queues use Strategy.sequential, so they will continue to do the work on the thread that reads them. Those, however, are not the worker threads. The enqueuing Task does not actually need to block; we might as well run it async. What's interesting is that when I tested it, it gave slightly worse results.
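For intuition, here is a minimal, stdlib-only sketch (hypothetical names, not scalaz-netty's actual code) of the two enqueue styles. The async variant pays an extra thread hop per message, which is one plausible reason it measured slightly slower:

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}

// Hypothetical illustration of blocking vs. async enqueuing.
object EnqueueStyles {
  val queue = new LinkedBlockingQueue[String]()
  private val pool = Executors.newFixedThreadPool(2)

  // "Blocking" style: the caller enqueues directly on its own thread.
  def enqueueSync(msg: String): Unit = queue.put(msg)

  // "Async" style: the enqueue itself is dispatched to another thread,
  // adding a hand-off (and its latency) per message.
  def enqueueAsync(msg: String): Unit = pool.execute(new Runnable {
    def run(): Unit = queue.put(msg)
  })

  def shutdown(): Unit = {
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    ()
  }
}
```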

PS: I didn’t follow QCon, just stumbled upon this presentation after some quick googling. :-) Now that you mentioned it I’m curious. Which talk did you mean?

@djspiewak
Contributor

@gabor-aranyossy Apparently Norman gave a talk at QCon (this morning!) that covered all of these points, and it was very well received. The timing was just too coincidental to ignore. :-)

@gaboraranyossy
Contributor Author

That's great! 👍 I hope it's gonna be available on YouTube soon.

@gaboraranyossy
Contributor Author

So I did some testing; here are the results. Please consider the tested system a black box in which there are several scalaz-netty channels between machines. I send very small messages one after another and measure the round-trip time. I ran three measurements in each case on a warmed-up system. As you can see, more workers give slightly better results. What I still don't understand is why async enqueuing performs worse than blocking.

1 server worker:

min response time 2 (OK=2 KO=- )
max response time 411 (OK=411 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 6 (OK=6 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3156.234 (OK=3156.234 KO=- )

min response time 2 (OK=2 KO=- )
max response time 223 (OK=223 KO=- )
mean response time 9 (OK=9 KO=- )
std deviation 7 (OK=7 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 10 (OK=10 KO=- )
mean requests/sec 3087.055 (OK=3087.055 KO=- )

min response time 2 (OK=2 KO=- )
max response time 234 (OK=234 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 7 (OK=7 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3108.873 (OK=3108.873 KO=- )

4 server workers:

min response time 2 (OK=2 KO=- )
max response time 227 (OK=227 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 6 (OK=6 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3217.02 (OK=3217.02 KO=- )

min response time 2 (OK=2 KO=- )
max response time 220 (OK=220 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 6 (OK=6 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3252.385 (OK=3252.385 KO=- )

min response time 2 (OK=2 KO=- )
max response time 221 (OK=221 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 5 (OK=5 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3196.318 (OK=3196.318 KO=- )

4 server workers -> async enqueuing:

min response time 2 (OK=2 KO=- )
max response time 304 (OK=304 KO=- )
mean response time 9 (OK=9 KO=- )
std deviation 5 (OK=5 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 10 (OK=10 KO=- )
mean requests/sec 3062.6 (OK=3062.6 KO=- )

min response time 2 (OK=2 KO=- )
max response time 228 (OK=228 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 6 (OK=6 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 10 (OK=10 KO=- )
mean requests/sec 3111.452 (OK=3111.452 KO=- )

min response time 2 (OK=2 KO=- )
max response time 330 (OK=330 KO=- )
mean response time 9 (OK=9 KO=- )
std deviation 5 (OK=5 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 10 (OK=10 KO=- )
mean requests/sec 3054.741 (OK=3054.741 KO=- )

@scottcarey
Contributor

I plan to push one release soon -- version 0.3, for scalaz-stream 0.8 and 0.8a.

Then I'll merge this PR, and release a version 0.3.1, so that 0.3 and 0.3.1 differ only by this PR.

I'll likely do this tomorrow.

If scalaz-stream 0.8.1 and a version of specs2 built on it show up before then, I'll update the deps to those.

@gaboraranyossy
Contributor Author

Sounds great! Thanks.

@scottcarey scottcarey merged commit 0d17e75 into RichRelevance:master Apr 18, 2016