
various optimizations #28

Merged
merged 1 commit on Apr 18, 2016

Conversation

gaboraranyossy
Contributor

Various performance optimizations following the guidelines of Norman Maurer. Check this out.

The changes are:

  • using PooledByteBufAllocator
  • using native epoll on Linux instead of select
  • dispatching writes to the event loop thread to avoid contention
  • using a single boss thread and multiple worker threads in Server

This version improves latency by about 10% in my use case. :-)
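As a rough sketch (illustrative only, not the actual diff in this PR), the first, second, and fourth points map onto Netty's bootstrap API roughly like this, falling back to NIO where native epoll is unavailable:

```scala
import io.netty.bootstrap.ServerBootstrap
import io.netty.buffer.PooledByteBufAllocator
import io.netty.channel.ChannelOption
import io.netty.channel.epoll.{Epoll, EpollEventLoopGroup, EpollServerSocketChannel}
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.nio.NioServerSocketChannel

// A single boss thread accepts connections; the worker group handles I/O.
val bossGroup   = if (Epoll.isAvailable) new EpollEventLoopGroup(1) else new NioEventLoopGroup(1)
val workerGroup = if (Epoll.isAvailable) new EpollEventLoopGroup()  else new NioEventLoopGroup()

val bootstrap = new ServerBootstrap()
  .group(bossGroup, workerGroup)
  .channel(
    if (Epoll.isAvailable) classOf[EpollServerSocketChannel]
    else classOf[NioServerSocketChannel])
  // Pooled allocators reuse buffer memory instead of allocating per message.
  .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
  .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
// childHandler and bind are elided here.
```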

@djspiewak
Contributor

Well someone has been paying attention to QCon. ;-) It sounds very interesting! I'll take a quick look later today.

As a sidebar, do we actually need multiple worker threads in Server since we're not doing any work on them other than enqueuing?

@gaboraranyossy
Contributor Author

I basically converted the boss threads to workers. :-) You are probably right and we don't need more workers at all. I will experiment with this tomorrow.

In theory those queues use Strategy.sequential, so they will continue to do the work on the thread that reads them. Those, however, are not the worker threads. The enqueuing Task does not actually need to block; we might as well run it async. What's interesting is that when I tested it, it gave slightly worse results.
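For intuition, here is a minimal, stdlib-only sketch (hypothetical names, not scalaz-netty's actual code) of the two enqueue styles. The async variant pays an extra thread hop per message, which is one plausible reason it measured slightly slower:

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}

// Hypothetical illustration of blocking vs. async enqueuing.
object EnqueueStyles {
  val queue = new LinkedBlockingQueue[String]()
  private val pool = Executors.newFixedThreadPool(2)

  // "Blocking" style: the caller enqueues directly on its own thread.
  def enqueueSync(msg: String): Unit = queue.put(msg)

  // "Async" style: the enqueue itself is dispatched to another thread,
  // adding a hand-off (and its latency) per message.
  def enqueueAsync(msg: String): Unit = pool.execute(new Runnable {
    def run(): Unit = queue.put(msg)
  })

  def shutdown(): Unit = {
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    ()
  }
}
```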

PS: I didn’t follow QCon, just stumbled upon this presentation after some quick googling. :-) Now that you mentioned it I’m curious. Which talk did you mean?

@djspiewak
Contributor

@gabor-aranyossy Apparently Norman gave a talk at QCon (this morning!) that covered all of these points, and it was very well received. The timing was just too coincidental to ignore. :-)

@gaboraranyossy
Contributor Author

That's great! 👍 I hope it's gonna be available on YouTube soon.

@gaboraranyossy
Contributor Author

So I did some testing; here are the results. Please consider the tested system a black box in which there are several scalaz-netty channels between machines. I send very small messages one after another and measure the round-trip time. I ran three measurements in each case on a warmed-up system. As you can see, more workers give slightly better results. What I still don't understand is why async enqueuing performs worse than blocking.

1 server worker:

min response time 2 (OK=2 KO=- )
max response time 411 (OK=411 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 6 (OK=6 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3156.234 (OK=3156.234 KO=- )

min response time 2 (OK=2 KO=- )
max response time 223 (OK=223 KO=- )
mean response time 9 (OK=9 KO=- )
std deviation 7 (OK=7 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 10 (OK=10 KO=- )
mean requests/sec 3087.055 (OK=3087.055 KO=- )

min response time 2 (OK=2 KO=- )
max response time 234 (OK=234 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 7 (OK=7 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3108.873 (OK=3108.873 KO=- )

4 server workers:

min response time 2 (OK=2 KO=- )
max response time 227 (OK=227 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 6 (OK=6 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3217.02 (OK=3217.02 KO=- )

min response time 2 (OK=2 KO=- )
max response time 220 (OK=220 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 6 (OK=6 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3252.385 (OK=3252.385 KO=- )

min response time 2 (OK=2 KO=- )
max response time 221 (OK=221 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 5 (OK=5 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 9 (OK=9 KO=- )
mean requests/sec 3196.318 (OK=3196.318 KO=- )

4 server workers -> async enqueuing:

min response time 2 (OK=2 KO=- )
max response time 304 (OK=304 KO=- )
mean response time 9 (OK=9 KO=- )
std deviation 5 (OK=5 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 10 (OK=10 KO=- )
mean requests/sec 3062.6 (OK=3062.6 KO=- )

min response time 2 (OK=2 KO=- )
max response time 228 (OK=228 KO=- )
mean response time 8 (OK=8 KO=- )
std deviation 6 (OK=6 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 10 (OK=10 KO=- )
mean requests/sec 3111.452 (OK=3111.452 KO=- )

min response time 2 (OK=2 KO=- )
max response time 330 (OK=330 KO=- )
mean response time 9 (OK=9 KO=- )
std deviation 5 (OK=5 KO=- )
response time 50th percentile 8 (OK=8 KO=- )
response time 75th percentile 10 (OK=10 KO=- )
mean requests/sec 3054.741 (OK=3054.741 KO=- )

@scottcarey
Contributor

I plan to push one release soon -- version 0.3, for scalaz-stream 0.8 and 0.8a.

Then I'll merge this PR, and release a version 0.3.1, so that 0.3 and 0.3.1 differ only by this PR.

I'll likely do this tomorrow.

If scalaz-stream 0.8.1 and a version of specs2 built on it show up before then, I'll update the deps to those.

@gaboraranyossy
Contributor Author

Sounds great! Thanks.

@scottcarey scottcarey merged commit 0d17e75 into RichRelevance:master Apr 18, 2016