
Proof of concept: offloading of CPU / IO / network to Twisted's threadpool benchmark #2137

Closed
lfdversluis opened this issue Apr 26, 2016 · 16 comments

@lfdversluis

Discussed with @synctext

As multiple threads in Python do not give performance benefits when they are all constrained by the GIL, a proof-of-concept benchmark should be constructed with a mixed workload of CPU-intensive tasks, IO-intensive tasks (database) and network requests that mimics Tribler's behavior.
From this benchmark we can observe the gains when these distinct elements are placed on the thread pool versus the current synchronous case. The results should indicate whether the overhead created is acceptable.
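As a rough illustration of the idea (a minimal sketch with hypothetical task names, not Tribler code), Twisted's deferToThread hands blocking work to the reactor's threadpool and returns a Deferred:

```python
from twisted.internet import reactor
from twisted.internet.defer import gatherResults
from twisted.internet.threads import deferToThread


def cpu_intensive_task(payload):
    # Stand-in for compression/crypto work; runs on a threadpool thread.
    return payload[::-1]


def io_intensive_task(query):
    # Stand-in for a blocking database query, also offloaded.
    return "result for " + query


def run_benchmark():
    # Each deferToThread call hands the blocking work to the reactor's
    # threadpool and immediately returns a Deferred for the result.
    d = gatherResults([
        deferToThread(cpu_intensive_task, b"payload"),
        deferToThread(io_intensive_task, "SELECT ..."),
    ])
    d.addCallback(lambda results: reactor.stop())
    return d


reactor.callWhenRunning(run_benchmark)
reactor.run()
```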


whirm commented Apr 26, 2016

Note that blocking calls into native code that releases the GIL WILL give performance improvements!

whirm added this to the Backlog milestone Apr 26, 2016

lfdversluis commented May 1, 2016

I have finished the basic client and server code during the weekend. So far the results look promising. This experiment ran 6 servers and 6 clients; every client connects to one server and then issues 10 different queries, so that each query the server receives is different and no database optimizations can bias the results.

| Client | Server | Duration (microseconds) |
| --- | --- | --- |
| Synchronous (blocking) | Synchronous | 61937940 |
| Asynchronous (non-blocking) | Synchronous | 51127318 |

This means that the asynchronous code (using Twisted's Agent) is ~17.4% faster. Note that parsing the response and inserting the data after the parsing stage are still completely synchronous; only the requests themselves were made (a)synchronous. Further experiments should determine at a finer grain whether doing these tasks asynchronously on the threadpool increases the gain further.
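For reference, a minimal sketch of such a non-blocking request with Twisted's Agent (the URL and port are placeholders, not the benchmark's actual endpoint):

```python
from twisted.internet import reactor
from twisted.web.client import Agent, readBody


def fetch(url):
    agent = Agent(reactor)
    # request() returns a Deferred immediately; the reactor stays free to
    # service other connections while the response is in flight.
    d = agent.request(b"GET", url)
    d.addCallback(readBody)  # collect the response body into a bytes object
    return d


d = fetch(b"http://localhost:8080/query?id=1")
d.addBoth(lambda result: reactor.stop())
reactor.run()
```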

As for the server, it will also be investigated whether doing tasks on the threadpool, versus the current synchronous implementation, further improves the speed.

@lfdversluis

And the results are in :D This time the server's IO and CPU methods were also tested blocking / non-blocking.

[Figure: benchmark of CPU, network and IO offloading combinations]

The colors follow a pattern:
If CPU is asynchronous (non-blocking): red (r) = 255, else 0
If network is asynchronous (non-blocking): green (g) = 255, else 0
If IO is asynchronous (non-blocking): blue (b) = 255, else 0

The color of a bar is then rgb(r, g, b).
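In code form (just an illustration of the legend, not benchmark code):

```python
def bar_color(cpu_async, net_async, io_async):
    # Each component that is offloaded (non-blocking) switches its
    # color channel fully on.
    return (255 * cpu_async, 255 * net_async, 255 * io_async)

bar_color(False, True, False)  # -> (0, 255, 0): only networking offloaded
```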

Comparing the black bar with the green bar shows a 15.8% improvement. Doing both the network and IO asynchronously yields a 14.4% improvement over the all-synchronous scenario.

An interesting observation is that offloading only the CPU work makes things worse, so we shouldn't do that: the overhead of thread switching and extra method calls costs more performance than is gained.


whirm commented May 3, 2016

Good work!
How much data is being compressed?

@lfdversluis

These results are without the zip inflation/deflation, as an error is thrown at random when unpacking. I am trying to fix that; however, I was also thinking of doing crypto on the JSON instead of zipping it. That should also be CPU intensive, and it is basically what we do in the tunnels now.


lfdversluis commented May 11, 2016

I have fixed the zip inflation/deflation issue and added encryption on the server side and decryption on the client side, applied to the zip being sent. The results are interesting:

[Figure: benchmark of CPU, network and IO offloading combinations, now with encryption]

Now the combination of CPU, IO and network being offloaded scores best, with a gain of ~10.62%. This is interesting, considering that offloading only CPU, or CPU + IO, is actually worse than doing everything synchronously.

So whenever networking is synchronous (blocking), offloading CPU or IO appears to worsen the situation; whenever networking is asynchronous, every scenario that includes it performs better.


whirm commented May 11, 2016

Is the zip stuff releasing the GIL?


lfdversluis commented May 11, 2016

No, and that probably explains why the CPU offloading takes more time.
I think the crypto stuff may not be releasing the GIL either.

At this point it is safe to assume that most of the gzip + line count code requires the GIL. A quick look at gzip.py tells me that, yes, that is the case.
Source: http://www.dalkescientific.com/writings/diary/archive/2012/01/19/concurrent.futures.html
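For what it's worth, a quick empirical check for whether a call releases the GIL is to time it serially versus from several threads (a micro-benchmark sketch, not part of this experiment; zlib.compress is used here as the test subject):

```python
import threading
import time
import zlib

DATA = b"x" * (16 * 1024 * 1024)  # 16 MiB of highly compressible data


def compress_once():
    zlib.compress(DATA)


def timed(fn):
    start = time.time()
    fn()
    return time.time() - start


def serial(n):
    for _ in range(n):
        compress_once()


def threaded(n):
    threads = [threading.Thread(target=compress_once) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# If the compression call releases the GIL, the threaded run overlaps on
# multiple cores and finishes in well under n times the single-call time;
# if it holds the GIL, both timings come out roughly equal.
print("serial (4 calls):   %.2fs" % timed(lambda: serial(4)))
print("threaded (4 calls): %.2fs" % timed(lambda: threaded(4)))
```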

@lfdversluis

Apparently pycrypto does release the GIL now, but I used the cryptography package.

https://github.com/dlitz/pycrypto/blob/master/ChangeLog


whirm commented May 11, 2016

Let's see what happens if you use pycrypto then.

@lfdversluis

Do note that the tunnels use cryptography :) I will give pycrypto a go

@lfdversluis

Also, zlib releases the GIL apparently: https://docs.python.org/2/c-api/init.html


whirm commented May 11, 2016

> Do note that the tunnels use cryptography :) I will give pycrypto a go

Well, if we want to improve throughput and python-cryptography doesn't release the GIL, something will have to change... Let's talk about it when we have results from both libs.
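For reference, a minimal sketch of what the pycrypto path could look like (the key, IV, mode and payload are placeholder choices for illustration, not the benchmark's actual parameters):

```python
import os

from Crypto.Cipher import AES  # PyCrypto, not the cryptography package

KEY = os.urandom(16)  # dummy AES-128 key for the sketch
IV = os.urandom(16)   # dummy initialisation vector


def encrypt(plaintext):
    # CFB mode works on arbitrary-length byte strings, so no padding needed.
    return AES.new(KEY, AES.MODE_CFB, IV).encrypt(plaintext)


def decrypt(ciphertext):
    return AES.new(KEY, AES.MODE_CFB, IV).decrypt(ciphertext)


assert decrypt(encrypt(b"zipped payload")) == b"zipped payload"
```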


lfdversluis commented May 11, 2016

Alright, the results are in once more! Now with non-blocking zipping, and with pycrypto, which should release the GIL as well.

[Figure: benchmark of CPU, network and IO offloading combinations, with zlib and pycrypto]

Comparing the best (blue) one to the black one (everything blocking) shows a gain of ~18.93%, which is huge: nearly one fifth! It is interesting that the yellow, blue and white cases are very, very close; there is only a 43 ms difference between white and blue.

Also, I love zlib: very easy to use (no need for StringIO or BytesIO file-like objects).
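To illustrate the convenience difference, a small usage sketch contrasting the two modules (the payload is a dummy value):

```python
import gzip
import zlib
from io import BytesIO

payload = b'{"results": [1, 2, 3]}'

# zlib: one call each way, plain bytes in and out.
packed = zlib.compress(payload)
assert zlib.decompress(packed) == payload

# gzip: needs a file-like object to wrap the byte stream.
buf = BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(payload)
with gzip.GzipFile(fileobj=BytesIO(buf.getvalue()), mode="rb") as f:
    assert f.read() == payload
```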


lfdversluis commented May 16, 2016

I noticed that in the last benchmark the creation of the table on the client side was included in the time measurement, which it should not have been. Below is the corrected plot:

[Figure: corrected benchmark of CPU, network and IO offloading combinations]

Blue remains the best, but the all-synchronous case is now the worst. The gain between blue and black became even bigger: 22.45%. Comparing white (everything asynchronous) with black yields 17.97%.

@lfdversluis

@whirm The initial result is there, so this ticket can be closed 👍
