Proof of concept: offloading of CPU/ IO / Network to Twisted's threadpool benchmark #2137
Comments
Note that blocking calls into native code that releases the GIL WILL give performance improvements!
I have finished the basic client and server code over the weekend. So far the results look promising. This experiment ran 6 servers and 6 clients; every client connects to one server and then issues 10 different queries, so that each server receives different queries and no database optimizations can bias the results.
This means that the asynchronous code (using Twisted's Agent) is ~17.4% faster. Note that parsing the response and inserting the data after the parsing stage are still completely synchronous; only the requests themselves were made (a)synchronous. Further experiments should show whether doing these tasks asynchronously on the threadpool increases the speed gain. The same will be investigated for the server: whether doing tasks on the threadpool optimizes speed over the current synchronous implementation.
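The gain from making only the requests asynchronous can be illustrated with a small stdlib sketch (Twisted's Agent itself is not shown here). This is a hypothetical example: the "query" is simulated with `time.sleep`, which releases the GIL the same way a blocking socket read does, and all names and timings are illustrative.

```python
# Sketch: sequential vs. thread-pool execution of blocking "queries".
# time.sleep stands in for a blocking network request; it releases the
# GIL, so queries running on pool threads can overlap.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query(i):
    """Stand-in for one client query against a server."""
    time.sleep(0.05)
    return i * i

queries = list(range(10))

start = time.time()
sequential = [fake_query(i) for i in queries]
t_seq = time.time() - start

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    concurrent = list(pool.map(fake_query, queries))
t_pool = time.time() - start

assert sequential == concurrent
print("sequential: %.2fs, thread pool: %.2fs" % (t_seq, t_pool))
```

With ten overlapping queries the pooled run finishes in roughly one sleep interval instead of ten, which is the same effect the Agent-based client exploits.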
And the results are in :D Now with the server also blocking / non-blocking for the IO and CPU methods. The colors follow a pattern: the color of each bar is rgb(r, g, b) as in the figure's legend. Comparing the black bar with the green bar shows a 15.8% improvement. Doing the network and IO asynchronously yields a 14.4% improvement over the all-synchronous scenario. An interesting observation is that offloading only CPU makes things worse, so we shouldn't do that: the overhead of thread switching and extra method calls costs more performance than is gained.
Good work!
These results are without the zip inflation/deflation, as an error is thrown at random when unpacking. I am trying to fix that; however, I was thinking of maybe doing crypto on the JSON instead of zipping it, which should also be CPU intensive and is basically what we do in the tunnels now.
I have fixed the zip inflation/deflation issue and added encryption and decryption at the server and client side respectively, encrypting/decrypting the zip being sent. The results are interesting: now the combination of CPU, IO and network being offloaded scores best, with a gain of ~10.62%. So whenever networking is synchronous (blocking), offloading CPU or IO appears to worsen the situation; whenever it is asynchronous, all scenarios that include it perform better.
Is the zip stuff freeing the GIL?
No, and that probably explains why the CPU offloading takes more time.
Apparently pycrypto does release the GIL now, but I used the cryptography package.
let's see what happens if you use pycrypto then |
Do note that the tunnels use cryptography :) I will give pycrypto a go |
Also, zlib releases the GIL apparently: https://docs.python.org/2/c-api/init.html
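Since `zlib.compress`/`zlib.decompress` release the GIL during the native call, they are good candidates for the thread pool. A small self-contained sketch of the round trip on pool threads (payload contents are illustrative):

```python
# zlib's native compress/decompress release the GIL, so running them on
# pool threads lets them overlap with other Python work.
import zlib
from concurrent.futures import ThreadPoolExecutor

payloads = [(b"payload-%d " % i) * 1000 for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(zlib.compress, payloads))
    restored = list(pool.map(zlib.decompress, compressed))

assert restored == payloads
assert all(len(c) < len(p) for c, p in zip(compressed, payloads))
```

Note also that `zlib` works directly on bytes, with no need for `StringIO`/`BytesIO` file-like objects, which is what the later comment praises.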
Well, if we want to improve throughput and python-cryptography doesn't release the GIL, something will have to change... Let's talk about it when we have results from both libs. |
Alright, the results are in once more! Now with non-blocking zipping and pycrypto, which (should) release the GIL as well. Comparing the best (blue) bar to the black one (everything blocking) shows a gain of ~18.93%, which is huge: nearly one fifth! Interestingly, the yellow, blue and white cases are very, very close, with only 43 ms difference between white and blue. Also, I love zlib; it is very easy to use (no need for StringIO or BytesIO file-like objects).
@whirm The initial result is there, so this ticket can be closed 👍
Discussed with @synctext
As multiple threads in Python do not give performance benefits on their own, because they are all captured by the GIL, a proof-of-concept benchmark should be constructed with a mixed workload of CPU-intensive tasks, IO-intensive tasks (database) and networking requests, mimicking Tribler's behavior.
From this benchmark we can observe the gains when these distinct elements are placed on the thread pool versus the current synchronous case. The results should indicate whether the overhead created is acceptable.