Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

global connection pool? #2

Open
andrenam opened this issue Dec 1, 2017 · 6 comments
Open

global connection pool? #2

andrenam opened this issue Dec 1, 2017 · 6 comments
Assignees
Labels

Comments

@andrenam
Copy link

andrenam commented Dec 1, 2017

Hi Maurits,

I just read your blog post on https://tqdev.com/2017-porting-php-crud-api-to-python
Don't you need a global to hold your connection pool to mysql?
For example, the flask framework offers flask.g to hold all globals (see http://flask.pocoo.org/docs/0.12/api/#application-globals) like database connections.

To share data that is valid for one request only from one function to another, a global variable is not good enough because it would break in threaded environments.

I don't know what the "correct" solution is here, but a simple move link = mysql.connector.connect(..) to a global variable gave me a performance boost of 3x.

@mevdschee
Copy link
Owner

mevdschee commented Dec 1, 2017

Hi @andrenam,

Thank you for contacting me and for making this suggestion. I'm a bit skeptical about the improvement this would bring, as I am already using connection pooling. The manual says:

To create a connection pool implicitly: Open a connection and specify one or more pool-related arguments (pool_name, pool_size).

I'm using implicit connection pooling, as you can see here:

https://github.com/mevdschee/py-crud-api/blob/master/api.py#L19

You are suggesting explicit connection pooling, which does not make a difference in my tests. If you would be able to do a PR then I would know exactly what you are talking about (maybe I'm doing it wrong). I would be very grateful as I would love to prove you right as a 3x improvement sounds great to me!

Kind regards,

Maurits

@mevdschee mevdschee self-assigned this Dec 1, 2017
@andrenam
Copy link
Author

andrenam commented Dec 1, 2017

Hi Maurits,

but maybe the cost of creating the connection object and the cursor with each request is too high?

I just played around and using a global variable to hold the link and cursor and a second variant using Singleton to hold the link (not cursor):

I used gunicorn with 4 threads, but a remote mysql server on a different machine on my network. so the cost of creating the connection is probably a little higher than the regular case.

original:            881.09 requests/sec
global variable:    2509.86 requests/sec
DB Singleton:       1735.19 requests/sec

See commit andrenam@b5cb20a
I added two new copies instead of overwriting the api.py to make it easier to compare them..

But I'm not sure whats the right way to do it, I don't have the expierence in python webprogramming..

@mevdschee
Copy link
Owner

mevdschee commented Dec 1, 2017

Hi @andrenam,

That is a very interesting approach. You are not doing connection pooling, but you are using a single database connection, which does does make some sense (as we are also using a single thread). This way I am also able to get a higher performance. I never tried this approach before (on any implementation), but it will most likely also help the other implementations (such as nodejs). Great discovery!! I will update my blog post to reflect this interesting finding.

Thank you very much!

Maurits

NB: I wondered about the thread safety in the gunicorn case, but it seems not to be a problem there as it uses picoev and is thus single-threaded on every worker.

@mevdschee
Copy link
Owner

mevdschee commented Dec 1, 2017

I still wonder about the importance of the things a connection pool does (or might do) to get you a "clean" connection as we are obviously skipping those as well (see also: http://blog.ulf-wendel.de/2011/using-mysql-with-php-mysqli-connections-options-pooling/).

MySQL does a mysql_change_user() call when getting a connection from the pool, which:

It always performs a ROLLBACK of any active transactions, closes and drops all temporary tables, and unlocks all locked tables. Session system variables are reset to the values of the corresponding global system variables. Prepared statements are released and HANDLER variables are closed. Locks acquired with GET_LOCK() are released. These effects occur even if the user did not change.

(source)

For most applications it would probably not be needed to do all these things, so maybe it is a good thing to just re-use the connection without using connection pooling as it is expensive (in Python).

@andrenam
Copy link
Author

andrenam commented Dec 2, 2017

Glad to give you some input. :-)

@mevdschee
Copy link
Owner

mevdschee commented Dec 2, 2017

I also wondering how (idle timeout) disconnects are handled when using a single connection. I guess this needs some further investigation :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants