
mobc completely stops serving connections. #63

Closed
garrensmith opened this issue Feb 20, 2022 · 6 comments
@garrensmith
Collaborator

Hey,

We use mobc as part of Prisma, and we are getting into a situation where mobc completely stops serving any connections.
I create an HTTP server using hyper with a mobc pool created via Quaint.

A repo with the reproduction can be found here: https://github.com/garrensmith/mobc-error-example

I then use apache benchmark with a request like this:

ab -v 4 -c 200  -t 120 http://127.0.0.1:4000/

Once apache benchmark has stopped, the number of connections in Postgres drops either to far fewer than the number I configured the pool to open, or to zero. If I log State from mobc, it will report that it has 10 active connections, which is incorrect.
However, if I start apache benchmark and run it again, it will either run a lot slower and with fewer connections, or not run at all because it cannot acquire a new connection from mobc.

I've tried a few things in the code, but I cannot see why this is happening. I even tried #60, but that didn't fix it.

Any help would be really appreciated.

@garrensmith
Collaborator Author

garrensmith commented Feb 25, 2022

Hi @importcjj, have you had a chance to look at this issue? Any ideas or suggestions I can look at?

@garrensmith
Collaborator Author

garrensmith commented Mar 7, 2022

I've been diving into this a bit more, and I now understand why mobc can reach a point of dropping connections and deadlocking.

The issue is happening over here: https://github.com/importcjj/mobc/blob/master/src/lib.rs#L664

First, some context: in our situation we can have a large number of concurrent requests (over 1000) for a connection from the pool, while the pool only holds a small number of connections, for example 10. Each waiting request has a oneshot channel created for it, and the Sender is added to a queue over here: https://github.com/importcjj/mobc/blob/master/src/lib.rs#L542

What can happen then is that all of those waiting requests are destroyed; this is the case when the connection requests come from web requests that have been aborted. The conn_requests queue is then left holding over 1000 Senders whose Receivers have been dropped.

Now, when an active connection is returned to the pool, what is supposed to happen is that mobc goes through the list of Senders and tries to send the connection to a waiting Receiver. If a Receiver has been cancelled or dropped, the connection is handed back and the next Sender is tried, until a Sender with a live Receiver is found. This is the code I mentioned earlier: https://github.com/importcjj/mobc/blob/master/src/lib.rs#L664

However, when a large number of the Receivers have been dropped, this doesn't work, and the connection gets accidentally dropped. internals.num_open is not decremented at that point, so mobc thinks it still has active connections when in fact it does not. As a result it doesn't create new connections, and it has no connections to hand to new connection requests.
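
A minimal sketch of the failure mode, using hypothetical stand-ins (Internals, Conn, enqueue_waiter, and put_back are illustrative names, not mobc's actual code). The hand-off loop has to keep the connection that a failed send() returns; if that value is ever lost without num_open being decremented, the pool ends up in exactly the stuck state described above:

```rust
use tokio::sync::oneshot;

struct Conn;

// Hypothetical stand-ins for the pool's internals.
struct Internals {
    conn_requests: Vec<oneshot::Sender<Conn>>, // queued waiters
    num_open: usize, // connections the pool believes are open
}

impl Internals {
    // A waiter registers a oneshot channel and awaits the Receiver.
    // If the web request is aborted, the Receiver is silently dropped.
    fn enqueue_waiter(&mut self) -> oneshot::Receiver<Conn> {
        let (tx, rx) = oneshot::channel();
        self.conn_requests.push(tx);
        rx
    }

    // Returning a connection: walk the queue of Senders.
    fn put_back(&mut self, mut conn: Conn) {
        while let Some(tx) = self.conn_requests.pop() {
            match tx.send(conn) {
                Ok(()) => return, // a live waiter took the connection
                // send() fails when the Receiver was dropped, and it
                // hands the connection back; it must be kept and retried.
                Err(returned) => conn = returned,
            }
        }
        // No live waiter found: the connection must go onto the free
        // list, or num_open must be decremented. If neither happens,
        // the pool permanently overcounts its connections, never opens
        // new ones, and deadlocks.
        self.num_open -= 1;
        drop(conn);
    }
}
```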

I have an idea for solving this, but it would involve replacing the channels with a Semaphore. This would be similar to how deadpool works: https://github.com/bikeshedder/deadpool/ I've tested it and it works with mobc, but it would be quite a large change.

The reason the move to a Semaphore would be better is that when a connection is returned to the pool, there is no chance of it being dropped: it is simply added to the list of free_conns. The oneshot channels would be replaced by waiting for access to the semaphore, so if a request is cancelled, another request can grab the connection, and there is no chance of it being accidentally lost.
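
A rough sketch of that semaphore-based design, assuming tokio's Semaphore and hypothetical Pool/Conn types (this mirrors the idea, not deadpool's or the eventual fix's actual code):

```rust
use std::sync::Mutex;
use tokio::sync::Semaphore;

struct Conn;

// Hypothetical pool shape: one semaphore permit per available connection.
struct Pool {
    semaphore: Semaphore,
    free_conns: Mutex<Vec<Conn>>,
}

impl Pool {
    async fn get(&self) -> Conn {
        // Waiters queue on the semaphore. If a caller is cancelled while
        // waiting, its acquire() future is simply dropped; no connection
        // is ever in flight towards a dead receiver.
        let permit = self.semaphore.acquire().await.expect("pool closed");
        permit.forget(); // the permit now travels with the connection
        self.free_conns
            .lock()
            .unwrap()
            .pop()
            .expect("a permit guarantees a free connection")
    }

    fn put_back(&self, conn: Conn) {
        // The connection always lands on the free list first...
        self.free_conns.lock().unwrap().push(conn);
        // ...and only then is one waiter woken, so it cannot be lost.
        self.semaphore.add_permits(1);
    }
}
```

Because put_back pushes the connection onto free_conns before releasing a permit, a waiter that acquires a permit is always guaranteed to find a free connection, and a cancelled waiter simply never consumes its permit.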

@garrensmith
Collaborator Author

To conclude this, in case someone else comes across it: we are hosting a fork of mobc with fixes for this issue over here: https://github.com/prisma/mobc

@w8ze-devel

Hello,

We are experiencing exactly the same issue with the Prisma connection pool.

The application is a backend API developed with NestJS.

Could someone please explain how to implement this fix with mobc?

Thanks.

@garrensmith
Collaborator Author

@w8ze-devel can you open a ticket on the prisma repository to track this?

@garrensmith
Collaborator Author

The latest 0.8.1 release fixes this.
