Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoolConnection::drop: don't wait until inside the future to call .float() #1396

Closed
sify21 opened this issue Aug 22, 2021 · 4 comments · Fixed by #1799
Closed

PoolConnection::drop: don't wait until inside the future to call .float() #1396

sify21 opened this issue Aug 22, 2021 · 4 comments · Fixed by #1799
Labels
bug pool Related to SQLx's included connection pool

Comments

@sify21
Copy link

sify21 commented Aug 22, 2021

I'm using sqlx with actix_web and postgresql, I found a problem when trying to spawn a custom thread, I made a sample project to describe the issue.
There are 3 apis in the server. /test and /nostuck works well, /stuck is problematic. I can only query this api DB_MAXCONN(defined in .env) times, after that sqlx will get stuck at pool.begin, even /nostuck will get stuck too. /test shows that actix_web is working well.
I don't know what's wrong with /stuck, it seems that in this api the connection isn't returned to the pool. What I'm trying to do in this api is I want to do something background without blocking the response, jh.join on line 130 is only there to demonstrate the problem, I won't join the thread in actual code.

@sify21 sify21 changed the title connection pool runs out on custom threads connection pool runs out when used in custom threads Aug 22, 2021
@sify21
Copy link
Author

sify21 commented Aug 23, 2021

I experimented a little, and found a working version /nostuck2.
However, along the way I found an interesting thing: If I do tokio::time::sleep, the problem is gone. stuck2, nostuck3, nostuck4 demonstrates this. It really confuses me. Hope someone who knows the internals can explain to me.

@abonander
Copy link
Collaborator

abonander commented Aug 25, 2021

This issue is very similar to #1389 in that it's caused by the runtime going away while we're trying to spawn a task that's important to SQLx functioning correctly.

In this case, it's due to waiting to call .float() here until we're running in the async future (which is passed to sqlx_rt::spawn() in the drop handler of PoolConnection): https://github.com/launchbadge/sqlx/blob/v0.5.6/sqlx-core/src/pool/connection.rs#L104

Because you're using a single-threaded runtime, the task we spawn in PoolConnection::drop() will simply not be run because the runtime dies when the current running future returns. Because of that, we never call .float(), so the connection is effectively leaked from the pool.

If you were using a multi-threaded runtime, you'd probably see similar panics to #1389.

I need to fix this so that DecrementSizeGuard just holds the Arc<SharedPool<DB>> and not a &SharedPool<DB>, so the Floating can be created before being passed to the future which must otherwise be 'static, and then we should never have a situation where a connection is not covered by DecrementSizeGuard.

However, in the meantime, you probably don't want to be spawning a new runtime every time you want to run something in the background. You could instead use actix_web::rt::spawn() to spawn the future in the current Actix runtime, although that is still single-threaded(*) so you probably want to either limit the total number of background tasks running to keep your API fast, or else concurrently run a multi-threaded Tokio runtime, which you can do using a pattern like this:

use actix_web::web::Data;
use tokio::runtime::Handle;

pub type TokioRuntime = Data<Handle>;

#[tokio::main(flavor = "multi_threaded")]
async fn main() {
    let multithreaded_tokio = Data::new(Handle::current());

    // do async setup stuff, like opening a `PgPool` and running migrations...

    // `actix_main()` will quit on Ctrl-C
    tokio::task::block_in_place(|| actix_main(multithreaded_tokio));
}

#[actix_web::main]
async fn actix_main(multithreaded_tokio: TokioRuntime) {
   // now in the Actix runtime
    HttpServer::new(move || {
        App::new()
            .app_data(multithreaded_tokio)
            .service(handle_request)
    })
    .bind("0.0.0.0:8080")
    .unwrap()
    .run()
    .await
    .unwrap();
}

#[get("/v1/some/route")]
async fn handle_request(
    tokio: TokioRuntime,
) -> HttpResponse {
    tokio.spawn(async move { /* some long-running async operation */ });

    HttpResponse::Ok().finish()
}

(*): I keep forgetting that Actix-web is technically multithreaded in that it spawns a set of threads on start up, each with an Actix runtime, and then dispatches new connections to the threads. Spawned tasks in a given Actix runtime, however, will always execute on the same thread, and no work-stealing occurs between threads, so you could end up in a situation where one runtime is loaded down with more tasks than the others.

@abonander abonander added bug pool Related to SQLx's included connection pool labels Aug 25, 2021
@abonander abonander changed the title connection pool runs out when used in custom threads PoolConnection::drop: don't wait until inside the future to call .float() Aug 25, 2021
@sify21
Copy link
Author

sify21 commented Mar 3, 2022

Hi @abonander , there is something I can't figure out in the pattern you've proposed, I wish you could kindly help me understand.

I've read in other posts that tokio runtimes can't be nested. It will panic when starting and dropping.

In your pattern, #[tokio::main(flavor = "multi_thread")] creates a multi-thread runtime, and #[actix_web::main] creates a current-thread runtime. But I've tested your code, it works nicely, and I can send signal 15 to the running process and make it shutdown gracefully. Why is that?

@abonander
Copy link
Collaborator

It's because block_in_place() essentially makes the thread no longer a core thread for its duration. The thread-local Tokio context is cleared so to the new runtime the environment is indistinguishable from a freshly started process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug pool Related to SQLx's included connection pool
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants