
Connection pool using client::get(uri).with_connector(...) is not respecting keep-alive #212

Closed
parasyte opened this issue May 8, 2018 · 16 comments

Comments

parasyte commented May 8, 2018

I hacked together a quick proxy, very similar to the proxy example, except I'm using ClientRequestBuilder::with_connector() to try to utilize the built-in connection pooling.

#![feature(proc_macro, generators)]

extern crate actix;
extern crate actix_web;
extern crate futures_await as futures;

use actix::{Actor, Addr, Unsync};
use actix_web::{client, server, App, Path, State};
use actix_web::http::Method;
use futures::prelude::*;

struct AppState {
    conn: Addr<Unsync, client::ClientConnector>,
}

#[async(boxed)]
fn index(info: Path<(u32, String)>, state: State<AppState>) -> Result<String, actix_web::error::Error> {
    let uri = format!("http://127.0.0.1:8081/{}/{}/index.html", info.0, info.1);

    // Proxy the request upstream through the shared connector.
    let resp = await!(client::get(uri)
        .with_connector(state.conn.clone())
        .no_default_headers()
        .upgrade()
        .finish()?
        .send()
    )?;

    // Pull the first chunk off the response body stream.
    let body = await!(resp.into_future())
        .unwrap()
        .0
        .unwrap();

    Ok(std::str::from_utf8(&body)?.to_owned())
}

fn main() {
    let addr = "127.0.0.1:8080";
    println!("Listening on http://{}", addr);

    server::new(|| {
        let conn: Addr<Unsync, _> = client::ClientConnector::default().start();
        App::with_state(AppState{conn})
            .resource("/{id}/{name}/index.html", |r| r.method(Method::GET).with2(index))
    }).bind(addr)
        .unwrap()
        .run();
}

The proxy functionality works (using the basic hello-world example as an upstream on port 8081), but keep-alive is not working, as you can see from tcpdump:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 262144 bytes
18:05:42.542170 IP 127.0.0.1.58422 > 127.0.0.1.8080: Flags [S], seq 1194112351, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1218386738 ecr 0,sackOK,eol], length 0
18:05:42.542279 IP 127.0.0.1.8080 > 127.0.0.1.58422: Flags [S.], seq 3083285766, ack 1194112352, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1218386738 ecr 1218386738,sackOK,eol], length 0
18:05:42.542301 IP 127.0.0.1.58422 > 127.0.0.1.8080: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1218386738 ecr 1218386738], length 0
18:05:42.542322 IP 127.0.0.1.8080 > 127.0.0.1.58422: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1218386738 ecr 1218386738], length 0
18:05:42.542350 IP 127.0.0.1.58422 > 127.0.0.1.8080: Flags [P.], seq 1:154, ack 1, win 12759, options [nop,nop,TS val 1218386738 ecr 1218386738], length 153: HTTP: GET /123/jay/index.html HTTP/1.1
18:05:42.542367 IP 127.0.0.1.8080 > 127.0.0.1.58422: Flags [.], ack 154, win 12754, options [nop,nop,TS val 1218386738 ecr 1218386738], length 0
18:05:42.542567 IP 127.0.0.1.58423 > 127.0.0.1.8081: Flags [S], seq 1266615293, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1218386738 ecr 0,sackOK,eol], length 0
18:05:42.542637 IP 127.0.0.1.8081 > 127.0.0.1.58423: Flags [S.], seq 2134870156, ack 1266615294, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1218386738 ecr 1218386738,sackOK,eol], length 0
18:05:42.542647 IP 127.0.0.1.58423 > 127.0.0.1.8081: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1218386738 ecr 1218386738], length 0
18:05:42.542658 IP 127.0.0.1.8081 > 127.0.0.1.58423: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1218386738 ecr 1218386738], length 0
18:05:42.542708 IP 127.0.0.1.58423 > 127.0.0.1.8081: Flags [P.], seq 1:91, ack 1, win 12759, options [nop,nop,TS val 1218386738 ecr 1218386738], length 90
18:05:42.542726 IP 127.0.0.1.8081 > 127.0.0.1.58423: Flags [.], ack 91, win 12756, options [nop,nop,TS val 1218386738 ecr 1218386738], length 0
18:05:42.542810 IP 127.0.0.1.8081 > 127.0.0.1.58423: Flags [P.], seq 1:135, ack 91, win 12756, options [nop,nop,TS val 1218386738 ecr 1218386738], length 134
18:05:42.542827 IP 127.0.0.1.58423 > 127.0.0.1.8081: Flags [.], ack 135, win 12755, options [nop,nop,TS val 1218386738 ecr 1218386738], length 0
18:05:42.542908 IP 127.0.0.1.8080 > 127.0.0.1.58422: Flags [P.], seq 1:135, ack 154, win 12754, options [nop,nop,TS val 1218386738 ecr 1218386738], length 134: HTTP: HTTP/1.1 200 OK
18:05:42.542935 IP 127.0.0.1.58422 > 127.0.0.1.8080: Flags [.], ack 135, win 12755, options [nop,nop,TS val 1218386738 ecr 1218386738], length 0
18:05:42.546016 IP 127.0.0.1.58422 > 127.0.0.1.8080: Flags [F.], seq 154, ack 135, win 12755, options [nop,nop,TS val 1218386741 ecr 1218386738], length 0
18:05:42.546051 IP 127.0.0.1.8080 > 127.0.0.1.58422: Flags [.], ack 155, win 12754, options [nop,nop,TS val 1218386741 ecr 1218386741], length 0
18:05:42.546107 IP 127.0.0.1.8080 > 127.0.0.1.58422: Flags [F.], seq 135, ack 155, win 12754, options [nop,nop,TS val 1218386741 ecr 1218386741], length 0
18:05:42.546137 IP 127.0.0.1.58422 > 127.0.0.1.8080: Flags [.], ack 136, win 12755, options [nop,nop,TS val 1218386741 ecr 1218386741], length 0
18:05:43.200008 IP 127.0.0.1.58423 > 127.0.0.1.8081: Flags [F.], seq 91, ack 135, win 12755, options [nop,nop,TS val 1218387384 ecr 1218386738], length 0
18:05:43.200039 IP 127.0.0.1.8081 > 127.0.0.1.58423: Flags [.], ack 92, win 12756, options [nop,nop,TS val 1218387384 ecr 1218387384], length 0
18:05:43.200085 IP 127.0.0.1.8081 > 127.0.0.1.58423: Flags [F.], seq 135, ack 92, win 12756, options [nop,nop,TS val 1218387385 ecr 1218387384], length 0
18:05:43.200123 IP 127.0.0.1.58423 > 127.0.0.1.8081: Flags [.], ack 136, win 12755, options [nop,nop,TS val 1218387385 ecr 1218387385], length 0

If keep-alive were working, I would expect the FIN packets to port 8081 (the Flags [F.] lines at timestamps 18:05:43.200008 and later) not to be sent at all; the sockets should be reused indefinitely. The FIN is also delayed by about 650ms after the last ACK packet at 18:05:42.542827.

fafhrd91 (Member) commented May 8, 2018

What version of actix-web are you using?

fafhrd91 (Member) commented May 8, 2018

Just a note: each worker thread uses its own connector. Your code starts multiple worker threads, depending on how many CPUs you have.
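
A quick way to take that out of the picture is to pin the server to a single worker, so only one connector is in play. A minimal sketch (the same .threads(1) call appears in the full example further down):

server::new(|| { /* app factory as above */ })
    .bind(addr)
    .unwrap()
    .threads(1) // one worker thread -> one ClientConnector, so reuse is observable
    .run();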

fafhrd91 (Member) commented May 8, 2018

Checked master and 0.5.6; both work.

I noticed .upgrade() in your code:

   let resp = await!(client::get(uri)
        .with_connector(state.conn.clone())
        .no_default_headers()
        .upgrade()
        .finish()?
        .send()
    )?;

actix-web always closes upgraded connections
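
A minimal sketch of the same request with the upgrade flag dropped (otherwise identical to the snippet above):

    let resp = await!(client::get(uri)
        .with_connector(state.conn.clone())
        .no_default_headers()
        // .upgrade() removed: upgraded connections are never returned to the pool
        .finish()?
        .send()
    )?;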

parasyte (Author) commented May 8, 2018

Actix 0.5.6. Here's my Cargo.toml:

[package]
name = "actixello-world"
version = "0.1.0"
authors = ["Jay Oster <[email protected]>"]

[dependencies]
actix = "0.5.6"
actix-web = "0.5.6"
futures-await = "0.1.1"

I agree that it works functionally. The bug is that there's no keep-alive. Even if I comment out .upgrade(), the connection to the upstream on port 8081 still closes in about 650ms.

parasyte (Author) commented May 8, 2018

The delay between the last ACK and the client's FIN is not always 650ms; it just happened to be about 650ms when I grabbed that tcpdump sample. After sending the request multiple times, the behavior appears to be periodic: client sockets get closed by some kind of check that runs every second.

fafhrd91 (Member) commented May 8, 2018

I made some changes:

extern crate actix;
extern crate actix_web;
extern crate futures; // needed for futures::prelude below

use actix::{Actor, Addr, Context, Handler, Syn, Unsync};
use actix_web::client::ClientConnectorStats;
use actix_web::http::Method;
use actix_web::{client, server, App, AsyncResponder, FutureResponse, HttpMessage, Path, State};
use futures::prelude::*;

struct AppState {
    conn: Addr<Unsync, client::ClientConnector>,
}

fn index(info: Path<(u32, String)>, state: State<AppState>) -> FutureResponse<String> {
    let uri = format!("http://127.0.0.1:8081/{}/{}/index.html", info.0, info.1);

    client::get(uri)
        .with_connector(state.conn.clone())
        .finish()
        .unwrap()
        .send()
        .from_err()
        .and_then(|resp| {
            resp.body()
                .from_err()
                .and_then(|body| Ok(std::str::from_utf8(&body)?.to_owned()))
        })
        .responder()
}

pub struct Stats;

impl Actor for Stats {
    type Context = Context<Self>;
}

impl Handler<ClientConnectorStats> for Stats {
    type Result = ();

    fn handle(&mut self, msg: ClientConnectorStats, _: &mut Self::Context) {
        println!("REUSED: {}", msg.reused);
        println!("CLOSED: {}", msg.closed);
    }
}

fn main() {
    let addr = "127.0.0.1:8080";
    println!("Listening on http://{}", addr);

    server::new(|| {
        let stats: Addr<Syn, _> = Stats.start();
        let conn: Addr<Unsync, _> = client::ClientConnector::default()
            .stats(stats.recipient())
            .start();
        App::with_state(AppState { conn: conn }).resource("/{id}/{name}/index.html", |r| {
            r.method(Method::GET).with2(index)
        })
    }).bind(addr)
        .unwrap()
        .threads(1)
        .run();
}

Stats doesn't show any closed connections.

fafhrd91 (Member) commented May 8, 2018

What are your OS-level TCP keep-alive settings?

By default, actix-web's keep-alive does not enforce anything itself; it relies on the OS-level TCP settings.
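
As an aside, TCP keep-alive is a socket-level option (SO_KEEPALIVE), separate from HTTP keep-alive. A minimal illustration of enabling it explicitly, using the net2 crate (not part of the code in this thread):

extern crate net2;

use net2::TcpStreamExt;
use std::net::TcpStream;
use std::time::Duration;

fn main() {
    let stream = TcpStream::connect("127.0.0.1:8081").unwrap();
    // Ask the kernel to start sending keep-alive probes after 60s of idleness;
    // probe interval and count still come from the sysctls (keepintvl, keepcnt).
    stream.set_keepalive(Some(Duration::from_secs(60))).unwrap();
}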

parasyte (Author) commented May 8, 2018

That's a good question. I'm on macOS, so it's just the Darwin defaults, which are basically the same as the Linux defaults:

$ sysctl -a | grep keep
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.keepinit: 75000
net.inet.tcp.keepcnt: 8
net.inet.tcp.always_keepalive: 0
net.inet.mptcp.keepalive: 840
net.link.ether.inet.keep_announcements: 1
net.key.natt_keepalive_interval: 20
net.inet6.ip6.keepfaith: 0
net.necp.pass_keepalives: 1

fafhrd91 (Member) commented May 8, 2018

Could you try the code I posted?

parasyte (Author) commented May 8, 2018

I get a lot of errors when compiling the code you provided verbatim (it's missing extern crate futures), but with that added it does appear to work better. The socket now closes after about 10 seconds of idle time, which is an improvement, but it should remain open for many hours, until the keep-alive timeout expires and all keep-alive probes fail.

keepidle is 7200000ms (2 hours), keepintvl is 75000ms (1 minute 15 seconds), and keepcnt is 8, so altogether each keep-alive socket should remain open for at least 2 hours and 10 minutes (7200000ms + 8 × 75000ms). Something is definitely not right.

fafhrd91 (Member) commented May 8, 2018

The client closes the connection after the keep-alive period:

https://actix.rs/actix-web/actix_web/client/struct.ClientConnector.html#method.conn_keep_alive

parasyte (Author) commented May 8, 2018

I cannot seem to override that value...

let conn: Addr<Unsync, _> = client::ClientConnector::default()
    .conn_keep_alive(std::time::Duration::new(60, 0))
    .stats(stats.recipient())
    .start();

Still closes the connection after a few seconds.

I'm also very confused about why using #[async]/await! causes the client to close the connection within 1 second: it always closes on the next stats iteration.

parasyte (Author) commented May 8, 2018

I found the cause of #[async]/await! closing connections, thanks to your modifications. Using the HttpMessage trait to retrieve the client response body (instead of the into_future() method) brings the #[async]/await! code to feature parity.
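
Concretely, a minimal sketch of the change inside the #[async(boxed)] handler from my first post:

    // Before: pulled the first chunk off the body stream directly.
    // let body = await!(resp.into_future()).unwrap().0.unwrap();

    // After: collect the full body via the HttpMessage trait.
    let body = await!(resp.body())?;
    Ok(std::str::from_utf8(&body)?.to_owned())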

It still closes the connection earlier than expected, regardless of the conn_keep_alive() duration. I can see it in both the stats output and in tcpdump:

Duration { secs: 4, nanos: 10808000 } opened: 1, closed: 0
Duration { secs: 5, nanos: 16835000 } opened: 0, closed: 0
Duration { secs: 6, nanos: 18488000 } opened: 0, closed: 0
Duration { secs: 7, nanos: 20592000 } opened: 0, closed: 0
Duration { secs: 8, nanos: 25764000 } opened: 0, closed: 0
Duration { secs: 9, nanos: 27693000 } opened: 0, closed: 0
Duration { secs: 10, nanos: 32110000 } opened: 0, closed: 0
Duration { secs: 11, nanos: 37601000 } opened: 0, closed: 0
Duration { secs: 12, nanos: 38705000 } opened: 0, closed: 0
Duration { secs: 13, nanos: 42744000 } opened: 0, closed: 0
Duration { secs: 14, nanos: 48425000 } opened: 0, closed: 0
Duration { secs: 15, nanos: 53332000 } opened: 0, closed: 0
Duration { secs: 16, nanos: 58103000 } opened: 0, closed: 0
Duration { secs: 17, nanos: 62868000 } opened: 0, closed: 0
Duration { secs: 18, nanos: 63700000 } opened: 0, closed: 0
Duration { secs: 19, nanos: 69434000 } opened: 0, closed: 1
11:55:07.485974 IP 127.0.0.1.59787 > 127.0.0.1.8080: Flags [S], seq 1904559169, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1225999447 ecr 0,sackOK,eol], length 0
11:55:07.486046 IP 127.0.0.1.8080 > 127.0.0.1.59787: Flags [S.], seq 1494672507, ack 1904559170, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1225999447 ecr 1225999447,sackOK,eol], length 0
11:55:07.486058 IP 127.0.0.1.59787 > 127.0.0.1.8080: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1225999447 ecr 1225999447], length 0
11:55:07.486069 IP 127.0.0.1.8080 > 127.0.0.1.59787: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1225999447 ecr 1225999447], length 0
11:55:07.486116 IP 127.0.0.1.59787 > 127.0.0.1.8080: Flags [P.], seq 1:154, ack 1, win 12759, options [nop,nop,TS val 1225999447 ecr 1225999447], length 153: HTTP: GET /123/jay/index.html HTTP/1.1
11:55:07.486134 IP 127.0.0.1.8080 > 127.0.0.1.59787: Flags [.], ack 154, win 12754, options [nop,nop,TS val 1225999447 ecr 1225999447], length 0
11:55:07.493133 IP 127.0.0.1.59790 > 127.0.0.1.8081: Flags [S], seq 2268097842, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1225999453 ecr 0,sackOK,eol], length 0
11:55:07.493206 IP 127.0.0.1.8081 > 127.0.0.1.59790: Flags [S.], seq 156827021, ack 2268097843, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1225999453 ecr 1225999453,sackOK,eol], length 0
11:55:07.493219 IP 127.0.0.1.59790 > 127.0.0.1.8081: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1225999453 ecr 1225999453], length 0
11:55:07.493231 IP 127.0.0.1.8081 > 127.0.0.1.59790: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1225999453 ecr 1225999453], length 0
11:55:07.493825 IP 127.0.0.1.59790 > 127.0.0.1.8081: Flags [P.], seq 1:123, ack 1, win 12759, options [nop,nop,TS val 1225999454 ecr 1225999453], length 122
11:55:07.493850 IP 127.0.0.1.8081 > 127.0.0.1.59790: Flags [.], ack 123, win 12755, options [nop,nop,TS val 1225999454 ecr 1225999454], length 0
11:55:07.493962 IP 127.0.0.1.8081 > 127.0.0.1.59790: Flags [P.], seq 1:135, ack 123, win 12755, options [nop,nop,TS val 1225999454 ecr 1225999454], length 134
11:55:07.493989 IP 127.0.0.1.59790 > 127.0.0.1.8081: Flags [.], ack 135, win 12755, options [nop,nop,TS val 1225999454 ecr 1225999454], length 0
11:55:07.494178 IP 127.0.0.1.8080 > 127.0.0.1.59787: Flags [P.], seq 1:135, ack 154, win 12754, options [nop,nop,TS val 1225999454 ecr 1225999447], length 134: HTTP: HTTP/1.1 200 OK
11:55:07.494198 IP 127.0.0.1.59787 > 127.0.0.1.8080: Flags [.], ack 135, win 12755, options [nop,nop,TS val 1225999454 ecr 1225999454], length 0
11:55:07.497286 IP 127.0.0.1.59787 > 127.0.0.1.8080: Flags [F.], seq 154, ack 135, win 12755, options [nop,nop,TS val 1225999457 ecr 1225999454], length 0
11:55:07.497315 IP 127.0.0.1.8080 > 127.0.0.1.59787: Flags [.], ack 155, win 12754, options [nop,nop,TS val 1225999457 ecr 1225999457], length 0
11:55:07.497371 IP 127.0.0.1.8080 > 127.0.0.1.59787: Flags [F.], seq 135, ack 155, win 12754, options [nop,nop,TS val 1225999457 ecr 1225999457], length 0
11:55:07.497397 IP 127.0.0.1.59787 > 127.0.0.1.8080: Flags [.], ack 136, win 12755, options [nop,nop,TS val 1225999457 ecr 1225999457], length 0
11:55:22.744314 IP 127.0.0.1.59790 > 127.0.0.1.8081: Flags [F.], seq 123, ack 135, win 12755, options [nop,nop,TS val 1226014637 ecr 1225999454], length 0
11:55:22.744424 IP 127.0.0.1.8081 > 127.0.0.1.59790: Flags [.], ack 124, win 12755, options [nop,nop,TS val 1226014637 ecr 1226014637], length 0
11:55:22.744530 IP 127.0.0.1.8081 > 127.0.0.1.59790: Flags [F.], seq 135, ack 124, win 12755, options [nop,nop,TS val 1226014637 ecr 1226014637], length 0
11:55:22.744587 IP 127.0.0.1.59790 > 127.0.0.1.8081: Flags [.], ack 136, win 12755, options [nop,nop,TS val 1226014637 ecr 1226014637], length 0

parasyte (Author) commented May 8, 2018

@fafhrd91 After some digging, I found the problem. The documentation doesn't match the code. The docs say:

Default keep-alive period is 15 seconds.

Default lifetime period is 75 seconds.

But the default implementation has:

conn_lifetime: Duration::from_secs(15),
conn_keep_alive: Duration::from_secs(75),

The values are swapped! Increasing the lifetime to 75 seconds indeed keeps the socket open for 75 seconds.

Setting conn_lifetime to 120 seconds causes the idle connection to close after 75 seconds (default keep-alive), and 120 seconds after its initial creation when the socket is reused.

There is a bug after all, just not the one I thought I was seeing: the values for conn_lifetime and conn_keep_alive need to be swapped in the Default implementation for ClientConnector. The same issue appears in the TLS implementation; only the ALPN implementation gets it right.
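
For reference, a sketch of the corrected defaults, matching the documented behavior (the surrounding Default impl is elided):

// Swap the two values so they match the documentation.
conn_lifetime: Duration::from_secs(75),
conn_keep_alive: Duration::from_secs(15),

Until a fixed release is out, a workaround is to set both values explicitly when building the connector; assuming conn_lifetime has a builder method symmetric to conn_keep_alive:

let conn: Addr<Unsync, _> = client::ClientConnector::default()
    // cap on the total age of a pooled connection
    .conn_lifetime(std::time::Duration::new(300, 0))
    // how long an idle connection may sit in the pool
    .conn_keep_alive(std::time::Duration::new(60, 0))
    .stats(stats.recipient())
    .start();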

fafhrd91 (Member) commented May 8, 2018

Oh! Thanks for debugging!

fafhrd91 (Member) commented May 8, 2018

Fixed in master; I will release a new version later this week.

fafhrd91 closed this as completed May 8, 2018