Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: ReadTimeout Breaks Server CloseNotify Functionality For Keep-Alive Connections #16958

Closed
prabrisat1 opened this issue Sep 1, 2016 · 15 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@prabrisat1
Copy link

Issue

GO1.7's functionality for HTTP 1.1's keep-alive, net/http.Server.ReadTimeout, and net/http.CloseNotifier is not consistent with the documented and/or expected behavior, and cannot be used together properly. The timer created by setting the server's readTimeout value is reset only at the end of processing each request and is not paused/reset when we have finished reading in each request, causing CloseNotify() to signal without cause.

What did I do

This test creates a simple http server and then sends 2 requests, each of which takes 40ms of processing time with a 70ms gap between them, responding 200OK if the 40ms of work completes, and 503 if it receives a closeNotify signal before finishing that work.

Expected Result

We should receive 2 200 OK messages and not receive a signal from CloseNotify().

We have a readTimeout of 100ms, and a 70ms delay between receiving a response for the first request and sending the second request, so both requests should be read within the timeout and be able to complete their work.

Actual Result

We actually see that during the second request, we receive a closeNotify() signal approximately 30ms after the client has sent the second request. At this time we have clearly read the second request and started processing it with a still alive TCP connection, but still receive a closeNotify() signal, which should not be happening. There is also an attached wireshark log showing this.

package main

import (
    "io"
    "io/ioutil"
    "net/http"
    "net/http/httptest"
    "sync"
    "testing"
    "time"
)

func TestServerKeepAliveReadTimeoutCloseNotify(t *testing.T) {
    var startMu sync.Mutex
    var start time.Time

    testServer := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        cn := w.(http.CloseNotifier)
        select {
        case <-time.After(40 * time.Millisecond):
            w.WriteHeader(http.StatusOK)
        case <-cn.CloseNotify():
            startMu.Lock()
            delta := time.Since(start)
            startMu.Unlock()
            t.Errorf("Unexpected CloseNotify received after %s", delta)
            w.WriteHeader(http.StatusServiceUnavailable)
        }
    }))
    testServer.Config.ReadTimeout = 100 * time.Millisecond
    testServer.Start()
    defer testServer.Close()

    for i := 0; i < 2; i++ {
        startMu.Lock()
        start = time.Now()
        startMu.Unlock()

        resp, err := http.Get(testServer.URL)
        if err != nil {
            t.Fatal(err)
        }
        _, err = io.Copy(ioutil.Discard, resp.Body)
        if err != nil {
            t.Fatal(err)
        }
        err = resp.Body.Close()
        if err != nil {
            t.Fatal(err)
        }

        time.Sleep(70 * time.Millisecond)
    }
}

Test output

$ go test notifierIssue_test.go 
--- FAIL: TestServerKeepAliveReadTimeoutCloseNotify (0.22s)
    notifierIssue_test.go:26: Unexpected CloseNotify received after 30.752781ms
FAIL
FAIL    command-line-arguments  0.225s

wSharkLogs.pcapng.zip

Potential Fix

Currently the read timeout is only reset here: https://github.com/golang/go/blob/go1.7/src/net/http/server.go#L749 after a request has finished being processed and we start waiting on the next one. However, the timeout is never paused/reset for the period between the end of reading the request and the end of sending the response, as it should be.

What version of Go are you using (go version)?

go version go1.7 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/4r/1g2c4css78b1yn2q32dmqqm0l05w9h/T/go-build102138663=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"

@bradfitz
Copy link
Contributor

bradfitz commented Sep 1, 2016

Your bug report starts off saying "GO1.7's functionality ...". Is this a regression from Go 1.6? I don't recall anything changing here.

@bradfitz bradfitz changed the title ReadTimeout Breaks Server CloseNotify Functionality For Keep-Alive Connections net/http: ReadTimeout Breaks Server CloseNotify Functionality For Keep-Alive Connections Sep 1, 2016
@freeformz
Copy link

I'm pretty sure this is documented under https://golang.org/pkg/net/http/#CloseNotifier, or am I misunderstanding the docs?

@freeformz
Copy link

For instance if you instead create a NewRequest and set req.Close=true (which forces another TCP connection), then your test works, which is what is described in the documentation.

@prabrisat1
Copy link
Author

As far as I know this should act the same way in GO 1.6 and 1.7. The oldest version I saw the same issue with was 1.5.

As for the documentation for closeNotifier, i'm not sure what you're referring to.

From my understanding of the documentation if you set req.Close=true either on the client side or the server side (for response) you will kill the tcp connection after a single request-response pair, which is the same as not using keep-alive connections at all since you will need to reestablish tcp for the next request. This issue is specifically a problem when using keep-alive connections.

@rhysh
Copy link
Contributor

rhysh commented Sep 1, 2016

What I find confusing here is that the ReadTimeout timer starts when the previous handler returns, and that it's not reset when the first byte of the following request arrives.

It's not clear to me how to correctly use net/http.Server.ReadTimeout without entirely disabling HTTP/1.1 keepalives. The timeout it controls is neither how long it takes for a subsequent request to arrive, nor how long it takes to read that request and body once it starts arriving.

@bradfitz
Copy link
Contributor

bradfitz commented Sep 1, 2016

Yeah, ReadTimeout is extremely ill-specified (and implemented).

Related bugs:

Those two should be fixed together.

I haven't re-read this bug enough to understand whether there's a unique problem here, or just a different view on one of the existing ones.

@freeformz
Copy link

Looks like I was confusing pipelining with connection keepalives. :-( Sorry for the noise.

@freeformz
Copy link

After peppering the code with debug info I see the first request happening and then as soon as the first request finishes server.go (in readRequest) resets the deadline to 100 ms in the future.

This makes sense because we're again reading from the socket for the next http request and both the http client and the server are holding the connection open.

AFAICT this has nothing to do with sending the response since we reset the timeout after sending the response. The client is just being slow from the perspective of the the server.

I think this is effectively working as intended.

@prabrisat1
Copy link
Author

prabrisat1 commented Sep 3, 2016

You're correct that as soon as the server finishes the first request it resets the deadline. That is the problem, the readtimeout is reset after first request is completed, not after the message is read specifically.

Its much clearer if you take a look at the attached wireshark logs. The read timeout is not being reset after the request is read, but after the request has been read and the handler's work has been completed. In the above example, the client receives a 200 OK ~41ms after process starts, and the readTimeout is reset here @ 41ms. The server receives the next request at ~116ms, so ~75ms later. The message is just a GET with no body and takes ~1-2ms to be read, a total of ~77ms after the timer has been reset, so we should have read a message within the readTimeout right? This isn't true because the readTimeout is limiting the readTime and the time required for the handler to complete, which is why we receive a closeNotify signal at ~142ms before our 40ms of work on the request completes at ~117ms.

Effectively, the current readTimeout measures (time spent waiting for a request + time spend reading request + time spent doing work on the request) when it should really only represent either (time waiting on request + time reading request) or even just (time spent reading request) for the name readTimeout to actually be applicable.

@freeformz
Copy link

@prabrisat1
But, after the first request completes is the start of the next request when the TCP connection is reused as part of HTTP 1.1 keepalives. That is when you have to start the read timeout, otherwise a client can just hold open the connection and starve you of resources. In effect ReadTimeout becomes RequestTimeout though because of it.

Here is the sequence of events as I've seen them:
Client Issues Get -> Server SetsDeadline to 100ms in the future as part of readRequest -> Handler is fired, which sleeps for 40ms (60ms left) -> Sends Response -> Server SetsDeadline(100ms in the future) as part of readRequest (for next request, which blocks waiting on i/o from the client) -> client sleeps for 70ms (only ~30ms left until new deadline) -> client Issues Get -> Handler is fired, which sleeps 40ms (boom because only 30ms left).

We could keep extending the deadline in various places, but that is costly from a syscall perspective IIRC. Possibly other issues with that as well, not sure. I defer to @bradfitz.

At the very least this should be documented better. If there is interest in that I can work up a PR that attempts to explain it.

@freeformz
Copy link

PS: We've also run into this issue internally, hence my interest.

@prabrisat1
Copy link
Author

Definitely agree that the deadline can't be extended. @bradfitz referenced this issue in #14204 which I think is a pretty good solution to the issue. Separating the current readTimeout into a readTimeout and an idleTimeout makes the naming much clearer and solves the functional issue as well I think.

@quentinmit
Copy link
Contributor

quentinmit commented Sep 6, 2016

Can this be closed as a dupe of #14204?

Or is this representing the "read timeout is actually response timeout" issue and the other is representing "read timeout is actually idle timeout"?

@quentinmit quentinmit added this to the Go1.8 milestone Sep 6, 2016
@bradfitz
Copy link
Contributor

bradfitz commented Sep 6, 2016

I'd like to keep both open. They're related, but might be closed separately.

@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 7, 2016
@bradfitz
Copy link
Contributor

bradfitz commented Nov 1, 2016

I verified that this was fixed by the changes in #14204, etc. Mostly faf882d and c7e0dda, but a number of other supporting ones.

@bradfitz bradfitz closed this as completed Nov 1, 2016
@golang golang locked and limited conversation to collaborators Nov 1, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants