You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We encountered this goroutine leak in our own project(https://github.com/coreos/etcd) when using the function to check the goroutine that is still running after test finishes:
z_last_test.go:97: Test appears to have leaked a Transport:
net/http.(*persistConn).readLoop(0xc20858c0b0)
/usr/local/go/src/net/http/transport.go:858 +0x4eb
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:660 +0x10d4
testing.RunTests(0x6bb550, 0x838980, 0x1a, 0x1a, 0x11800)
/usr/local/go/src/testing/testing.go:556 +0xdc3
testing.(*M).Run(0xc20802e140, 0x843160)
/usr/local/go/src/testing/testing.go:485 +0xe1
main.main()
github.com/coreos/etcd/integration/_test/_testmain.go:102 +0x28d
The error is under go 1.4.2
The case is writeLoop is closed, but readLoop is still there, and hangs on rc := <-pc.reqch forever.
What we have found
After digging into it, @xiang90 find this, and we think that is the root cause of the leak.
(All codes are excerpted from "pkg/net/http/transport.go" in go1.4)
Bug case
dialConn inits a persistConn, and go readLoop()
roundTrip is called. It adds 1 to pc.numExpectedResponses and sends a requestAndChan to reqch, and the loop in readLoop receives and handles it.
roundTrip has not received the response, and is blocked at WaitResponse block.
readLoop reads the response from pc.br, and no error. It has no body and pushes the response into channel. Then it wait on pc.br.Peek(1) again.
pc.br is closed by remote target, and returns io.EOF.
pc.br.Peek(1) in readLoop returns error io.EOF.
The loop in readLoop finds that pc.numExpectedResponses == 0 is false, then waits on reqCh.
roundTrip receives the response and exits.
No further roundTrip comes, and the persistConn is closed. So no one will push requestAndChan to reqCh in the future, and reqCh will not be closed.
func (pc *persistConn) readLoop() {
alive := true
for alive {
pb, err := pc.br.Peek(1)
pc.lk.Lock()
if pc.numExpectedResponses == 0 {
if !pc.closed {
pc.closeLocked()
if len(pb) > 0 {
log.Printf("Unsolicited response received on idle HTTP channel starting with %q; err=%v",
string(pb), err)
}
}
pc.lk.Unlock()
return
}
pc.lk.Unlock()
rc := <-pc.reqch
var resp *Response
if err == nil {
resp, err = ReadResponse(pc.br, rc.req)
if err == nil && resp.StatusCode == 100 {
// Skip any 100-continue for now.
// TODO(bradfitz): if rc.req had "Expect: 100-continue",
// actually block the request body write and signal the
// writeLoop now to begin sending it. (Issue 2184) For now we
// eat it, since we're never expecting one.
resp, err = ReadResponse(pc.br, rc.req)
}
}
...
}
}
I can get the leak when running etcd integration tests once in five round in race mode.
I have done my best to reproduce it in a simple go program(2-3 full days), but i failed to make it happen because it is really hard to let the code run in the order that is described in bug case.
i hope that you can read over the bug case, and i can help on testing whether this is fixed.
The text was updated successfully, but these errors were encountered:
What we meet
We encountered this goroutine leak in our own project(https://github.com/coreos/etcd) when using the function to check the goroutine that is still running after test finishes:
The error is under go 1.4.2
The case is writeLoop is closed, but readLoop is still there, and hangs on
rc := <-pc.reqch
forever.What we have found
After digging into it, @xiang90 find this, and we think that is the root cause of the leak.
(All codes are excerpted from "pkg/net/http/transport.go" in go1.4)
Bug case
dialConn
inits apersistConn
, andgo readLoop()
roundTrip
is called. It adds 1 topc.numExpectedResponses
and sends arequestAndChan
toreqch
, and the loop inreadLoop
receives and handles it.roundTrip
has not received the response, and is blocked atWaitResponse
block.readLoop
reads the response frompc.br
, and no error. It has no body and pushes the response into channel. Then it wait onpc.br.Peek(1)
again.pc.br
is closed by remote target, and returns io.EOF.pc.br.Peek(1)
inreadLoop
returns error io.EOF.The loop in
readLoop
finds thatpc.numExpectedResponses == 0
is false, then waits onreqCh
.roundTrip
receives the response and exits.No further
roundTrip
comes, and the persistConn isclosed
. So no one will pushrequestAndChan
toreqCh
in the future, andreqCh
will not be closed.The
readLoop
goroutine waits onreqCh
forever.Init
persistConn.readLoop
persistConn.roundTrip
persistConn.close
Can we reproduce it?
I can get the leak when running etcd integration tests once in five round in race mode.
I have done my best to reproduce it in a simple go program(2-3 full days), but i failed to make it happen because it is really hard to let the code run in the order that is described in bug case.
i hope that you can read over the bug case, and i can help on testing whether this is fixed.
The text was updated successfully, but these errors were encountered: