stalled writer goroutines #51

Closed · raulk opened this issue May 7, 2019 · 3 comments · Fixed by #52
@raulk (Member) commented May 7, 2019

Grepping through goroutine traces of our relay infrastructure, I found stalled goroutines:

goroutine 2826624695 [select, 3 minutes]:
github.com/libp2p/go-mplex.(*Multiplex).sendMsg(0xc1f430fc00, 0xf77f00, 0xc0000cc010, 0x18350, 0xc3aefa9990, 0x5, 0x20, 0x0, 0x0)
	/home/ubuntu/go/pkg/mod/github.com/vyzo/[email protected]/multiplex.go:143 +0xf5
github.com/libp2p/go-mplex.(*Multiplex).NewNamedStream(0xc1f430fc00, 0xc044180008, 0x5, 0xc3aefa9a20, 0xe8ec18, 0xc1720a67a0)
	/home/ubuntu/go/pkg/mod/github.com/vyzo/[email protected]/multiplex.go:217 +0x1a0
github.com/libp2p/go-mplex.(*Multiplex).NewStream(...)
[...]

When we write control messages, we transfer the deadline from the context to the connection, but the sendMsg method is always called with the background context, which never carries a deadline. So if one write stalls, it'll stall forever and it'll also block any subsequent sending of control messages on that multiplexed connection.

See: https://github.com/libp2p/go-mplex/blob/master/multiplex.go#L152
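
For illustration, here is a minimal sketch of the deadline-transfer pattern described above. This is not the actual mplex source; the type is reduced to just the net.Conn it writes to, and the field and parameter names are assumptions:

    package mplex

    import (
    	"context"
    	"net"
    )

    // Multiplex is reduced here to the underlying connection; sketch only.
    type Multiplex struct {
    	con net.Conn
    }

    func (mp *Multiplex) sendMsg(ctx context.Context, header, data []byte) error {
    	// Transfer the context's deadline (if any) onto the connection,
    	// so the writes below cannot outlive the caller's deadline.
    	if deadline, ok := ctx.Deadline(); ok {
    		if err := mp.con.SetWriteDeadline(deadline); err != nil {
    			return err
    		}
    	}
    	// context.Background() carries no deadline, so in that case the
    	// writes are unbounded and can block forever if the peer stalls.
    	if _, err := mp.con.Write(header); err != nil {
    		return err
    	}
    	_, err := mp.con.Write(data)
    	return err
    }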

@vyzo (Contributor) commented May 7, 2019

We do it on resets and on the header too; a grep turns up 3 occurrences of sending messages with context.Background():

    217:	err := mp.sendMsg(context.Background(), header, []byte(name))
    373:				go mp.sendMsg(context.Background(), ch.header(resetTag), nil)
    385:				go mp.sendMsg(context.Background(), msch.id.header(resetTag), nil)
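
An obvious mitigation at these call sites (a sketch only, not necessarily what #52 does; the 5-second timeout is an arbitrary illustrative value) is to pass a deadline-carrying context instead of context.Background():

    // Sketch: bound the control-message send with a timeout. The 5s
    // value is an arbitrary assumption for illustration.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if err := mp.sendMsg(ctx, ch.header(resetTag), nil); err != nil {
    	log.Warningf("failed to send reset: %s", err)
    }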

@vyzo (Contributor) commented May 8, 2019

I think the main culprit is the reset message we send when data arrives on a closed stream:

			if remoteClosed {
				// closed stream, return b
				pool.Put(b)

				log.Warningf("Received data from remote after stream was closed by them. (len = %d)", len(b))
				go mp.sendMsg(context.Background(), msch.id.header(resetTag), nil)
				continue
			}

This happens so often that we had to downgrade this error log in #46, because it was filling the disk with logs on the mplex relay.

@vyzo (Contributor) commented May 8, 2019

The other place where we send timeout-less messages is when we do an actual Close or Reset ourselves.
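
Sketched below is one shape that could take for our own Reset, assuming a package-level ResetStreamTimeout constant; the constant, s.mp, and s.id are invented names for illustration, not the actual mplex API:

    // Hypothetical sketch; ResetStreamTimeout, s.mp, and s.id are
    // assumed names. Bound our own reset send by a fixed timeout
    // instead of relying on context.Background().
    const ResetStreamTimeout = 1 * time.Minute

    func (s *Stream) Reset() error {
    	ctx, cancel := context.WithTimeout(context.Background(), ResetStreamTimeout)
    	defer cancel()
    	return s.mp.sendMsg(ctx, s.id.header(resetTag), nil)
    }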

@vyzo closed this as completed in #52 on May 10, 2019
@ghost removed the status/in-progress label on May 10, 2019