Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handleGRPC can panic #13

Closed
simitt opened this issue Dec 6, 2022 · 9 comments · Fixed by #16
Closed

handleGRPC can panic #13

simitt opened this issue Dec 6, 2022 · 9 comments · Fixed by #16
Assignees
Labels
bug Something isn't working

Comments

@simitt
Copy link

simitt commented Dec 6, 2022

(*mux).handleGRPC(..) occasionally panics with following error:

http: panic serving <ip:port>: send on closed channel
goroutine 3386993 [running]:
net/http.(*conn).serve.func1()
	net/http/server.go:1825 +0xbf
panic({0x557013b82be0, 0x557013f70d50})
	runtime/panic.go:844 +0x258
github.com/elastic/gmux.(*mux).handleGRPC(0x0?, 0x60?, {0x557013f99500?, 0xc0021f0cc0?}, 0xc003e135c0, {0x557013db38c0?, 0xc001fe1201?})
	github.com/elastic/[email protected]/mux.go:300 +0x7d
github.com/elastic/gmux.(*mux).handleH2(0x20?, 0xc0000b8800?, {0x557013f98e20?, 0xc0007a0380?}, {0x557013f7a528, 0xc001fe1200})
	github.com/elastic/[email protected]/mux.go:138 +0x519
github.com/elastic/gmux.ConfigureServer.func1(0xc002a3ab60, 0xc0007a0380, {0x557013f7a528?, 0xc001fe1200?})
	github.com/elastic/[email protected]/mux.go:79 +0x51
net/http.(*conn).serve(0xc0027acb40, {0x557013f92370, 0xc004642180})
	net/http/server.go:1874 +0x1293
created by net/http.(*Server).Serve
	net/http/server.go:3071 +0x4db

I have not yet found how to reproduce it.

@simitt simitt added the bug Something isn't working label Dec 6, 2022
@carsonip carsonip self-assigned this Jan 9, 2023
@carsonip
Copy link
Member

carsonip commented Jan 10, 2023

TLDR: In the rare occasion during shutdown where a new gprc connection comes in after grpcServer shutdown and before httpServer shutdown, the user program panics.

Managed to reliably reproduce the issue.

2023/01/10 11:03:43 http: panic serving 127.0.0.1:35366: send on closed channel
goroutine 28987 [running]:
net/http.(*conn).serve.func1()
        /home/carson/sdk/go1.19.3/src/net/http/server.go:1850 +0xbf
panic({0x8e3d20, 0xa3beb0})
        /home/carson/sdk/go1.19.3/src/runtime/panic.go:890 +0x262
github.com/elastic/gmux.(*mux).handleGRPC(...)
        /home/carson/projects/gmux/mux.go:300
github.com/elastic/gmux.(*mux).withGRPCInsecure.func1({0xa409a0, 0xc00166a380}, 0xc001b28300)
        /home/carson/projects/gmux/mux.go:110 +0x568
net/http.HandlerFunc.ServeHTTP(0x0?, {0xa409a0?, 0xc00166a380?}, 0x46518e?)
        /home/carson/sdk/go1.19.3/src/net/http/server.go:2109 +0x2f
net/http.serverHandler.ServeHTTP({0xc00265a7b0?}, {0xa409a0, 0xc00166a380}, 0xc001b28300)
        /home/carson/sdk/go1.19.3/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0015b01e0, {0xa410e0, 0xc0001184e0})
        /home/carson/sdk/go1.19.3/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
        /home/carson/sdk/go1.19.3/src/net/http/server.go:3102 +0x4db

This only happens in the rare occasion where user program is shutting down and a new grpc connection comes in between grpcServer shutdown and httpServer shutdown. This is caused by httpServer and grpcServer closing order in code using gmux. There is no bug inside gmux mux.go implementation.

The program panics when the servers are closed in this order in user code:

grpcServer.GracefulStop()
httpServer.Shutdown(context.Background())

This is because grpcServer.GracefulStop() closes the grpc listener channel mux.grpcConns while httpServer is still happily accepting grpc connections. When connections are accepted and tried to be placed into a closed channel mux.grpcConns, the program panics.

The solution is to reverse the closing order in user code, i.e.

httpServer.Shutdown(context.Background())
grpcServer.GracefulStop()

such that the gmux configured httpServer stops accepting connections first.

@axw
Copy link
Member

axw commented Jan 11, 2023

This is because grpcServer.GracefulStop() closes the grpc listener channel mux.grpcConns while httpServer is still happily accepting grpc connections. When connections are accepted and tried to be placed into a closed channel mux.grpcConns, the program panics.

@carsonip How is mux.grpcConns being closed? I don't see anything closing it in mux.go.

@carsonip
Copy link
Member

This is because grpcServer.GracefulStop() closes the grpc listener channel mux.grpcConns while httpServer is still happily accepting grpc connections. When connections are accepted and tried to be placed into a closed channel mux.grpcConns, the program panics.

@carsonip How is mux.grpcConns being closed? I don't see anything closing it in mux.go.

@axw It is not closed in mux.go. In the case of apm-server, it is closed in https://github.com/elastic/apm-server/blob/main/internal/beater/server.go#L222

@axw
Copy link
Member

axw commented Jan 11, 2023

@carsonip I don't mean closing the server, but the channel. You said:

This is because grpcServer.GracefulStop() closes the grpc listener channel mux.grpcConns

How does closing the server close the channel?

@carsonip
Copy link
Member

How does closing the server close the channel?

@axw Good question. It isn't immediately obvious.

In apm-server, on startup, we start the grpc server with

s.grpcServer.Serve(s.httpServer.grpcListener)

which calls grpc package server.go

func (s *Server) Serve(lis net.Listener) error {
	...
	ls := &listenSocket{Listener: lis}
	s.lis[ls] = true
	...
}

In apm-server, on shutdown,

s.grpcServer.GracefulStop()

which calls grpc package server.go

func (s *Server) GracefulStop() {
	...
	for lis := range s.lis {
		lis.Close()
	}
	...
}

which in turn calls chanListener.Close because lis is a gmux/conn.go chanListener which implements net.Listener,

func (l *chanListener) Close() error {
	l.closeOnce.Do(func() {
		close(l.conns)
	})
	return nil
}

@axw
Copy link
Member

axw commented Jan 11, 2023

Thanks @carsonip! Makes sense now.

I think it would still be good to fix in gmux, as it feels like a bit of a footgun. Not urgent, but can we leave this open?

@carsonip
Copy link
Member

Thanks @carsonip! Makes sense now.

I think it would still be good to fix in gmux, as it feels like a bit of a footgun. Not urgent, but can we leave this open?

Yes, let's leave this open. Would like to know how we can fix this properly, otherwise this will be a point to document in readme.

People usually say "only writers should close the channel" but in this case gmux has no idea whether the channel is closed / should be closed due to reasons including but not limited to shutdown. We may have to add something to signal shutdown to gmux so that it stops sending stuff into the channel.

@axw
Copy link
Member

axw commented Jan 12, 2023

Would like to know how we can fix this properly, otherwise this will be a point to document in readme.

Could we just never close the chanListener.conns channel? Instead we could have a chanListener.closed channel which is closed by chanListener.Close, and is only ever received from. chanListener.Accept would select on both <-conns and <-closed. handleGRPC may need to do the same, not sure now.

@carsonip
Copy link
Member

I have a draft of a fix in gmux at #16 although it may be a little ugly. CI isn't happy and tests are timing out for the PR and even on main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants