nsqd: fix stall in Exit() due to tcp producer conns #1198

Merged
ploxiln merged 1 commit into nsqio:master from nsqd_exit_tcp_closeall on Oct 17, 2019


ploxiln
Member

@ploxiln ploxiln commented Oct 15, 2019

Consumer connections are closed when topics are closed,
but tcp-protocol publish connections are not,
so add tcpServer.CloseAll().

problem introduced by #1190

cc @jehiah @mreiferson

@ploxiln
Member Author

ploxiln commented Oct 15, 2019

cc @benjsto

@ploxiln
Member Author

ploxiln commented Oct 15, 2019

some evidence from when I observed this:

[nsqd] 2019/10/15 06:09:07.499522 INFO: NSQ: stopping subsystems
[nsqd] 2019/10/15 06:09:07.499543 INFO: LOOKUP: closing
[nsqd] 2019/10/15 06:09:07.499553 INFO: QUEUESCAN: closing
SIGABRT: abort
PC=0x4604a1 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex(0xcb45e8, 0x80, 0x0, 0x0, 0x7f3100000000, 0x7fff055afc50, 0x434c03, 0xc0000424c8, 0x7fff055afc70, 0x40ac4f, ...)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/sys_linux_amd64.s:535 +0x21
runtime.futexsleep(0xcb45e8, 0x7fff00000000, 0xffffffffffffffff)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/os_linux.go:44 +0x46
runtime.notesleep(0xcb45e8)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/lock_futex.go:151 +0x9f
runtime.stoplockedm()
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/proc.go:2068 +0x88
runtime.schedule()
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/proc.go:2469 +0x485
runtime.park_m(0xc000001500)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/proc.go:2610 +0x9d
runtime.mcall(0x0)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/asm_amd64.s:318 +0x5b

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc000120100)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc0001200f8)
	/usr/local/Cellar/go/1.13.1/libexec/src/sync/waitgroup.go:130 +0x64
github.com/nsqio/nsq/nsqd.(*NSQD).Exit(0xc000120000)
	/Users/pierce/scratch/nsq/nsqd/nsqd.go:448 +0x248
main.(*program).Stop.func1()
	/Users/pierce/scratch/nsq/apps/nsqd/main.go:93 +0x2e
sync.(*Once).doSlow(0xc00000c1a0, 0xc000129e98)
	/usr/local/Cellar/go/1.13.1/libexec/src/sync/once.go:66 +0xe3
sync.(*Once).Do(...)
	/usr/local/Cellar/go/1.13.1/libexec/src/sync/once.go:57
main.(*program).Stop(0xc00000c1a0, 0x0, 0x2)
	/Users/pierce/scratch/nsq/apps/nsqd/main.go:92 +0x65
github.com/judwhite/go-svc/svc.Run(0x9b31a0, 0xc00000c1a0, 0xc00000c1c0, 0x2, 0x2, 0x0, 0x0)
	/Users/pierce/go/pkg/mod/github.com/judwhite/[email protected]/svc/svc_other.go:32 +0x12a
main.main()
	/Users/pierce/scratch/nsq/apps/nsqd/main.go:28 +0xa5

goroutine 18 [syscall]:
os/signal.signal_recv(0x9b0320)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
	/usr/local/Cellar/go/1.13.1/libexec/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/usr/local/Cellar/go/1.13.1/libexec/src/os/signal/signal_unix.go:29 +0x41

goroutine 20 [semacquire]:
sync.runtime_Semacquire(0xc000024be8)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc000024be0)
	/usr/local/Cellar/go/1.13.1/libexec/src/sync/waitgroup.go:130 +0x64
github.com/nsqio/nsq/internal/protocol.TCPServer(0x9b31e0, 0xc00000c8c0, 0x9ab520, 0xc00009a030, 0xc000079f90, 0xc000072118, 0x0)
	/Users/pierce/scratch/nsq/internal/protocol/tcp_server.go:45 +0x2fc
github.com/nsqio/nsq/nsqd.(*NSQD).Main.func2()
	/Users/pierce/scratch/nsq/nsqd/nsqd.go:260 +0x82
github.com/nsqio/nsq/internal/util.(*WaitGroupWrapper).Wrap.func1(0xc00008c2a0, 0xc0001200f8)
	/Users/pierce/scratch/nsq/internal/util/wait_group_wrapper.go:14 +0x27
created by github.com/nsqio/nsq/internal/util.(*WaitGroupWrapper).Wrap
	/Users/pierce/scratch/nsq/internal/util/wait_group_wrapper.go:13 +0x62

goroutine 72 [select]:
github.com/nsqio/nsq/nsqd.(*protocolV2).messagePump(0xc00009a200, 0xc0001cd800, 0xc000073f80)
	/Users/pierce/scratch/nsq/nsqd/protocol_v2.go:260 +0x33f
created by github.com/nsqio/nsq/nsqd.(*protocolV2).IOLoop
	/Users/pierce/scratch/nsq/nsqd/protocol_v2.go:51 +0x114

goroutine 55 [IO wait]:
internal/poll.runtime_pollWait(0x7f3184b9a7d8, 0x72, 0xffffffffffffffff)
	/usr/local/Cellar/go/1.13.1/libexec/src/runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc000100898, 0x72, 0x4000, 0x4000, 0xffffffffffffffff)
	/usr/local/Cellar/go/1.13.1/libexec/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/Cellar/go/1.13.1/libexec/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000100880, 0xc000390000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.13.1/libexec/src/internal/poll/fd_unix.go:169 +0x1cf
net.(*netFD).Read(0xc000100880, 0xc000390000, 0x4000, 0x4000, 0xc000100880, 0x0, 0x0)
	/usr/local/Cellar/go/1.13.1/libexec/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00000e1a0, 0xc000390000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.13.1/libexec/src/net/net.go:184 +0x68
bufio.(*Reader).fill(0xc000362c60)
	/usr/local/Cellar/go/1.13.1/libexec/src/bufio/bufio.go:100 +0x103
bufio.(*Reader).ReadSlice(0xc000362c60, 0xbf617694b0f7ad0a, 0x151cefae85, 0xcb3700, 0x0, 0x0, 0xcb3700)
	/usr/local/Cellar/go/1.13.1/libexec/src/bufio/bufio.go:359 +0x3d
github.com/nsqio/nsq/nsqd.(*protocolV2).IOLoop(0xc00009a200, 0x9b8ba0, 0xc00000e1a0, 0x27, 0xc000327f60)
	/Users/pierce/scratch/nsq/nsqd/protocol_v2.go:63 +0x1c0
github.com/nsqio/nsq/nsqd.(*tcpServer).Handle(0xc00009a030, 0x9b8ba0, 0xc00000e1a0)
	/Users/pierce/scratch/nsq/nsqd/tcp.go:44 +0x56b
github.com/nsqio/nsq/internal/protocol.TCPServer.func1(0x9ab520, 0xc00009a030, 0x9b8ba0, 0xc00000e1a0, 0xc000024be0)
	/Users/pierce/scratch/nsq/internal/protocol/tcp_server.go:39 +0x45
created by github.com/nsqio/nsq/internal/protocol.TCPServer
	/Users/pierce/scratch/nsq/internal/protocol/tcp_server.go:38 +0x459
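
The trace shows goroutine 20 waiting on a `WaitGroup` in `TCPServer` while goroutine 55 is parked in a read on a connection nobody closes. A small sketch of that interaction follows, assuming a single handler goroutine and using `net.Pipe` in place of a TCP socket; `waitTimeout` and `shutdownCompletes` are hypothetical helpers written for this illustration, not nsq functions.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// waitTimeout reports whether wg.Wait returns within d.
func waitTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// shutdownCompletes starts one handler that blocks in Read on an idle
// connection, optionally closes the server side of that connection, and
// reports whether waiting for the handler finishes within the timeout.
func shutdownCompletes(closeConn bool) bool {
	server, client := net.Pipe()
	// Clean up after the check so the handler goroutine does not leak.
	defer client.Close()
	defer server.Close()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		buf := make([]byte, 1)
		server.Read(buf) // parks here while the producer stays idle
	}()

	if closeConn {
		server.Close() // Read now returns an error and the handler exits
	}
	return waitTimeout(&wg, 200*time.Millisecond)
}

func main() {
	fmt.Println("without closing the connection:", shutdownCompletes(false))
	fmt.Println("after closing the connection:  ", shutdownCompletes(true))
}
```

In this sketch the wait only finishes once something closes the connection from the server side, which is the role `tcpServer.CloseAll()` plays in the PR description.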

@ploxiln ploxiln changed the title from "nsqd: fix stall in Exit() due to tcp-protocol producer connections" to "nsqd: fix stall in Exit() due to tcp producer conns" Oct 15, 2019
@ploxiln ploxiln force-pushed the nsqd_exit_tcp_closeall branch from 1ba3db1 to 5f2153f Compare October 15, 2019 06:47
@@ -154,6 +155,7 @@ func New(opts *Options) (*NSQD, error) {
n.logf(LOG_INFO, version.String("nsqd"))
n.logf(LOG_INFO, "ID: %d", opts.ID)

n.tcpServer = &tcpServer{}
Member Author

had to put this initialization here, in New(), to appease the race detector

@ploxiln ploxiln merged commit c3c5af9 into nsqio:master Oct 17, 2019
@benjsto
Contributor

benjsto commented Oct 19, 2019

Doh, thank you for catching and taking care of this! Have been on vacation and missed these notifs.

@ploxiln ploxiln deleted the nsqd_exit_tcp_closeall branch December 16, 2019 17:05
@mreiferson mreiferson added the bug label Jun 14, 2020
suhailpatel added a commit to monzo/nsq that referenced this pull request Dec 16, 2021
suhailpatel added a commit to monzo/nsq that referenced this pull request Dec 16, 2021